Re: [PR] Open-API: Refactor updates with discriminator [iceberg]

2023-12-11 Thread via GitHub


Fokko merged PR #9240:
URL: https://github.com/apache/iceberg/pull/9240





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422076161


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -239,33 +240,24 @@ public boolean dropTable(TableIdentifier identifier, 
boolean purge) {
 TableReference tableReference = parseTableReference(identifier);
 return client
 .withReference(tableReference.getReference(), tableReference.getHash())
-.dropTable(identifierWithoutTableReference(identifier, 
tableReference), purge);
+.dropTable(identifierWithoutTableReference(identifier, 
tableReference), false);

Review Comment:
   @ajantha-bhat shouldn't this keep the `purge` flag?






Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2023-12-11 Thread via GitHub


nastra commented on issue #9086:
URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1849563358

   For parameterized testing we would need to have 
https://github.com/apache/iceberg/issues/9210 implemented first. 





Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9161:
URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422097196


##
LICENSE:
##
@@ -323,3 +323,15 @@ This product includes code from Apache HttpComponents 
Client.
 Copyright: 1999-2022 The Apache Software Foundation.
 Home page: https://hc.apache.org/
 License: https://www.apache.org/licenses/LICENSE-2.0
+
+
+
+This product includes code from Apache Flink.
+
+* Parameterized test at class level logic in ParameterizedTestExtension.java
+* Parameter provider annotation for parameterized tests in Parameters.java
+* Parameter field annotation for parameterized tests in Parameter.java
+
+Copyright: 1999-2022 The Apache Software Foundation.
+Home page: https://hc.apache.org/

Review Comment:
   ```suggestion
   Home page: https://flink.apache.org/
   ```






Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9161:
URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422099227


##
api/src/test/java/org/apache/iceberg/Parameter.java:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+import org.junit.runners.Parameterized;
+
+/**
+ * The annotation is used to replace {@link Parameterized.Parameter} for JUnit 5 parameterized
+ * tests.

Review Comment:
   The javadoc for this and the other classes should mention that this is taken 
from the Flink codebase (with a link pointing to it)
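   For illustration, a minimal sketch of what such an attribution note could look like on the copied `Parameter` annotation; the exact wording, the link target, and the annotation body are assumptions, not taken from the PR:

   ```java
   package org.apache.iceberg;

   import java.lang.annotation.ElementType;
   import java.lang.annotation.Retention;
   import java.lang.annotation.RetentionPolicy;
   import java.lang.annotation.Target;

   /**
    * Replaces {@link org.junit.runners.Parameterized.Parameter} for JUnit 5 parameterized tests.
    *
    * <p>Copied and adapted from the Apache Flink codebase (https://github.com/apache/flink).
    */
   @Target(ElementType.FIELD)
   @Retention(RetentionPolicy.RUNTIME)
   public @interface Parameter {
     // Annotation elements elided; the attribution javadoc above is the point of this sketch.
   }
   ```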






Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9161:
URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422100956


##
api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java:
##
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+import java.text.MessageFormat;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.function.Function;
+import java.util.stream.Stream;
+import org.assertj.core.util.Preconditions;
+import org.junit.jupiter.api.TestTemplate;
+import org.junit.jupiter.api.extension.BeforeEachCallback;
+import org.junit.jupiter.api.extension.Extension;
+import org.junit.jupiter.api.extension.ExtensionContext;
+import org.junit.jupiter.api.extension.ParameterContext;
+import org.junit.jupiter.api.extension.ParameterResolutionException;
+import org.junit.jupiter.api.extension.ParameterResolver;
+import org.junit.jupiter.api.extension.TestTemplateInvocationContext;
+import org.junit.jupiter.api.extension.TestTemplateInvocationContextProvider;
+import org.junit.platform.commons.support.AnnotationSupport;
+import org.junit.platform.commons.support.HierarchyTraversalMode;
+
+/**
+ * This extension is used to implement parameterized tests for JUnit 5 to replace Parameterized in
+ * JUnit 4.
+ *
+ * <p>When using this extension, all tests must be annotated with {@link TestTemplate}.
+ */
+public class ParameterizedTestExtension implements 
TestTemplateInvocationContextProvider {
+
+  private static final ExtensionContext.Namespace NAMESPACE =
+  ExtensionContext.Namespace.create("parameterized");
+  private static final String PARAMETERS_STORE_KEY = "parameters";
+  private static final String PARAMETER_FIELD_STORE_KEY_PREFIX = 
"parameterField_";
+  private static final String INDEX_TEMPLATE = "{index}";
+
+  @Override
+  public boolean supportsTestTemplate(ExtensionContext context) {
+return true;
+  }
+
+  @Override
+  public Stream<TestTemplateInvocationContext> provideTestTemplateInvocationContexts(
+  ExtensionContext context) {
+
+// Search method annotated with @Parameters
+final List<Method> parameterProviders =
+AnnotationSupport.findAnnotatedMethods(
+context.getRequiredTestClass(), Parameters.class, 
HierarchyTraversalMode.TOP_DOWN);
+if (parameterProviders.isEmpty()) {
+  throw new IllegalStateException("Cannot find any parameter provider");
+}
+if (parameterProviders.size() > 1) {
+  throw new IllegalStateException("Multiple parameter providers are 
found");
+}
+
+Method parameterProvider = parameterProviders.get(0);
+// Get potential test name
+String testNameTemplate = 
parameterProvider.getAnnotation(Parameters.class).name();
+
+// Get parameter values
+final Object parameterValues;
+try {
+  parameterProvider.setAccessible(true);
+  parameterValues = parameterProvider.invoke(null);
+  context.getStore(NAMESPACE).put(PARAMETERS_STORE_KEY, parameterValues);
+} catch (Exception e) {
+  throw new IllegalStateException("Failed to invoke parameter provider", 
e);
+}
+Preconditions.checkState(parameterValues != null, "Parameter values cannot 
be null");
+
+// Parameter values could be Object[][]
+if (parameterValues instanceof Object[][]) {
+  Object[][] typedParameterValues = (Object[][]) parameterValues;
+  return createContextForParameters(
+  Arrays.stream(typedParameterValues), testNameTemplate, context);
+}
+
+// or a Collection
+if (parameterValues instanceof Collection) {
+  final Collection<?> typedParameterValues = (Collection<?>) parameterValues;
+  final Stream<Object[]> parameterValueStream =
+  typedParameterValues.stream()
+  .map(
+  (Function<Object, Object[]>)
+  parameterValue -> {
+if (parameterValue instanceof Object[]) {
+  return (Object[]) parameterValue;
+} else {
+  return new Object[] {parameterValue};
+}
+  });
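   For context, a hedged sketch of how a test class might use this extension together with `@Parameters` and `@Parameter` (the class, field, and parameter values below are hypothetical, and the exact field-injection contract, e.g. whether an index is required on `@Parameter`, is an assumption):

   ```java
   package org.apache.iceberg;

   import static org.assertj.core.api.Assertions.assertThat;

   import java.util.Arrays;
   import java.util.List;
   import org.junit.jupiter.api.TestTemplate;
   import org.junit.jupiter.api.extension.ExtendWith;

   @ExtendWith(ParameterizedTestExtension.class)
   public class TestFileFormatsExample {

     // Static provider discovered via @Parameters; a Collection of single values is
     // wrapped into Object[] entries by the extension, per the quoted code above.
     @Parameters
     private static List<Object> parameters() {
       return Arrays.asList("parquet", "orc", "avro");
     }

     // Injected per invocation; the real annotation may require an explicit index.
     @Parameter
     private String fileFormat;

     // Tests run once per parameter set and must be annotated with @TestTemplate.
     @TestTemplate
     public void formatIsKnown() {
       assertThat(fileFormat).isIn("parquet", "orc", "avro");
     }
   }
   ```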
+  

Re: [I] iceberg-mr: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2023-12-11 Thread via GitHub


nastra closed issue #9083: iceberg-mr: Switch tests to JUnit5 + AssertJ-style 
assertions
URL: https://github.com/apache/iceberg/issues/9083





Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-11 Thread via GitHub


nastra merged PR #9241:
URL: https://github.com/apache/iceberg/pull/9241





[PR] Modernize the readme's Status paragraph [iceberg]

2023-12-11 Thread via GitHub


mt-ronkorving opened a new pull request, #9272:
URL: https://github.com/apache/iceberg/pull/9272

   I've updated the README's Status paragraph to be more in line with today's reality, and hopefully it won't have to be kept up to date going forward. Please let me know if there are any inaccuracies in the text or stylistic changes anyone would like to see. Thank you.
   
   Closes #9127





[I] AWS access denied from metadata file on S3 [iceberg-python]

2023-12-11 Thread via GitHub


itaise opened a new issue, #201:
URL: https://github.com/apache/iceberg-python/issues/201

   ### Apache Iceberg version
   
   0.1.0
   
   ### Please describe the bug 🐞
   
   Hi, 
   I am trying to read a table schema and getting permission denied. 
   This is the code that I use: 
   ```
   from pyiceberg.catalog import load_catalog
   catalog = load_catalog("prod")
   table = catalog.load_table("stores.checkout")
   table
   ```
   
   The code fails when trying to open the metadata file which is in S3 [On this 
line](https://github.com/apache/iceberg-python/blob/main/pyiceberg/catalog/hive.py#L240)
   
   I am getting the following error (I removed the file name and bucket name 
but they are the correct ones):
   OSError("When reading information for key 'X' in bucket 'X': AWS Error ACCESS_DENIED during HeadObject operation: No response body.")
   
   I checked that the credentials have HeadObject permission to this file. 
   
   The Env variables which I set are: 
   PYICEBERG_CATALOG__DEFAULT__S3__ACCESS_KEY_ID
   PYICEBERG_CATALOG__DEFAULT__S3__SECRET_ACCESS_KEY
   PYICEBERG_CATALOG__PROD__URI
   
   Can someone please assist? 
   Thanks a lot!





Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]

2023-12-11 Thread via GitHub


ronkorving commented on issue #9127:
URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1849677224

   PR is ready for review ^


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Data: Add GenericFileWriterFactory [iceberg]

2023-12-11 Thread via GitHub


aokolnychyi commented on code in PR #9267:
URL: https://github.com/apache/iceberg/pull/9267#discussion_r1422245545


##
flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java:
##
@@ -219,27 +219,27 @@ public void testPrimitiveColumns() throws Exception {
 
 Row binaryCol =
 Row.of(
-59L,

Review Comment:
   The column sizes have changed because the new writer picks up all table properties, which was not true before. I validated that the actual values are correct using `parquet-tools`.






Re: [PR] Data: Add GenericFileWriterFactory [iceberg]

2023-12-11 Thread via GitHub


aokolnychyi commented on PR #9267:
URL: https://github.com/apache/iceberg/pull/9267#issuecomment-1849745072

   @nastra @Fokko @flyrain @amogh-jahagirdar, could you check this one?





Re: [I] Can databricks volume be implemented on Iceberg? [iceberg]

2023-12-11 Thread via GitHub


coolderli commented on issue #9249:
URL: https://github.com/apache/iceberg/issues/9249#issuecomment-1849747058

   @Fokko Thanks for your reply. I think we can use a table for files that have a schema.
   
   If a file does not have a schema, a table cannot be used. Files can of course be stored directly on object storage, but the resulting directories are difficult to manage.
   
   I am wondering whether we could provide external references to files stored on object storage through a unified directory structure, for example under a fixed prefix such as /volume/catalog_name/database_name/volume_name.
   
   We could record the actual object-storage address of each file in the Iceberg manifest file, which would also let us provide snapshots without relying on atomic renames on object storage.
   
   This does reduce the read and write performance of file semantics, and committing too frequently may cause a small-file problem, but it should work if we use Spark for batch commits. In addition, I found that Spark does not have a catalog for files, so maybe it is possible to implement a FileCatalog.





Re: [PR] Data: Add GenericFileWriterFactory [iceberg]

2023-12-11 Thread via GitHub


aokolnychyi commented on code in PR #9267:
URL: https://github.com/apache/iceberg/pull/9267#discussion_r1422256919


##
data/src/test/java/org/apache/iceberg/io/TestFileWriterFactory.java:
##
@@ -76,18 +76,15 @@ public static Object[] parameters() {
 
   private final FileFormat fileFormat;
   private final boolean partitioned;
-  private final List dataRows;
 
   private StructLike partition = null;
   private OutputFileFactory fileFactory = null;
+  private List dataRows;
 
   public TestFileWriterFactory(FileFormat fileFormat, boolean partitioned) {
 super(TABLE_FORMAT_VERSION);
 this.fileFormat = fileFormat;
 this.partitioned = partitioned;
-this.dataRows =

Review Comment:
   Had to move the initialization after the table creation.






Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1422303733


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-if (stream != null) {
-  try {
-return stream.getPos();
-  } catch (IOException e) {
-throw new RuntimeIOException(e, "Failed to get stream length");
-  }
+Preconditions.checkNotNull(stream, "Failed to get stream length: no open 
stream");

Review Comment:
   Aah, I used `checkNotNull` instead of `checkState` as it was just replacing the null check.
   
   But yeah, it is better to throw an `IllegalStateException` instead of an `NPE`. I have updated it now.
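   For illustration, a small self-contained sketch of the distinction discussed here: `Preconditions.checkState` reports an `IllegalStateException`, whereas `checkNotNull` reports a `NullPointerException`. The class below and the exception wrapping are illustrative assumptions, not the PR's actual code:

   ```java
   package org.apache.iceberg.example;

   import java.io.IOException;
   import java.io.UncheckedIOException;
   import org.apache.iceberg.io.PositionOutputStream;
   import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

   class StreamLengthExample {
     private PositionOutputStream stream; // remains null until a file is opened

     long length() {
       // checkState -> IllegalStateException with this message; checkNotNull would throw an NPE.
       Preconditions.checkState(stream != null, "Failed to get stream length: no open stream");
       try {
         return stream.getPos();
       } catch (IOException e) {
         throw new UncheckedIOException("Failed to get stream length", e);
       }
     }
   }
   ```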






Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2023-12-11 Thread via GitHub


chinmay-bhat commented on issue #9086:
URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1849868289

   Ok, so can I pick this up once the parameterized tests PR is merged? 
   
   Also, iceberg-spark has folders for each version (v3.2, 3.3, 3.4, 3.5). Do 
you recommend creating separate PRs for each?





Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2023-12-11 Thread via GitHub


nastra commented on issue #9086:
URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1849884128

   We should mostly be focusing on migrating Spark 3.5, which is already a big task.





Re: [PR] Core: Properly suppress historical snapshots when building TableMetadata with suppressHistoricalSnapshots() [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9234:
URL: https://github.com/apache/iceberg/pull/9234#discussion_r1422341041


##
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##
@@ -1384,7 +1386,9 @@ public Builder setPreviousFileLocation(String 
previousFileLocation) {
 private boolean hasChanges() {
   return changes.size() != startingChangeCount
   || (discardChanges && !changes.isEmpty())
-  || metadataLocation != null;
+  || metadataLocation != null
+  || suppressHistoricalSnapshots

Review Comment:
   this is for the server-side to indicate that there are changes






Re: [PR] Core: Properly suppress historical snapshots when building TableMetadata with suppressHistoricalSnapshots() [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9234:
URL: https://github.com/apache/iceberg/pull/9234#discussion_r1422342150


##
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##
@@ -1384,7 +1386,9 @@ public Builder setPreviousFileLocation(String 
previousFileLocation) {
 private boolean hasChanges() {
   return changes.size() != startingChangeCount
   || (discardChanges && !changes.isEmpty())
-  || metadataLocation != null;
+  || metadataLocation != null
+  || suppressHistoricalSnapshots
+  || null != snapshotsSupplier;

Review Comment:
   this is for the client-side so that it doesn't exit early in 
https://github.com/apache/iceberg/blob/f19643a93f5dac99bbdbc9881ef19c89d7bcd3eb/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L359-L368
   






Re: [PR] Core: Properly suppress historical snapshots when building TableMetadata with suppressHistoricalSnapshots() [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9234:
URL: https://github.com/apache/iceberg/pull/9234#discussion_r1422342150


##
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##
@@ -1384,7 +1386,9 @@ public Builder setPreviousFileLocation(String 
previousFileLocation) {
 private boolean hasChanges() {
   return changes.size() != startingChangeCount
   || (discardChanges && !changes.isEmpty())
-  || metadataLocation != null;
+  || metadataLocation != null
+  || suppressHistoricalSnapshots
+  || null != snapshotsSupplier;

Review Comment:
   this is for the client-side in 
https://github.com/apache/iceberg/blob/f19643a93f5dac99bbdbc9881ef19c89d7bcd3eb/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L359-L368
   so that it doesn't exit early in the `build()` method  
   






Re: [PR] Core: Properly suppress historical snapshots when building TableMetadata with suppressHistoricalSnapshots() [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9234:
URL: https://github.com/apache/iceberg/pull/9234#discussion_r1422341041


##
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##
@@ -1384,7 +1386,9 @@ public Builder setPreviousFileLocation(String 
previousFileLocation) {
 private boolean hasChanges() {
   return changes.size() != startingChangeCount
   || (discardChanges && !changes.isEmpty())
-  || metadataLocation != null;
+  || metadataLocation != null
+  || suppressHistoricalSnapshots

Review Comment:
   this is for the server side that calls `suppressHistoricalSnapshots()`, to indicate that there are changes and to not exit early in `build()`






Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#issuecomment-1849920644

   PR is ready @danielcweeks, @nastra, @Fokko   





[PR] Spark 3.5: Add Spark application id to summary of RewriteDataFilesSparkAction [iceberg]

2023-12-11 Thread via GitHub


manuzhang opened a new pull request, #9273:
URL: https://github.com/apache/iceberg/pull/9273

   (no comment)





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422497424


##
core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java:
##
@@ -400,8 +400,15 @@ public void 
replaceTableViaTransactionThatAlreadyExistsAsView() {
 .buildTable(viewIdentifier, SCHEMA)
 .replaceTransaction()
 .commitTransaction())
-.isInstanceOf(NoSuchTableException.class)
-.hasMessageStartingWith("Table does not exist: ns.view");
+.satisfiesAnyOf(

Review Comment:
   Since #9102 is merged and I rebased, this change no longer exists.



##
core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java:
##
@@ -465,8 +472,15 @@ public void replaceViewThatAlreadyExistsAsTable() {
 .withDefaultNamespace(tableIdentifier.namespace())
 .withQuery("spark", "select * from ns.tbl")
 .replace())
-.isInstanceOf(NoSuchViewException.class)
-.hasMessageStartingWith("View does not exist: ns.table");
+.satisfiesAnyOf(

Review Comment:
   done






Re: [I] iceberg-spark: Switch tests to JUnit5 + AssertJ-style assertions [iceberg]

2023-12-11 Thread via GitHub


chinmay-bhat commented on issue #9086:
URL: https://github.com/apache/iceberg/issues/9086#issuecomment-1850148369

   As it's a big task, I'll get started migrating tests in `iceberg-spark` v3.5 
that are not parameterized, and later open a new PR for the parameterized ones 
:)





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422508540


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
   protected Map properties() {
 return catalogOptions;
   }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+TableReference tr = parseTableReference(identifier);
+return new NessieViewOperations(
+ContentKey.of(
+
org.projectnessie.model.Namespace.of(identifier.namespace().levels()), 
tr.getName()),
+client.withReference(tr.getReference(), tr.getHash()),
+fileIO);
+  }
+
+  @Override
+  public List listViews(Namespace namespace) {
+return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropView(identifierWithoutTableReference(identifier, tableReference), 
false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+TableReference fromTableReference = parseTableReference(from);
+TableReference toTableReference = parseTableReference(to);
+
+validateReferenceForRename(fromTableReference, toTableReference, 
Content.Type.ICEBERG_VIEW);
+
+TableIdentifier fromIdentifier =
+NessieUtil.removeCatalogName(
+identifierWithoutTableReference(from, fromTableReference), name());
+TableIdentifier toIdentifier =
+NessieUtil.removeCatalogName(identifierWithoutTableReference(to, 
toTableReference), name());
+client
+.withReference(fromTableReference.getReference(), 
fromTableReference.getHash())
+.renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+  TableReference fromTableReference, TableReference toTableReference, 
Content.Type type) {
+String fromReference =
+fromTableReference.hasReference()
+? fromTableReference.getReference()
+: client.getRef().getName();
+String toReference =
+toTableReference.hasReference()
+? toTableReference.getReference()
+: client.getRef().getName();
+Preconditions.checkArgument(
+fromReference.equalsIgnoreCase(toReference),
+"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
++ " source and target references must be the same.",
+NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment:
   Since the expected text is in English, we should use `.toLowerCase(Locale.ENGLISH)`. Using the default locale can have surprising results, maybe not in this particular case, but we ought to ensure correctness in general, I think.
   
   Try this: `assertThat("VIEW".toLowerCase(new Locale("TR"))).isEqualTo("view");`






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422521681


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
   protected Map properties() {
 return catalogOptions;
   }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+TableReference tr = parseTableReference(identifier);
+return new NessieViewOperations(
+ContentKey.of(
+
org.projectnessie.model.Namespace.of(identifier.namespace().levels()), 
tr.getName()),
+client.withReference(tr.getReference(), tr.getHash()),
+fileIO);
+  }
+
+  @Override
+  public List listViews(Namespace namespace) {
+return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropView(identifierWithoutTableReference(identifier, tableReference), 
false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+TableReference fromTableReference = parseTableReference(from);
+TableReference toTableReference = parseTableReference(to);
+
+validateReferenceForRename(fromTableReference, toTableReference, 
Content.Type.ICEBERG_VIEW);
+
+TableIdentifier fromIdentifier =
+NessieUtil.removeCatalogName(
+identifierWithoutTableReference(from, fromTableReference), name());
+TableIdentifier toIdentifier =
+NessieUtil.removeCatalogName(identifierWithoutTableReference(to, 
toTableReference), name());
+client
+.withReference(fromTableReference.getReference(), 
fromTableReference.getHash())
+.renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+  TableReference fromTableReference, TableReference toTableReference, 
Content.Type type) {
+String fromReference =
+fromTableReference.hasReference()
+? fromTableReference.getReference()
+: client.getRef().getName();
+String toReference =
+toTableReference.hasReference()
+? toTableReference.getReference()
+: client.getRef().getName();
+Preconditions.checkArgument(
+fromReference.equalsIgnoreCase(toReference),
+"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
++ " source and target references must be the same.",
+NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment:
   While adding it, I did check the Iceberg code; `.toLowerCase()` is used in other places as well, so I followed that.






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422535596


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
   protected Map properties() {
 return catalogOptions;
   }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+TableReference tr = parseTableReference(identifier);
+return new NessieViewOperations(
+ContentKey.of(
+
org.projectnessie.model.Namespace.of(identifier.namespace().levels()), 
tr.getName()),
+client.withReference(tr.getReference(), tr.getHash()),
+fileIO);
+  }
+
+  @Override
+  public List listViews(Namespace namespace) {
+return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropView(identifierWithoutTableReference(identifier, tableReference), 
false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+TableReference fromTableReference = parseTableReference(from);
+TableReference toTableReference = parseTableReference(to);
+
+validateReferenceForRename(fromTableReference, toTableReference, 
Content.Type.ICEBERG_VIEW);
+
+TableIdentifier fromIdentifier =
+NessieUtil.removeCatalogName(
+identifierWithoutTableReference(from, fromTableReference), name());
+TableIdentifier toIdentifier =
+NessieUtil.removeCatalogName(identifierWithoutTableReference(to, 
toTableReference), name());
+client
+.withReference(fromTableReference.getReference(), 
fromTableReference.getHash())
+.renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+  TableReference fromTableReference, TableReference toTableReference, 
Content.Type type) {
+String fromReference =
+fromTableReference.hasReference()
+? fromTableReference.getReference()
+: client.getRef().getName();
+String toReference =
+toTableReference.hasReference()
+? toTableReference.getReference()
+: client.getRef().getName();
+Preconditions.checkArgument(
+fromReference.equalsIgnoreCase(toReference),
+"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
++ " source and target references must be the same.",
+NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment:
   Fixed for the new code added by this PR.
   
   We can open a GH issue to track fixing the existing code.






Re: [PR] Api: Track partition statistics via TableMetadata [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8502:
URL: https://github.com/apache/iceberg/pull/8502#discussion_r1422541957


##
core/src/main/java/org/apache/iceberg/ReachableFileUtil.java:
##
@@ -137,7 +137,9 @@ public static List manifestListLocations(Table 
table, Set snapshot
*
* @param table table for which statistics files needs to be listed
* @return the location of statistics files
+   * @deprecated use the {@code allStatisticsFilesLocations(table)} instead.
*/
+  @Deprecated

Review Comment:
   removed the changes in this file as the newly added APIs were not called from anywhere (they are needed for the Spark actions).
   
   I will raise a follow-up PR for the Spark actions to consider the partition stats files in the remove-orphan-files and expire-snapshots actions.






Re: [PR] Api: Track partition statistics via TableMetadata [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8502:
URL: https://github.com/apache/iceberg/pull/8502#discussion_r1422543077


##
.palantir/revapi.yml:
##
@@ -873,6 +873,10 @@ acceptedBreaks:
   new: "method void org.apache.iceberg.encryption.Ciphers::()"
   justification: "Static utility class - should not have public 
constructor"
   "1.4.0":
+org.apache.iceberg:iceberg-api:
+- code: "java.method.addedToInterface"

Review Comment:
   added it as a `default` method
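   For context, a minimal sketch (with placeholder names) of the pattern referred to here: declaring the newly added interface method with a `default` body means existing implementations of the interface keep compiling and running unchanged:

   ```java
   import java.util.Collections;
   import java.util.List;

   // Placeholder interface and method names, shown only to illustrate the pattern.
   interface PartitionStatsAware {
     // Existing implementers are not forced to override a default method, so adding it
     // is not a source- or binary-breaking change for them.
     default List<String> partitionStatisticsFiles() {
       return Collections.emptyList();
     }
   }
   ```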






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422564943


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 boolean newTable = base == null;
 String newMetadataLocation = writeNewMetadataIfRequired(newTable, 
metadata);
 
-String refName = client.refName();
 boolean failure = false;
 try {
   String contentId = table == null ? null : table.getId();
   client.commitTable(base, metadata, newMetadataLocation, contentId, key);
-} catch (NessieConflictException ex) {
-  failure = true;
-  if (ex instanceof NessieReferenceConflictException) {
-// Throws a specialized exception, if possible
-maybeThrowSpecializedException((NessieReferenceConflictException) ex);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;
   }
-  throw new CommitFailedException(
-  ex,
-  "Cannot commit: Reference hash is out of date. "
-  + "Update the reference '%s' and try again",
-  refName);
-} catch (HttpClientException ex) {
-  // Intentionally catch all nessie-client-exceptions here and not just 
the "timeout" variant
-  // to catch all kinds of network errors (e.g. connection reset). Network 
code implementation
-  // details and all kinds of network devices can induce unexpected 
behavior. So better be
-  // safe than sorry.
-  throw new CommitStateUnknownException(ex);
-} catch (NessieNotFoundException ex) {
+  NessieUtil.handleExceptionsForCommits(ex, client.refName(), 
Content.Type.ICEBERG_TABLE)
+  .ifPresent(
+  exception -> {
+throw exception;

Review Comment:
   If I follow the logic correctly, `handleExceptionsForCommits` will always 
return something for the exception types caught in this case, so `ifPresent()` 
is really confusing. Could the code be refactored to avoid unnecessary 
conditional execution?
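   One possible shape of the refactor being asked about, assuming the helper could return the mapped exception directly instead of an `Optional` (the `NessieUtil.toCommitException` name is hypothetical; the rest mirrors the quoted `doCommit` above):

   ```java
   // Hypothetical sketch: the helper returns the exception to throw, so the caller
   // does not need an ifPresent(...) callback for a case that always produces a value.
   try {
     client.commitTable(base, metadata, newMetadataLocation, contentId, key);
   } catch (NessieConflictException | NessieNotFoundException ex) {
     failure = true;
     throw NessieUtil.toCommitException(ex, client.refName(), Content.Type.ICEBERG_TABLE);
   } catch (HttpClientException ex) {
     // Network-level errors leave the commit outcome unknown, so the metadata file is kept.
     throw new CommitStateUnknownException(ex);
   }
   ```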



##
nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java:
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewMetadataParser;
+import org.projectnessie.client.http.HttpClientException;
+import org.projectnessie.error.NessieBadRequestException;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieContentNotFoundException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Content;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.Reference;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NessieViewOperations extends BaseViewOperations {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(NessieViewOperations.class);
+
+  private final NessieIcebergClient client;
+  private final ContentKey key;
+  private final FileIO fileIO;
+  private IcebergView icebergView;
+
+  NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO 
fileIO) {
+this.key = key;
+this.client = client;
+this.fileIO = fileIO;
+  }
+
+  @Override
+  public void doRefresh() {
+try {
+  client.refresh();
+} catch (NessieNotFoundException e) {
+  throw new RuntimeException(
+  String.format(
+  "Failed to refresh as ref '%s' is no longer valid.", 
client.getRef().getName()),
+  e);
+}
+String metadataLocation = null;
+Reference reference = client.getRef().getReference();
+try {
+  Content content = 
client.getApi().getContent().key(key).reference(reference).get().get(key);
+  LOG.

[PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]

2023-12-11 Thread via GitHub


manuzhang opened a new pull request, #9274:
URL: https://github.com/apache/iceberg/pull/9274

   (no comment)





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422622180


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java:
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewMetadataParser;
+import org.projectnessie.client.http.HttpClientException;
+import org.projectnessie.error.NessieBadRequestException;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieContentNotFoundException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Content;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.Reference;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NessieViewOperations extends BaseViewOperations {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(NessieViewOperations.class);
+
+  private final NessieIcebergClient client;
+  private final ContentKey key;
+  private final FileIO fileIO;
+  private IcebergView icebergView;
+
+  NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO 
fileIO) {
+this.key = key;
+this.client = client;
+this.fileIO = fileIO;
+  }
+
+  @Override
+  public void doRefresh() {
+try {
+  client.refresh();
+} catch (NessieNotFoundException e) {
+  throw new RuntimeException(
+  String.format(
+  "Failed to refresh as ref '%s' is no longer valid.", 
client.getRef().getName()),
+  e);
+}
+String metadataLocation = null;
+Reference reference = client.getRef().getReference();
+try {
+  Content content = 
client.getApi().getContent().key(key).reference(reference).get().get(key);
+  LOG.debug("Content '{}' at '{}': {}", key, reference, content);
+  if (content == null) {
+if (currentMetadataLocation() != null) {
+  throw new NoSuchViewException("View does not exist: %s in %s", key, 
reference);
+}
+  } else {
+this.icebergView =
+content
+.unwrap(IcebergView.class)
+.orElseThrow(() -> new NessieContentNotFoundException(key, 
reference.getName()));
+metadataLocation = icebergView.getMetadataLocation();
+  }
+} catch (NessieNotFoundException ex) {
+  if (currentMetadataLocation() != null) {
+throw new NoSuchViewException("View does not exist: %s in %s", key, 
reference);
+  }
+}
+refreshFromMetadataLocation(
+metadataLocation,
+null,
+2,
+location ->
+NessieUtil.loadViewMetadata(
+ViewMetadataParser.read(io().newInputFile(location)), 
location, reference));
+  }
+
+  @Override
+  public void doCommit(ViewMetadata base, ViewMetadata metadata) {
+String newMetadataLocation = writeNewMetadataIfRequired(metadata);
+
+boolean failure = false;
+try {
+  String contentId = icebergView == null ? null : icebergView.getId();
+  client.commitView(base, metadata, newMetadataLocation, contentId, key);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;
+  }
+  NessieUtil.handleExceptionsForCommits(ex, client.refName(), 
Content.Type.ICEBERG_VIEW)
+  .ifPresent(
+  exception -> {
+throw exception;
+  });
+} catch (NessieBadRequestException ex) {
+  failure = true;
+  throw NessieUtil.handleBadRequestForCommit(client, key, 
Content.Type.ICEBERG_VIEW).orElse(ex);
+} finally {
+  if (failure) {
+io().deleteFile(newMetadataLocation);

Review Comment:
   I tried before. 
   But one works on `client.commitView` and one is `client.commitTable`. So, it 
canno

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422626565


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java:
##
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Path;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.CatalogProperties;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.view.BaseView;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.apache.iceberg.view.ViewMetadata;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.junit.jupiter.api.extension.RegisterExtension;
+import org.junit.jupiter.api.io.TempDir;
+import org.projectnessie.client.api.NessieApiV1;
+import org.projectnessie.client.ext.NessieApiVersion;
+import org.projectnessie.client.ext.NessieApiVersions;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.jaxrs.ext.NessieJaxRsExtension;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.Reference;
+import org.projectnessie.model.Tag;
+import org.projectnessie.versioned.storage.common.persist.Persist;
+import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory;
+import org.projectnessie.versioned.storage.testextension.NessieBackend;
+import org.projectnessie.versioned.storage.testextension.NessiePersist;
+import org.projectnessie.versioned.storage.testextension.PersistExtension;
+
+@ExtendWith(PersistExtension.class)
+@NessieBackend(InmemoryBackendTestFactory.class)
+@NessieApiVersions // test all versions
+public class TestNessieViewCatalog extends ViewCatalogTests {
+
+  @NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() 
-> persist);
+
+  @TempDir private Path temp;
+
+  private NessieCatalog catalog;
+  private NessieApiV1 api;
+  private NessieApiVersion apiVersion;
+  private Configuration hadoopConfig;
+  private String initialHashOfDefaultBranch;
+  private String uri;
+
+  @BeforeEach
+  public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI 
nessieUri)
+  throws NessieNotFoundException {
+api = clientFactory.make();
+apiVersion = clientFactory.apiVersion();
+initialHashOfDefaultBranch = api.getDefaultBranch().getHash();
+uri = nessieUri.toASCIIString();
+hadoopConfig = new Configuration();
+catalog = initNessieCatalog("main");
+  }
+
+  @AfterEach
+  public void afterEach() throws IOException {
+resetData();
+try {
+  if (catalog != null) {
+catalog.close();
+  }
+  api.close();
+} finally {
+  catalog = null;
+  api = null;
+  hadoopConfig = null;
+}
+  }
+
+  private void resetData() throws NessieConflictException, 
NessieNotFoundException {
+Branch defaultBranch = api.getDefaultBranch();
+for (Reference r : api.getAllReferences().get().getReferences()) {

Review Comment:
   I don't see the advantage of adding at Nessie side. But it can be added as 
Util for the test classes. But maybe as a follow up PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422634486


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 boolean newTable = base == null;
 String newMetadataLocation = writeNewMetadataIfRequired(newTable, 
metadata);
 
-String refName = client.refName();
 boolean failure = false;
 try {
   String contentId = table == null ? null : table.getId();
   client.commitTable(base, metadata, newMetadataLocation, contentId, key);
-} catch (NessieConflictException ex) {
-  failure = true;
-  if (ex instanceof NessieReferenceConflictException) {
-// Throws a specialized exception, if possible
-maybeThrowSpecializedException((NessieReferenceConflictException) ex);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;
   }
-  throw new CommitFailedException(
-  ex,
-  "Cannot commit: Reference hash is out of date. "
-  + "Update the reference '%s' and try again",
-  refName);
-} catch (HttpClientException ex) {
-  // Intentionally catch all nessie-client-exceptions here and not just 
the "timeout" variant
-  // to catch all kinds of network errors (e.g. connection reset). Network 
code implementation
-  // details and all kinds of network devices can induce unexpected 
behavior. So better be
-  // safe than sorry.
-  throw new CommitStateUnknownException(ex);
-} catch (NessieNotFoundException ex) {
+  NessieUtil.handleExceptionsForCommits(ex, client.refName(), 
Content.Type.ICEBERG_TABLE)
+  .ifPresent(
+  exception -> {
+throw exception;

Review Comment:
   +1 on improving exception handling in the Nessie code, as I've raised this before; it's difficult to read and understand when certain things are thrown.
   However, I would suggest doing this as an immediate follow-up after this PR is merged, as otherwise it makes it more difficult to review the changes being introduced for Views.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422637569


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
   protected Map properties() {
 return catalogOptions;
   }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+TableReference tr = parseTableReference(identifier);
+return new NessieViewOperations(
+ContentKey.of(
+
org.projectnessie.model.Namespace.of(identifier.namespace().levels()), 
tr.getName()),
+client.withReference(tr.getReference(), tr.getHash()),
+fileIO);
+  }
+
+  @Override
+  public List listViews(Namespace namespace) {
+return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropView(identifierWithoutTableReference(identifier, tableReference), 
false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+TableReference fromTableReference = parseTableReference(from);
+TableReference toTableReference = parseTableReference(to);
+
+validateReferenceForRename(fromTableReference, toTableReference, 
Content.Type.ICEBERG_VIEW);
+
+TableIdentifier fromIdentifier =
+NessieUtil.removeCatalogName(
+identifierWithoutTableReference(from, fromTableReference), name());
+TableIdentifier toIdentifier =
+NessieUtil.removeCatalogName(identifierWithoutTableReference(to, 
toTableReference), name());
+client
+.withReference(fromTableReference.getReference(), 
fromTableReference.getHash())
+.renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+  TableReference fromTableReference, TableReference toTableReference, 
Content.Type type) {
+String fromReference =
+fromTableReference.hasReference()
+? fromTableReference.getReference()
+: client.getRef().getName();
+String toReference =
+toTableReference.hasReference()
+? toTableReference.getReference()
+: client.getRef().getName();
+Preconditions.checkArgument(
+fromReference.equalsIgnoreCase(toReference),
+"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
++ " source and target references must be the same.",
+NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment:
   I did a quick scan of the Iceberg codebase for this, and I see a few places where `.toLowerCase()` may be a concern:
   
   1) `TableIdentifier.toLowerCase()` is used only in `CachingCatalog` internally, which is probably ok, since the lowercase strings are not exposed to the outside.
   
   2) 
[VectorizedSupport](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/mr/src/main/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedSupport.java#L33)
 seems like it may be a problem, but I do not really know whether the lower 
case data is exposed and how.
   
   3) 
[IcebergRecordObjectInspector](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java#L55)
 converts field names to lower case, and that seems to be affected by the 
locale problem as various case names are accepted as input parameters to its 
methods.
   
   4) `jQuery` code uses a lot of `toLowerCase()`, but I do not really know how 
it is supposed to be used.
   
   @nastra : Do you think this is worth opening an issue? To the best of my 
knowledge this kind of case conversion can be problematic only in German and 
Turkish locales. The German locale affects only proper German language words 
(so it is less of a problem), but the Turkish locale can cause English words to 
be converted in unexpected ways. Does Iceberg support using its libraries in 
user-defined locales?
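
   To make the locale concern concrete, here is a small standalone sketch of the Turkish-locale pitfall (this is illustration only, not code from the PR):

   import java.util.Locale;

   public class LocaleLowerCaseDemo {
     public static void main(String[] args) {
       String s = "ICEBERG_VIEW";
       // Under the Turkish locale, 'I' lowercases to dotless 'ı', which breaks
       // comparisons against plain ASCII-lowercase strings.
       System.out.println(s.toLowerCase(new Locale("tr", "TR"))); // ıceberg_vıew
       // Locale.ROOT (or Locale.ENGLISH) keeps the expected ASCII result.
       System.out.println(s.toLowerCase(Locale.ROOT)); // iceberg_view
     }
   }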



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422655391


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java:
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewMetadataParser;
+import org.projectnessie.client.http.HttpClientException;
+import org.projectnessie.error.NessieBadRequestException;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieContentNotFoundException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Content;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.Reference;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NessieViewOperations extends BaseViewOperations {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(NessieViewOperations.class);
+
+  private final NessieIcebergClient client;
+  private final ContentKey key;
+  private final FileIO fileIO;
+  private IcebergView icebergView;
+
+  NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO 
fileIO) {
+this.key = key;
+this.client = client;
+this.fileIO = fileIO;
+  }
+
+  @Override
+  public void doRefresh() {
+try {
+  client.refresh();
+} catch (NessieNotFoundException e) {
+  throw new RuntimeException(
+  String.format(
+  "Failed to refresh as ref '%s' is no longer valid.", 
client.getRef().getName()),
+  e);
+}
+String metadataLocation = null;
+Reference reference = client.getRef().getReference();
+try {
+  Content content = 
client.getApi().getContent().key(key).reference(reference).get().get(key);
+  LOG.debug("Content '{}' at '{}': {}", key, reference, content);
+  if (content == null) {
+if (currentMetadataLocation() != null) {
+  throw new NoSuchViewException("View does not exist: %s in %s", key, 
reference);
+}
+  } else {
+this.icebergView =
+content
+.unwrap(IcebergView.class)
+.orElseThrow(() -> new NessieContentNotFoundException(key, 
reference.getName()));
+metadataLocation = icebergView.getMetadataLocation();
+  }
+} catch (NessieNotFoundException ex) {
+  if (currentMetadataLocation() != null) {
+throw new NoSuchViewException("View does not exist: %s in %s", key, 
reference);
+  }
+}
+refreshFromMetadataLocation(
+metadataLocation,
+null,
+2,
+location ->
+NessieUtil.loadViewMetadata(
+ViewMetadataParser.read(io().newInputFile(location)), 
location, reference));
+  }
+
+  @Override
+  public void doCommit(ViewMetadata base, ViewMetadata metadata) {
+String newMetadataLocation = writeNewMetadataIfRequired(metadata);
+
+boolean failure = false;
+try {
+  String contentId = icebergView == null ? null : icebergView.getId();
+  client.commitView(base, metadata, newMetadataLocation, contentId, key);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;
+  }
+  NessieUtil.handleExceptionsForCommits(ex, client.refName(), 
Content.Type.ICEBERG_VIEW)
+  .ifPresent(
+  exception -> {
+throw exception;
+  });
+} catch (NessieBadRequestException ex) {
+  failure = true;
+  throw NessieUtil.handleBadRequestForCommit(client, key, 
Content.Type.ICEBERG_VIEW).orElse(ex);
+} finally {
+  if (failure) {
+io().deleteFile(newMetadataLocation);

Review Comment:
   We could define our own functional `interface` with declared exceptions.
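
   For illustration, a minimal sketch of what such an interface could look like (the interface name is hypothetical and the exact exception list is an assumption, not code from this PR):

   import org.projectnessie.error.NessieConflictException;
   import org.projectnessie.error.NessieNotFoundException;

   @FunctionalInterface
   interface CommitAction {
     // Declaring the checked Nessie exceptions lets one shared handler wrap either
     // client.commitTable(...) or client.commitView(...) in a single try/catch.
     void commit() throws NessieConflictException, NessieNotFoundException;
   }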



-- 
This is an automated message

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422656154


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java:
##
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Path;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.CatalogProperties;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.view.BaseView;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.apache.iceberg.view.ViewMetadata;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.junit.jupiter.api.extension.RegisterExtension;
+import org.junit.jupiter.api.io.TempDir;
+import org.projectnessie.client.api.NessieApiV1;
+import org.projectnessie.client.ext.NessieApiVersion;
+import org.projectnessie.client.ext.NessieApiVersions;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.jaxrs.ext.NessieJaxRsExtension;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.Reference;
+import org.projectnessie.model.Tag;
+import org.projectnessie.versioned.storage.common.persist.Persist;
+import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory;
+import org.projectnessie.versioned.storage.testextension.NessieBackend;
+import org.projectnessie.versioned.storage.testextension.NessiePersist;
+import org.projectnessie.versioned.storage.testextension.PersistExtension;
+
+@ExtendWith(PersistExtension.class)
+@NessieBackend(InmemoryBackendTestFactory.class)
+@NessieApiVersions // test all versions
+public class TestNessieViewCatalog extends ViewCatalogTests {
+
+  @NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() 
-> persist);
+
+  @TempDir private Path temp;
+
+  private NessieCatalog catalog;
+  private NessieApiV1 api;
+  private NessieApiVersion apiVersion;
+  private Configuration hadoopConfig;
+  private String initialHashOfDefaultBranch;
+  private String uri;
+
+  @BeforeEach
+  public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI 
nessieUri)
+  throws NessieNotFoundException {
+api = clientFactory.make();
+apiVersion = clientFactory.apiVersion();
+initialHashOfDefaultBranch = api.getDefaultBranch().getHash();
+uri = nessieUri.toASCIIString();
+hadoopConfig = new Configuration();
+catalog = initNessieCatalog("main");
+  }
+
+  @AfterEach
+  public void afterEach() throws IOException {
+resetData();
+try {
+  if (catalog != null) {
+catalog.close();
+  }
+  api.close();
+} finally {
+  catalog = null;
+  api = null;
+  hadoopConfig = null;
+}
+  }
+
+  private void resetData() throws NessieConflictException, 
NessieNotFoundException {
+Branch defaultBranch = api.getDefaultBranch();
+for (Reference r : api.getAllReferences().get().getReferences()) {

Review Comment:
   fair enough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422658039


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 boolean newTable = base == null;
 String newMetadataLocation = writeNewMetadataIfRequired(newTable, 
metadata);
 
-String refName = client.refName();
 boolean failure = false;
 try {
   String contentId = table == null ? null : table.getId();
   client.commitTable(base, metadata, newMetadataLocation, contentId, key);
-} catch (NessieConflictException ex) {
-  failure = true;
-  if (ex instanceof NessieReferenceConflictException) {
-// Throws a specialized exception, if possible
-maybeThrowSpecializedException((NessieReferenceConflictException) ex);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;
   }
-  throw new CommitFailedException(
-  ex,
-  "Cannot commit: Reference hash is out of date. "
-  + "Update the reference '%s' and try again",
-  refName);
-} catch (HttpClientException ex) {
-  // Intentionally catch all nessie-client-exceptions here and not just 
the "timeout" variant
-  // to catch all kinds of network errors (e.g. connection reset). Network 
code implementation
-  // details and all kinds of network devices can induce unexpected 
behavior. So better be
-  // safe than sorry.
-  throw new CommitStateUnknownException(ex);
-} catch (NessieNotFoundException ex) {
+  NessieUtil.handleExceptionsForCommits(ex, client.refName(), 
Content.Type.ICEBERG_TABLE)
+  .ifPresent(
+  exception -> {
+throw exception;

Review Comment:
   I'm fine with a follow-up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] java.lang.ClassNotFoundException: Failed to find data source: iceberg. Issue when we are using Java Custom Catalog [iceberg]

2023-12-11 Thread via GitHub


mahendrachandrasekhar opened a new issue, #9275:
URL: https://github.com/apache/iceberg/issues/9275

   We are using a Java custom catalog with Iceberg. The table is created properly; however, we get an error when we insert the data.
   
   public String createCustomTable(String tableName) {
       try {
           TableIdentifier tableIdentifier = TableIdentifier.of(name(), tableName);
           Schema schema = readSchema(tableIdentifier);
           Map<String, String> properties = ImmutableMap.of(
               TableProperties.DEFAULT_FILE_FORMAT, FileFormat.PARQUET.name()
           );
           PartitionSpec partitionSpec = PartitionSpec.builderFor(schema)
               .identity(getPartitionKeyfromSchema(tableIdentifier.name()))
               .build();
           String tableLocation = defaultLocation + tableIdentifier.namespace().toString()
               + "/" + tableIdentifier.name();
           catalog.createTable(tableIdentifier, schema, partitionSpec, tableLocation, properties);
           catalog.loadTable(TableIdentifier.of(name(), tableName));
           return "Table created";
       } catch (Exception e) {
           return e.getMessage();
       }
   }

   public String insertData(String tableName, String csvPath) throws IOException {
       Table icebergTable = catalog.loadTable(TableIdentifier.of(name(), tableName));
       SparkSession spark = SparkSession.builder()
           .config("spark.master", "local")
           .getOrCreate();
       String headerJson = readHeaderJson(tableName);
       LOGGER.info("Header JSON for {}: {}", tableName, headerJson);

       String[] columns = headerJson.split(",");
       Dataset<Row> df = spark.read()
           .option("header", "false")
           .option("inferSchema", "false")
           .option("comment", "#")
           .option("sep", "|")
           .csv(csvPath)
           .toDF(columns);
       LOGGER.info("Actual columns: {}", Arrays.toString(df.columns()));

       for (String col : df.columns()) {
           df = df.withColumn(col, df.col(col).cast("string"));
       }

       df.write().format("iceberg").mode(SaveMode.Append).save(icebergTable.location());
       LOGGER.info("Data inserted successfully into table: {}", tableName);
   }
   
   When we execute this via a main program in Java, it works perfectly.

   However, when we create a jar out of this and call it from Spark, it gives us this error:
   ERROR:root:Error: An error occurred while calling o0.insertData.
   : java.lang.ClassNotFoundException:
   Failed to find data source: iceberg. Please find packages at
   http://spark.apache.org/third-party-projects.html
   
   at 
org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:443)
at 
org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670)
   at 
org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
   at 
org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
   at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
   at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
   at com.xyz.catalog.CustomCatalog.insertData(CustomCatalog.java:178)
   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   at py4j.Gateway.invoke(Gateway.java:282)
   at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   at py4j.commands.CallCommand.execute(CallCommand.java:79)
   at py4j.GatewayConnection.run(GatewayConnection.java:238)
   at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: java.lang.ClassNotFoundException: iceberg.DefaultSource
   at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
   at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
   at 
org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:656)

Re: [I] java.lang.ClassNotFoundException: Failed to find data source: iceberg. Issue when we are using Java Custom Catalog [iceberg]

2023-12-11 Thread via GitHub


mahendrachandrasekhar commented on issue #9275:
URL: https://github.com/apache/iceberg/issues/9275#issuecomment-1850354505

   We have this included in our Docker image (which is based on bitnami/spark:3.2.4):
   
   RUN curl -L -o /home/airflow/spark/jars/hadoop-aws.jar 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-aws/2.10.2/hadoop-aws-2.10.2.jar
 && \
   curl -L -o /home/airflow/spark/jars/caffeine.jar 
https://repo.maven.apache.org/maven2/com/github/ben-manes/caffeine/caffeine/3.1.3/caffeine-3.1.3.jar
 && \
   curl -L -o /home/airflow/spark/jars/iceberg-core.jar 
https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-core/1.4.2/iceberg-core-1.4.2.jar
 && \
   curl -L -o /home/airflow/spark/jars/iceberg-spark.jar 
https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-spark-3.5_2.13/1.4.2/iceberg-spark-3.5_2.13-1.4.2.jar
 && \
   curl -L -o /home/airflow/spark/jars/iceberg-spark-runtime.jar 
https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.13/1.4.2/iceberg-spark-runtime-3.5_2.13-1.4.2.jar
 && \
   curl -L -o /home/airflow/spark/jars/iceberg-parquet.jar 
https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-parquet/1.4.2/iceberg-parquet-1.4.2.jar
   
   ENV SPARK_DIST_CLASSPATH 
$SPARK_DIST_CLASSPATH:/home/airflow/spark/jars/hadoop-aws.jar:/home/airflow/spark/jars/iceberg-core.jar:/home/airflow/spark/jars/caffeine.jar:/home/airflow/spark/jars/iceberg-spark-runtime.jar:/home/airflow/spark/jars/iceberg-parquet.jar:/home/airflow/spark/jars/iceberg-spark.jar
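
   For context, a minimal sketch of how the Iceberg data source is typically made visible to a Java SparkSession via resolved packages instead of hand-copied jars. The artifact coordinates below are an assumption, not a confirmed fix for this issue, and the runtime artifact must match the image's Spark and Scala versions (mixing, for example, a 3.5_2.13 runtime with a Spark 3.2 / Scala 2.12 image will not load):

   import org.apache.spark.sql.SparkSession;

   // Sketch only; versions shown are assumptions.
   SparkSession spark = SparkSession.builder()
       .config("spark.master", "local")
       .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.2")
       .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .getOrCreate();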
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422706464


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
   protected Map properties() {
 return catalogOptions;
   }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+TableReference tr = parseTableReference(identifier);
+return new NessieViewOperations(
+ContentKey.of(
+
org.projectnessie.model.Namespace.of(identifier.namespace().levels()), 
tr.getName()),
+client.withReference(tr.getReference(), tr.getHash()),
+fileIO);
+  }
+
+  @Override
+  public List listViews(Namespace namespace) {
+return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropView(identifierWithoutTableReference(identifier, tableReference), 
false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+TableReference fromTableReference = parseTableReference(from);
+TableReference toTableReference = parseTableReference(to);
+
+validateReferenceForRename(fromTableReference, toTableReference, 
Content.Type.ICEBERG_VIEW);
+
+TableIdentifier fromIdentifier =
+NessieUtil.removeCatalogName(
+identifierWithoutTableReference(from, fromTableReference), name());
+TableIdentifier toIdentifier =
+NessieUtil.removeCatalogName(identifierWithoutTableReference(to, 
toTableReference), name());
+client
+.withReference(fromTableReference.getReference(), 
fromTableReference.getHash())
+.renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+  TableReference fromTableReference, TableReference toTableReference, 
Content.Type type) {
+String fromReference =
+fromTableReference.hasReference()
+? fromTableReference.getReference()
+: client.getRef().getName();
+String toReference =
+toTableReference.hasReference()
+? toTableReference.getReference()
+: client.getRef().getName();
+Preconditions.checkArgument(
+fromReference.equalsIgnoreCase(toReference),
+"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
++ " source and target references must be the same.",
+NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment:
   it's probably a good idea to open an issue and raise awareness about this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422710861


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 boolean newTable = base == null;
 String newMetadataLocation = writeNewMetadataIfRequired(newTable, 
metadata);
 
-String refName = client.refName();
 boolean failure = false;
 try {
   String contentId = table == null ? null : table.getId();
   client.commitTable(base, metadata, newMetadataLocation, contentId, key);
-} catch (NessieConflictException ex) {
-  failure = true;
-  if (ex instanceof NessieReferenceConflictException) {
-// Throws a specialized exception, if possible
-maybeThrowSpecializedException((NessieReferenceConflictException) ex);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;
   }
-  throw new CommitFailedException(
-  ex,
-  "Cannot commit: Reference hash is out of date. "
-  + "Update the reference '%s' and try again",
-  refName);
-} catch (HttpClientException ex) {
-  // Intentionally catch all nessie-client-exceptions here and not just 
the "timeout" variant
-  // to catch all kinds of network errors (e.g. connection reset). Network 
code implementation
-  // details and all kinds of network devices can induce unexpected 
behavior. So better be
-  // safe than sorry.
-  throw new CommitStateUnknownException(ex);
-} catch (NessieNotFoundException ex) {
+  NessieUtil.handleExceptionsForCommits(ex, client.refName(), 
Content.Type.ICEBERG_TABLE)
+  .ifPresent(
+  exception -> {
+throw exception;

Review Comment:
   Slightly improved now ;) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422711613


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -133,69 +114,30 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 boolean newTable = base == null;
 String newMetadataLocation = writeNewMetadataIfRequired(newTable, 
metadata);
 
-String refName = client.refName();
 boolean failure = false;
 try {
   String contentId = table == null ? null : table.getId();
   client.commitTable(base, metadata, newMetadataLocation, contentId, key);
-} catch (NessieConflictException ex) {
-  failure = true;
-  if (ex instanceof NessieReferenceConflictException) {
-// Throws a specialized exception, if possible
-maybeThrowSpecializedException((NessieReferenceConflictException) ex);
+} catch (NessieConflictException | NessieNotFoundException | 
HttpClientException ex) {
+  if (ex instanceof NessieConflictException || ex instanceof 
NessieNotFoundException) {
+failure = true;

Review Comment:
   That will be more complicated, I guess.
   
   I have simplified it a bit now. Please take a look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422711943


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -448,33 +480,85 @@ public void renameTable(TableIdentifier from, 
TableIdentifier to) {
 // behavior. So better be safe than sorry.
   }
 
+  private static void validateToContentForRename(
+  TableIdentifier from, TableIdentifier to, IcebergContent 
existingToContent) {
+if (existingToContent != null) {
+  if (existingToContent.getType() == Content.Type.ICEBERG_VIEW) {
+throw new AlreadyExistsException("Cannot rename %s to %s. View already 
exists", from, to);
+  } else if (existingToContent.getType() == Content.Type.ICEBERG_TABLE) {
+throw new AlreadyExistsException("Cannot rename %s to %s. Table 
already exists", from, to);
+  } else {
+throw new AlreadyExistsException(
+"Cannot rename %s to %s. Another content of type %s with same name 
already exists",
+from, to, existingToContent.getType());
+  }
+}
+  }
+
+  private static void validateFromContentForRename(
+  TableIdentifier from, Content.Type type, IcebergContent 
existingFromContent) {
+if (existingFromContent == null) {
+  if (type == Content.Type.ICEBERG_VIEW) {
+throw new NoSuchViewException("View does not exist: %s", from);
+  } else if (type == Content.Type.ICEBERG_TABLE) {
+throw new NoSuchTableException("Table does not exist: %s", from);
+  } else {
+throw new RuntimeException("Cannot perform rename for content type: " 
+ type);
+  }
+} else if (existingFromContent.getType() != type) {
+  throw new RuntimeException(
+  String.format("content type of from identifier %s should be of %s", 
from, type));
+}
+  }
+
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
+return dropContent(identifier, purge, Content.Type.ICEBERG_TABLE);
+  }
+
+  public boolean dropView(TableIdentifier identifier, boolean purge) {
+return dropContent(identifier, purge, Content.Type.ICEBERG_VIEW);
+  }
+
+  private boolean dropContent(TableIdentifier identifier, boolean purge, 
Content.Type type) {
 getRef().checkMutable();
 
-IcebergTable existingTable = table(identifier);
-if (existingTable == null) {
+IcebergContent existingContent = fetchContent(identifier);
+if (existingContent == null) {
   return false;
 }
 
+if (existingContent.getType() != type) {
+  throw new RuntimeException(
+  String.format(
+  "Cannot drop %s: not matching with the type `%s`",
+  identifier, NessieUtil.contentTypeString(type)));
+}
+
+String contentType = 
NessieUtil.contentTypeString(type).toLowerCase(Locale.ENGLISH);
+
 if (purge) {
-  LOG.info("Purging data for table {} was set to true but is ignored", 
identifier.toString());
+  LOG.info(
+  "Purging data for {} {} was set to true but is ignored",
+  contentType,
+  identifier.toString());
 }
 
-// We try to drop the table. Simple retry after ref update.
+// We try to drop the content. Simple retry after ref update.
 try {
   commitRetry(
   String.format("Iceberg delete table %s", identifier),

Review Comment:
   Good catch. Introduced during rebase.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Table Requirements Validation [iceberg-python]

2023-12-11 Thread via GitHub


Fokko merged PR #200:
URL: https://github.com/apache/iceberg-python/pull/200


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #8918:
URL: https://github.com/apache/iceberg/pull/8918#discussion_r1422744205


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##
@@ -261,6 +261,12 @@ public void renameTable(TableIdentifier from, 
TableIdentifier originalTo) {
 } catch (InterruptedException e) {
   Thread.currentThread().interrupt();
   throw new RuntimeException("Interrupted in call to rename", e);
+} catch (RuntimeException e) {

Review Comment:
   @pvary @nk1506 what's the outcome of this? Are we ok with the changes in 
`HiveCatalog` or not? 
   This PR is required for https://github.com/apache/iceberg/pull/8907



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #8907:
URL: https://github.com/apache/iceberg/pull/8907#discussion_r1422744838


##
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveViewCatalog.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import java.util.Collections;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+
+public class TestHiveViewCatalog extends ViewCatalogTests {
+
+  private HiveCatalog catalog;
+
+  @BeforeEach
+  public void before() throws Exception {
+HiveMetastoreTest.startMetastore(Collections.emptyMap());

Review Comment:
   this should use the JUnit5 extension from 
https://github.com/apache/iceberg/pull/8918 rather than using 
`HiveMetastoreTest` directly
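
   For illustration, a rough sketch of the suggested JUnit 5 extension pattern; the actual HiveMetastoreExtension API comes from PR #8918, so the builder call and accessor names here are assumptions:

   import org.apache.iceberg.hive.HiveMetastoreExtension;
   import org.apache.iceberg.view.ViewCatalogTests;
   import org.junit.jupiter.api.extension.RegisterExtension;

   public class TestHiveViewCatalog extends ViewCatalogTests<HiveCatalog> {

     // Registers an embedded metastore per test class instead of calling
     // HiveMetastoreTest.startMetastore(...) from static setup methods.
     @RegisterExtension
     static final HiveMetastoreExtension HIVE_METASTORE_EXTENSION =
         HiveMetastoreExtension.builder().build();

     // @BeforeEach would then build the HiveCatalog from the extension's
     // configuration rather than from shared static state.
   }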



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #8918:
URL: https://github.com/apache/iceberg/pull/8918#discussion_r1422755860


##
hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java:
##
@@ -31,6 +31,10 @@
 import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
 
+/*
+ * This meta-setup has been deprecated; use {@link HiveMetastoreExtension} 
instead.
+ * */
+@Deprecated

Review Comment:
   @nk1506 can you please open a proper PR against the Iceberg repo with those 
changes where subclasses of `HiveMetastoreTest` are moved to JUnit5? 
   That being said, I don't think the deprecation here is necessary, since 
`HiveMetastoreTest` could also be seen as a Junit4-thing, whereas the 
`HiveMetastoreExtension` is what you would use for JUnit5.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]

2023-12-11 Thread via GitHub


nastra commented on code in PR #9161:
URL: https://github.com/apache/iceberg/pull/9161#discussion_r1422771246


##
api/src/test/java/org/apache/iceberg/TestHelpers.java:
##
@@ -173,6 +178,60 @@ public static void assertSameSchemaMap(Map map1, Map

Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]

2023-12-11 Thread via GitHub


pvary commented on code in PR #8918:
URL: https://github.com/apache/iceberg/pull/8918#discussion_r1422793318


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java:
##
@@ -261,6 +261,12 @@ public void renameTable(TableIdentifier from, 
TableIdentifier originalTo) {
 } catch (InterruptedException e) {
   Thread.currentThread().interrupt();
   throw new RuntimeException("Interrupted in call to rename", e);
+} catch (RuntimeException e) {

Review Comment:
   From my side:
   - I am not sure #9143 is generally ok - maybe there was some reason behind the `RuntimeException` that I am not aware of.
   - I prefer not to have to unwrap the `RuntimeException` in exception handling - we need to understand why it is needed and remove the root cause.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add SQLite support [iceberg-python]

2023-12-11 Thread via GitHub


rdblue merged PR #178:
URL: https://github.com/apache/iceberg-python/pull/178


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Support schema.name-mapping.default Column Projection property [iceberg-python]

2023-12-11 Thread via GitHub


syun64 opened a new issue, #202:
URL: https://github.com/apache/iceberg-python/issues/202

   ### Feature Request / Improvement
   
   The schema.name-mapping.default property is well supported in Spark Iceberg to enable column-consistent reads for Iceberg tables created through migration procedures, where the field metadata of the underlying data files may not align with the current schema IDs of the Iceberg table.
   
   We should support this property in PyIceberg to make sure that read operations do not misalign data onto the wrong columns.
   
   [Add NameMapping to Spec 
Document](https://github.com/apache/iceberg/issues/3542)
   [Column Projection](https://iceberg.apache.org/spec/#column-projection)
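
   For context, a minimal sketch of how this property is populated today with the Iceberg Java API (assuming an already-loaded `Table` named `table`); PyIceberg would need the read-side counterpart:

   import org.apache.iceberg.Table;
   import org.apache.iceberg.TableProperties;
   import org.apache.iceberg.mapping.MappingUtil;
   import org.apache.iceberg.mapping.NameMapping;
   import org.apache.iceberg.mapping.NameMappingParser;

   // Derive a name mapping from the current schema and store it under
   // 'schema.name-mapping.default' so readers can resolve columns by name
   // when data files carry no Iceberg field IDs.
   NameMapping mapping = MappingUtil.create(table.schema());
   table.updateProperties()
       .set(TableProperties.DEFAULT_NAME_MAPPING, NameMappingParser.toJson(mapping))
       .commit();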


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-11 Thread via GitHub


dimas-b commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1422818781


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -347,4 +339,65 @@ private TableIdentifier identifierWithoutTableReference(
   protected Map properties() {
 return catalogOptions;
   }
+
+  @Override
+  protected ViewOperations newViewOps(TableIdentifier identifier) {
+TableReference tr = parseTableReference(identifier);
+return new NessieViewOperations(
+ContentKey.of(
+
org.projectnessie.model.Namespace.of(identifier.namespace().levels()), 
tr.getName()),
+client.withReference(tr.getReference(), tr.getHash()),
+fileIO);
+  }
+
+  @Override
+  public List listViews(Namespace namespace) {
+return client.listViews(namespace);
+  }
+
+  @Override
+  public boolean dropView(TableIdentifier identifier) {
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropView(identifierWithoutTableReference(identifier, tableReference), 
false);
+  }
+
+  @Override
+  public void renameView(TableIdentifier from, TableIdentifier to) {
+TableReference fromTableReference = parseTableReference(from);
+TableReference toTableReference = parseTableReference(to);
+
+validateReferenceForRename(fromTableReference, toTableReference, 
Content.Type.ICEBERG_VIEW);
+
+TableIdentifier fromIdentifier =
+NessieUtil.removeCatalogName(
+identifierWithoutTableReference(from, fromTableReference), name());
+TableIdentifier toIdentifier =
+NessieUtil.removeCatalogName(identifierWithoutTableReference(to, 
toTableReference), name());
+client
+.withReference(fromTableReference.getReference(), 
fromTableReference.getHash())
+.renameView(fromIdentifier, toIdentifier);
+  }
+
+  private void validateReferenceForRename(
+  TableReference fromTableReference, TableReference toTableReference, 
Content.Type type) {
+String fromReference =
+fromTableReference.hasReference()
+? fromTableReference.getReference()
+: client.getRef().getName();
+String toReference =
+toTableReference.hasReference()
+? toTableReference.getReference()
+: client.getRef().getName();
+Preconditions.checkArgument(
+fromReference.equalsIgnoreCase(toReference),
+"Cannot rename %s '%s' on reference '%s' to '%s' on reference '%s':"
++ " source and target references must be the same.",
+NessieUtil.contentTypeString(type).toLowerCase(),

Review Comment:
   #9276 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add doc for rewriting manifest with spec id [iceberg]

2023-12-11 Thread via GitHub


puchengy commented on PR #9253:
URL: https://github.com/apache/iceberg/pull/9253#issuecomment-1850509409

   @aokolnychyi Addressed your comment, PTAL, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]

2023-12-11 Thread via GitHub


rdblue commented on issue #9268:
URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850720036

   I don't think I'm following the logic here. Is there a case where you're not 
seeing columns being properly pruned?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] JDBC catalog fix namespaceExists check [iceberg]

2023-12-11 Thread via GitHub


dramaticlly commented on PR #8340:
URL: https://github.com/apache/iceberg/pull/8340#issuecomment-1850721026

   @rdblue @amogh-jahagirdar can you take another look? It would be awesome if 
we could get this fix merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]

2023-12-11 Thread via GitHub


akshayakp97 commented on issue #9268:
URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850779601

   Thanks for your response. 
   
   I am looking at TPCDS q16 physical plan for Iceberg on EMR. 
   
   Link to q16 - 
https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql
   
   The physical plan looks like - 
https://gist.github.com/akshayakp97/102715c66eee44bc6f72493f427528f8
   
   Line 46 projects only two columns via `Project [cs_warehouse_sk#54840, 
cs_order_number#54843L]`; however, it looks like Iceberg is scanning all columns 
of the `catalog_sales` table in Line 47. 
   
   Upon further digging, I found out that `ColumnPruning` 
[rule](https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L859)
 adds the new `Project [cs_warehouse_sk#54840, cs_order_number#54843L]` 
operator, but we still see all columns read by the corresponding BatchScanExec. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2023-12-11 Thread via GitHub


singhpk234 commented on PR #8980:
URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1850795022

   > Can you explain what is the purpose of using existingFilesCount here ?
   
   I am not fully aware of this. Logically I totally agree with you that it makes 
no sense to keep it, but what I am skeptical about is whether there is some 
handling here for backward compatibility (not sure about it either). You also 
mentioned that you are not able to populate this field (referring to your 
comment here: https://github.com/apache/iceberg/pull/8980#discussion_r1386580961), 
so my rationale was to add this change only once we can add some coverage, or, 
better, to take it out of the current PR, involve more folks in the discussion, 
and let this PR go in as the functionality to skip overwrite commits only. 
Please let me know your thoughts? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]

2023-12-11 Thread via GitHub


singhpk234 commented on code in PR #9274:
URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423038582


##
docs/spark-procedures.md:
##
@@ -639,6 +639,7 @@ Keep in mind the `add_files` procedure will fetch the 
Parquet metadata from each
 | `source_table`  | ✔️| string  | Table where 
files should come from, paths are also possible in the form of 
\`file_format\`.\`path\` |
 | `partition_filter`  | ️ | map | A map of 
partitions in the source table to import from   
   |
 | `check_duplicate_files` | ️ | boolean | Whether to 
prevent files existing in the table from being added (defaults to true) 
 |
+| `listing_parallelism`   |   | long| The parallelism 
to use when listing files (defaults to 1)   
|

Review Comment:
   can this be an int ? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]

2023-12-11 Thread via GitHub


akshayakp97 commented on issue #9268:
URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850797730

   After `ColumnPruning` adds the new `Project [cs_warehouse_sk#54840, 
cs_order_number#54843L]`, when the `V2ScanRelationPushDown` rule triggers, it 
doesn't match the 
[`ScanOperation`](https://github.com/apache/spark/blob/7db85642600b1e3b39ca11e41d4e3e0bf1c8962b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L352-L353)
 pattern, because the scan operator for `catalog_sales` in the logical plan was 
already updated in a previous iteration of the `V2ScanRelationPushDown` rule, 
which converted the `ScanBuilderHolder` into a `DataSourceV2ScanRelation`. 
   To summarize, in this case a new `Project` was added after the creation of 
the `DataSourceV2ScanRelation` in the logical plan, causing it not to prune 
columns. 
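   For anyone trying to reproduce this outside of TPCDS, a minimal sketch (the 
catalog and table names below are placeholders, not taken from the report) that 
prints the formatted physical plan so the `ReadSchema` of the `BatchScan` node 
can be inspected:
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;
   
   public class CheckColumnPruning {
     public static void main(String[] args) {
       // Assumes an Iceberg catalog named "demo" is already configured on the session.
       SparkSession spark = SparkSession.builder()
           .appName("check-column-pruning")
           .getOrCreate();
   
       // Select only the two columns kept by the Project operator.
       Dataset<Row> df = spark.table("demo.tpcds.catalog_sales")
           .select("cs_warehouse_sk", "cs_order_number");
   
       // If pruning worked, the BatchScan's ReadSchema in the formatted plan
       // should list only the two selected columns.
       df.explain("formatted");
     }
   }
   ```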


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] DatasourceV2 does not prune columns after V2ScanRelationPushDown [iceberg]

2023-12-11 Thread via GitHub


akshayakp97 commented on issue #9268:
URL: https://github.com/apache/iceberg/issues/9268#issuecomment-1850822846

   In general, if a `Project` is added after the execution of 
`V2ScanRelationPushDown` rule - how do the columns get pruned? Or, do we not 
expect any new `Project`'s? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]

2023-12-11 Thread via GitHub


GianlucaPrincipini commented on code in PR #9161:
URL: https://github.com/apache/iceberg/pull/9161#discussion_r1423154765


##
api/src/test/java/org/apache/iceberg/TestHelpers.java:
##
@@ -173,6 +178,60 @@ public static void assertSameSchemaMap(Map map1, Map

[PR] Core: Add StandardEncryptionManager [iceberg]

2023-12-11 Thread via GitHub


rdblue opened a new pull request, #9277:
URL: https://github.com/apache/iceberg/pull/9277

   This is an update to https://github.com/apache/iceberg/pull/6884 with a few 
final fixes to the `StandardEncryptionManager` and related classes. This is 
almost entirely @ggershinsky's work, I'm just submitting this to unblock 
merging since he's OOO right now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump mkdocs-material from 9.5.1 to 9.5.2 [iceberg-python]

2023-12-11 Thread via GitHub


dependabot[bot] opened a new pull request, #203:
URL: https://github.com/apache/iceberg-python/pull/203

   Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 
9.5.1 to 9.5.2.
   
   Release notes
   Sourced from mkdocs-material's releases 
(https://github.com/squidfunk/mkdocs-material/releases).
   
   mkdocs-material-9.5.2
   
   - Fixed types for slugify settings in blog plugin config
   - Fixed #6469: Horizontal scrollbars on MathJax containers
   
   Changelog
   Sourced from mkdocs-material's changelog 
(https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG).
   
   mkdocs-material-9.5.2+insiders-4.47.1 (2023-12-11)
   
   - Improved editing experience for projects plugin
   - Improved resilience of optimize and social plugin
   - Fixed race condition when writing manifest in optimize and social plugin
   - Fixed #6475: Logo not taking precedence over icon in social card
   - Fixed #6399: Projects plugin doesn't pick up added/removed projects
   - Fixed #6306: Projects plugin cache not correctly updated
   
   mkdocs-material-9.5.2 (2023-12-11)
   
   - Fixed types for slugify settings in blog plugin config
   - Fixed #6469: Horizontal scrollbars on MathJax containers
   
   mkdocs-material-9.5.1+insiders-4.47.0 (2023-12-08)
   
   - Added support for staying on page when switching languages
   - Added configurable logging capabilities to projects plugin
   - Removed temporary warning on blog plugin authors file format change
   - Fixed projects plugin logging messages twice on Linux systems
   - Fixed projects plugin trying to hoist theme assets of divergent themes
   - Fixed compatibility of optimize plugin and projects plugin
   - Fixed compatibility of social plugin and projects plugin
   - Fixed #6448: Code line selection broken for code blocks with custom ids
   - Fixed #6437: Projects plugin crashing for certain site URL configurations
   - Fixed #6414: Projects plugin doesn't prefix messages coming from projects
   
   mkdocs-material-9.5.1 (2023-12-08)
   
   - Updated Greek translations
   - Fixed #6464: Privacy plugin cannot be enabled
   - Fixed #6461: Sorting blog posts ignores time component in date
   
   mkdocs-material-9.5.0 (2023-12-07)
   Merged Insiders features of 'Goat's Horn' funding goal
   
   - Added privacy plugin: automatic downloading of external assets
   - Added support for card grids and grid layouts
   - Added support for improved tooltips
   - Added support for content tabs anchor links (deep linking)
   - Added support for automatic dark/light mode
   - Added support for document contributors
   
   mkdocs-material-9.4.14+insiders-4.46.0 (2023-11-26)
   
   - Added support for author profiles in blog plugin
   - Fixed custom index pages yielding two navigation items (4.45.0 regression)
   
   mkdocs-material-9.4.14 (2023-11-26)
   
   ... (truncated)
   
   Commits
   
   - 2ca5cf0 Prepare 9.5.2 release
   - 52feaba Merge branch 'master' of github.com:squidfunk/mkdocs-material
   - 649db51 Updated dependencies
   - 738dd7d Merge pull request #6486 from squidfunk/dependabot/github_actions/actions/set...
   - ef00a47 Merge pull request #6487 from squidfunk/dependabot/github_actions/actions/dep...
   - cc03f10 Bump actions/deploy-pages from 2 to 3
   - 85a3298 Bump actions/setup-python from 4 to 5
   - 419898a Documentation (#6477)
   - https://github.com

[PR] Build: Bump typing-extensions from 4.8.0 to 4.9.0 [iceberg-python]

2023-12-11 Thread via GitHub


dependabot[bot] opened a new pull request, #204:
URL: https://github.com/apache/iceberg-python/pull/204

   Bumps [typing-extensions](https://github.com/python/typing_extensions) from 
4.8.0 to 4.9.0.
   
   Release notes
   Sourced from typing-extensions's releases 
(https://github.com/python/typing_extensions/releases).
   
   4.9.0
   This feature release adds typing_extensions.ReadOnly, as specified
   by PEP 705, and makes various other improvements, especially to
   @typing_extensions.deprecated().
   There are no changes since 4.9.0rc1.
   
   4.9.0rc1
   
   - Add support for PEP 705, adding typing_extensions.ReadOnly. Patch
     by Jelle Zijlstra.
   - All parameters on NewType.__call__ are now positional-only. This means that
     the signature of typing_extensions.NewType.__call__ now exactly matches the
     signature of typing.NewType.__call__. Patch by Alex Waygood.
   - Fix bug with using @deprecated on a mixin class. Inheriting from a
     deprecated class now raises a DeprecationWarning. Patch by Jelle Zijlstra.
   - @deprecated now gives a better error message if you pass a non-str
     argument to the msg parameter. Patch by Alex Waygood.
   - @deprecated is now implemented as a class for better introspectability.
     Patch by Jelle Zijlstra.
   - Exclude __match_args__ from Protocol members.
     Backport of python/cpython#110683 by Nikita Sobolev.
   - When creating a typing_extensions.NamedTuple class, ensure __set_name__
     is called on all objects that define __set_name__ and exist in the values
     of the NamedTuple class's class dictionary. Patch by Alex Waygood,
     backporting python/cpython#111876.
   - Improve the error message when trying to call issubclass() against a
     Protocol that has non-method members. Patch by Alex Waygood (backporting
     python/cpython#112344, by Randolph Scholz).
   
   Changelog
   Sourced from typing-extensions's changelog 
(https://github.com/python/typing_extensions/blob/main/CHANGELOG.md).
   
   Release 4.9.0 (December 9, 2023)
   This feature release adds typing_extensions.ReadOnly, as specified
   by PEP 705, and makes various other improvements, especially to
   @typing_extensions.deprecated().
   There are no changes since 4.9.0rc1.
   
   Release 4.9.0rc1 (November 29, 2023)
   
   - Add support for PEP 705, adding typing_extensions.ReadOnly. Patch
     by Jelle Zijlstra.
   - All parameters on NewType.__call__ are now positional-only. This means that
     the signature of typing_extensions.NewType.__call__ now exactly matches the
     signature of typing.NewType.__call__. Patch by Alex Waygood.
   - Fix bug with using @deprecated on a mixin class. Inheriting from a
     deprecated class now raises a DeprecationWarning. Patch by Jelle Zijlstra.
   - @deprecated now gives a better error message if you pass a non-str
     argument to the msg parameter. Patch by Alex Waygood.
   - @deprecated is now implemented as a class for better introspectability.
     Patch by Jelle Zijlstra.
   - Exclude __match_args__ from Protocol members.
     Backport of python/cpython#110683 by Nikita Sobolev.
   - When creating a typing_extensions.NamedTuple class, ensure __set_name__
     is called on all objects that define __set_name__ and exist in the values
     of the NamedTuple class's class dictionary. Patch by Alex Waygood,
     backporting python/cpython#111876.
   - Improve the error message when trying to call issubclass() against a
     Protocol that has non-method members. Patch by Alex Waygood (backporting
     python/cpython#112344, by Randolph Scholz).
   
   Commits
   
   - fc461d6 Release 4.9.0 (#313)
   - f82d636 Prepare release 4.9.0rc1 (#306)
   - daa7931 Run typed-argument-parser tests on 3.12 in the daily workflow (#307)
   - 0b0166d Add support for PEP 705 (#284)
   - db6f9b4 Update @deprecated implementation (#302

[PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar opened a new pull request, #9278:
URL: https://github.com/apache/iceberg/pull/9278

   This change is similar to the one done a while back for `BaseViewVersion`. 
SQLViewRepresentation is moved from the core module to the API module (since it 
is a concept that's apparent in the spec).  This stems from this thread 
https://github.com/apache/iceberg/pull/9247/files#r1419759456


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on code in PR #9278:
URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423218929


##
core/src/main/java/org/apache/iceberg/view/BaseSQLViewRepresentation.java:
##
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import org.immutables.value.Value;
+
+@Value.Immutable
+@SuppressWarnings("ImmutablesStyle")
+@Value.Style(
+typeImmutable = "ImmutableSQLViewRepresentation",
+visibilityString = "PUBLIC",
+builderVisibilityString = "PUBLIC")
+interface BaseSQLViewRepresentation extends SQLViewRepresentation {}

Review Comment:
   Package private `BaseSQLViewRepresentation` which extends the public 
SQLViewRepresentation. The generated Immutables class is public for preserving 
compatibility.
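   As a usage note, callers keep building instances through the public generated 
class; a minimal sketch, assuming the `sql` and `dialect` accessors declared on 
`SQLViewRepresentation` (the SQL text and dialect values are placeholders):
   
   ```java
   import org.apache.iceberg.view.ImmutableSQLViewRepresentation;
   import org.apache.iceberg.view.SQLViewRepresentation;
   
   public class ViewRepresentationExample {
     public static void main(String[] args) {
       // The generated builder stays public, so existing call sites are unaffected.
       SQLViewRepresentation representation =
           ImmutableSQLViewRepresentation.builder()
               .sql("SELECT id, data FROM db.tbl")
               .dialect("spark")
               .build();
   
       System.out.println(representation.dialect() + ": " + representation.sql());
     }
   }
   ```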



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Querying metadata tables for a branch or tag [iceberg]

2023-12-11 Thread via GitHub


wypoon opened a new issue, #9279:
URL: https://github.com/apache/iceberg/issues/9279

   ### Query engine
   
   _No response_
   
   ### Question
   
   The documentation states that metadata tables can be queried with `TIMESTAMP 
AS OF` and `VERSION AS OF`. As tables can be queried for a branch or tag using 
`VERSION AS OF` as well, this suggests that metadata tables can be queried for 
a branch or tag this way too. However, I have found that `VERSION AS OF` seems 
to have an effect on only some metadata tables and not all. E.g., querying the 
`history` table always returns the history of the main branch.
   
   Which metadata tables support querying for a branch or tag (or snapshot for 
that matter)? And which ones do not? For the ones that do not, is it simply 
that the support hasn't been implemented yet? or is there some inherent 
technical reason it is not supported?
   @jackye1995 @amogh-jahagirdar 
   
   In any case, it would be helpful to document the current state of support.
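   For reference, the query shape in question looks roughly like the sketch below 
(a minimal Spark example; the table name `db.tbl` and branch name `audit` are 
placeholders). Whether each metadata table honors the branch is exactly what 
would be useful to document:
   
   ```java
   import org.apache.spark.sql.SparkSession;
   
   public class MetadataTableBranchQuery {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder()
           .appName("metadata-table-branch-query")
           .getOrCreate();
   
       // Branch-scoped read of the data table via VERSION AS OF.
       spark.sql("SELECT * FROM db.tbl VERSION AS OF 'audit'").show();
   
       // The same syntax on a metadata table; per this issue, some metadata
       // tables (for example `history`) appear to ignore the branch.
       spark.sql("SELECT * FROM db.tbl.history VERSION AS OF 'audit'").show();
     }
   }
   ```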


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on code in PR #9278:
URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423222132


##
.palantir/revapi.yml:
##
@@ -877,6 +877,10 @@ acceptedBreaks:
 - code: "java.field.serialVersionUIDChanged"
   new: "field org.apache.iceberg.util.SerializableMap.serialVersionUID"
   justification: "Serialization is not be used"
+- code: "java.method.visibilityReduced"
+  old: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation 
org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.SQLViewRepresentation)"
+  new: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation 
org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.BaseSQLViewRepresentation)"
+  justification: "Immutables generated copyOf visibility changed"

Review Comment:
   I think there must have been some behavior change in Immutables between the 
original code generation for SQLViewRepresentation and now. Immutables now 
generates a package-private copyOf (confirmed with ViewVersion). However, prior 
to this change the generated copyOf for SQLViewRepresentation was actually 
public. This is technically breaking to downgrade the copyOf but I don't see 
any options in the `Immutables` doc to control this behavior. Before merging 
this though, I'll explore a few options



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on code in PR #9278:
URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423222132


##
.palantir/revapi.yml:
##
@@ -877,6 +877,10 @@ acceptedBreaks:
 - code: "java.field.serialVersionUIDChanged"
   new: "field org.apache.iceberg.util.SerializableMap.serialVersionUID"
   justification: "Serialization is not be used"
+- code: "java.method.visibilityReduced"
+  old: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation 
org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.SQLViewRepresentation)"
+  new: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation 
org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.BaseSQLViewRepresentation)"
+  justification: "Immutables generated copyOf visibility changed"

Review Comment:
   I think there must have been some behavior change in Immutables between the 
original code generation for SQLViewRepresentation and now. Immutables now 
generates a package-private copyOf (confirmed with ViewVersion). However, prior 
to this change the generated copyOf for SQLViewRepresentation was actually 
public. This is technically breaking to downgrade the copyOf but I don't see 
any options in the `Immutables` docs to control this behavior. Before merging 
this though, I'll explore a few options



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar closed pull request #9278: Core, API: Move 
SQLViewRepresentation to API
URL: https://github.com/apache/iceberg/pull/9278


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on PR #9278:
URL: https://github.com/apache/iceberg/pull/9278#issuecomment-1851051133

   Looks like some unrelated Flink tests failed, noting here:
   
   ```
   
   TestIcebergSourceWithWatermarkExtractor > testThrottling FAILED
   java.lang.AssertionError: 
   Expecting
 

Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on PR #9278:
URL: https://github.com/apache/iceberg/pull/9278#issuecomment-1851053058

   Ah I see the flink failure was fixed recently in 
https://github.com/apache/iceberg/pull/9216, I just need to rebase


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on code in PR #9274:
URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423238036


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java:
##
@@ -530,14 +515,16 @@ private static void importUnpartitionedSparkTable(
* @param spec a partition spec
* @param stagingDir a staging directory to store temporary manifest files
* @param checkDuplicateFiles if true, throw exception if import results in 
a duplicate data file
+   * @param listingParallelism the parallelism to use when listing files
*/
   public static void importSparkPartitions(
   SparkSession spark,
   List partitions,
   Table targetTable,
   PartitionSpec spec,
   String stagingDir,
-  boolean checkDuplicateFiles) {
+  boolean checkDuplicateFiles,
+  int listingParallelism) {

Review Comment:
   Important: This is a public utility, so we can't just add a parameter without 
breaking users who may be relying on it (and this API has existed for a while). 
It's better to duplicate it and add a new method (and then leverage the 
implementation of the parallel one in the original one).
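   As an illustration of that pattern, a minimal sketch of an overload that keeps 
the existing signature working (the class and method names here are simplified 
placeholders, not the actual `SparkTableUtil` code):
   
   ```java
   import java.util.List;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;
   
   public class ListingUtil {
   
     // The existing public entry point keeps its signature and delegates to the
     // new overload, so callers compiled against the old API keep working.
     public static void listFiles(List<String> dirs, boolean checkDuplicates) {
       listFiles(dirs, checkDuplicates, 1); // one thread preserves the old behavior
     }
   
     // New overload that adds the listing parallelism knob.
     // checkDuplicates mirrors the existing flag and is ignored in this sketch.
     public static void listFiles(List<String> dirs, boolean checkDuplicates, int parallelism) {
       ExecutorService pool = Executors.newFixedThreadPool(parallelism);
       try {
         for (String dir : dirs) {
           pool.submit(() -> System.out.println("listing " + dir));
         }
       } finally {
         pool.shutdown();
         try {
           pool.awaitTermination(1, TimeUnit.MINUTES);
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
       }
     }
   }
   ```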



##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java:
##
@@ -417,33 +427,6 @@ public static void importSparkTable(
 }
   }
 
-  /**
-   * Import files from an existing Spark table to an Iceberg table.
-   *
-   * The import uses the Spark session to get table metadata. It assumes no 
operation is going on
-   * the original and target table and thus is not thread-safe.
-   *
-   * @param spark a Spark session
-   * @param sourceTableIdent an identifier of the source Spark table
-   * @param targetTable an Iceberg table where to import the data
-   * @param stagingDir a staging directory to store temporary manifest files
-   * @param checkDuplicateFiles if true, throw exception if import results in 
a duplicate data file
-   */
-  public static void importSparkTable(
-  SparkSession spark,

Review Comment:
   Important: this is a public utility; we want to maintain compatibility and 
not remove it. If there's a reason to deprecate it, it should go down a 
deprecation path, but I don't see why we would want to deprecate this.



##
data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java:
##
@@ -215,11 +215,11 @@ private static DataFile buildDataFile(
 .build();
   }
 
-  private static ExecutorService migrationService(int concurrentDeletes) {
+  private static ExecutorService migrationService(int numThreads) {

Review Comment:
   Nit: This seems like an unnecessary name change to me, there are other 
places in the code where concurrentX refers to parallelism we want, and I don't 
see a need to change that convention



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core, API: Move SQLViewRepresentation to API [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on code in PR #9278:
URL: https://github.com/apache/iceberg/pull/9278#discussion_r1423222132


##
.palantir/revapi.yml:
##
@@ -877,6 +877,10 @@ acceptedBreaks:
 - code: "java.field.serialVersionUIDChanged"
   new: "field org.apache.iceberg.util.SerializableMap.serialVersionUID"
   justification: "Serialization is not be used"
+- code: "java.method.visibilityReduced"
+  old: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation 
org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.SQLViewRepresentation)"
+  new: "method org.apache.iceberg.view.ImmutableSQLViewRepresentation 
org.apache.iceberg.view.ImmutableSQLViewRepresentation::copyOf(org.apache.iceberg.view.BaseSQLViewRepresentation)"
+  justification: "Immutables generated copyOf visibility changed"

Review Comment:
   I think there must have been some behavior change in Immutables between the 
original code generation for SQLViewRepresentation and now. Immutables now 
generates a package-private copyOf (confirmed with ViewVersion). However, prior 
to this change the generated copyOf for SQLViewRepresentation was actually 
public. This is technically breaking to downgrade the copyOf but I don't see 
any options in the `Immutables` docs to control this behavior. Before merging 
this though, I'll explore a few options. Since the view APIs are newer we have 
more flexibility here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Quickstart should give an example of a REST catalog enablement in the spark iceberg docker compose setup [iceberg]

2023-12-11 Thread via GitHub


github-actions[bot] commented on issue #7615:
URL: https://github.com/apache/iceberg/issues/7615#issuecomment-1851097951

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Quickstart should give an example of a REST catalog enablement in the spark iceberg docker compose setup [iceberg]

2023-12-11 Thread via GitHub


github-actions[bot] closed issue #7615: Quickstart should give an example of a 
REST catalog enablement in the spark iceberg docker compose setup
URL: https://github.com/apache/iceberg/issues/7615


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Is is possible to control the number of partitions (groups) for compaction ? [iceberg]

2023-12-11 Thread via GitHub


github-actions[bot] commented on issue #7506:
URL: https://github.com/apache/iceberg/issues/7506#issuecomment-1851097979

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Inconsistent API for remove_orphan_files and DeleteOrphanFiles [iceberg]

2023-12-11 Thread via GitHub


github-actions[bot] commented on issue #7480:
URL: https://github.com/apache/iceberg/issues/7480#issuecomment-1851098001

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] DecimalType declaration check missing proper assertions [iceberg]

2023-12-11 Thread via GitHub


github-actions[bot] commented on issue #7420:
URL: https://github.com/apache/iceberg/issues/7420#issuecomment-1851098029

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]

2023-12-11 Thread via GitHub


manuzhang commented on code in PR #9274:
URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423283454


##
data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java:
##
@@ -215,11 +215,11 @@ private static DataFile buildDataFile(
 .build();
   }
 
-  private static ExecutorService migrationService(int concurrentDeletes) {
+  private static ExecutorService migrationService(int numThreads) {

Review Comment:
   However, this service is not for deletes. It might have been copied from 
somewhere. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]

2023-12-11 Thread via GitHub


manuzhang commented on code in PR #9274:
URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423305940


##
docs/spark-procedures.md:
##
@@ -639,6 +639,7 @@ Keep in mind the `add_files` procedure will fetch the 
Parquet metadata from each
 | `source_table`  | ✔️| string  | Table where 
files should come from, paths are also possible in the form of 
\`file_format\`.\`path\` |
 | `partition_filter`  | ️ | map | A map of 
partitions in the source table to import from   
   |
 | `check_duplicate_files` | ️ | boolean | Whether to 
prevent files existing in the table from being added (defaults to true) 
 |
+| `listing_parallelism`   |   | long| The parallelism 
to use when listing files (defaults to 1)   
|

Review Comment:
   There's no `asInt` method in `ProcedureInput`. Shall I add one?
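   For reference, a minimal sketch of what such a helper could look like; the 
class shape below is hypothetical and only mirrors the existing asX-style 
accessors, it is not the actual `ProcedureInput` implementation:
   
   ```java
   import org.apache.spark.sql.catalyst.InternalRow;
   
   // Hypothetical helper shape; the real ProcedureInput internals may differ.
   class IntArgExample {
   
     // Reads an int argument at the given ordinal, falling back to a default
     // when the argument was not provided (null).
     static int asInt(InternalRow args, int ordinal, int defaultValue) {
       return args.isNullAt(ordinal) ? defaultValue : args.getInt(ordinal);
     }
   }
   ```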



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Add Spark application id to summary of RewriteDataFilesSparkAction [iceberg]

2023-12-11 Thread via GitHub


manuzhang commented on PR #9273:
URL: https://github.com/apache/iceberg/pull/9273#issuecomment-1851176081

   @amogh-jahagirdar @nastra could you please take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Documentation [iceberg-rust]

2023-12-11 Thread via GitHub


liurenjie1024 commented on issue #114:
URL: https://github.com/apache/iceberg-rust/issues/114#issuecomment-1851182139

   cc @Xuanwo 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Parallelize file listing in add_files procedure [iceberg]

2023-12-11 Thread via GitHub


amogh-jahagirdar commented on code in PR #9274:
URL: https://github.com/apache/iceberg/pull/9274#discussion_r1423322127


##
data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java:
##
@@ -215,11 +215,11 @@ private static DataFile buildDataFile(
 .build();
   }
 
-  private static ExecutorService migrationService(int concurrentDeletes) {
+  private static ExecutorService migrationService(int numThreads) {

Review Comment:
   Ah you are right. I confused myself, in that case this name change is a good 
change. Thanks for fixing this! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Documentation [iceberg-rust]

2023-12-11 Thread via GitHub


Xuanwo commented on issue #114:
URL: https://github.com/apache/iceberg-rust/issues/114#issuecomment-1851232081

   > Just curious, what would be the relationship with docs of crates on 
docs.rs?
   
   From the ASF perspective, anything outside of `apache.org` does not exist. 
We should host our guide and API documentation on `rust.iceberg.apache.org` as 
the single source of truth. The documents on `docs.rs` are merely for the 
convenience of Rust users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Create Branches / TAGS between 2 snapshots [iceberg]

2023-12-11 Thread via GitHub


fanaticjo opened a new issue, #9280:
URL: https://github.com/apache/iceberg/issues/9280

   ### Feature Request / Improvement
   
   Is there a way to create a branch / tag based on 2 snapshot ids, or only on the 
latest data?
   
   We have a use case where we write a monthly generated report to an Iceberg 
table, and for every month we want to tag / branch that data for audit purposes.
   
   Currently a branch / tag created from a snapshot includes everything from that 
snapshot back to the first snapshot.
   
   ### Query engine
   
   Spark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Create Branches / TAGS between 2 snapshots [iceberg]

2023-12-11 Thread via GitHub


fanaticjo opened a new issue, #9281:
URL: https://github.com/apache/iceberg/issues/9281

   ### Feature Request / Improvement
   
   Is there a way to create a branch / tag based on 2 snapshot ids, or only on the 
latest data?
   
   We have a use case where we write a monthly generated report to an Iceberg 
table, and for every month we want to tag / branch that data for audit purposes.
   
   Currently a branch / tag created from a snapshot includes everything from that 
snapshot back to the first snapshot.
   
   If this is possible, please let us know, or tell us whether we can contribute it.
   
   ### Query engine
   
   Spark
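   For context, the existing Java API tags or branches a single snapshot, and that 
snapshot keeps all of its ancestors reachable, which is the behavior described 
above. A minimal sketch of that usage (the warehouse path, table names, and tag 
name are placeholders):
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.TableIdentifier;
   import org.apache.iceberg.hadoop.HadoopCatalog;
   
   public class TagMonthlySnapshot {
     public static void main(String[] args) {
       HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "hdfs://nn/warehouse");
       Table table = catalog.loadTable(TableIdentifier.of("reports", "monthly"));
   
       // Tag the snapshot produced by this month's write for audit purposes.
       long snapshotId = table.currentSnapshot().snapshotId();
       table.manageSnapshots()
           .createTag("2023-12", snapshotId)
           .commit();
     }
   }
   ```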


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long [iceberg]

2023-12-11 Thread via GitHub


harshith-bolar-rapido commented on issue #8333:
URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1851252377

   Any update on this? Facing the same issue with 1.2.1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long [iceberg]

2023-12-11 Thread via GitHub


chandu-1101 commented on issue #8333:
URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1851283161

   Abandon Iceberg, Abandon Hudi. Both are useless frameworks. Go with Parquet 
and spark sql. Your life will be simple and happy


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Data: Add GenericFileWriterFactory [iceberg]

2023-12-11 Thread via GitHub


szehon-ho commented on code in PR #9267:
URL: https://github.com/apache/iceberg/pull/9267#discussion_r1423408972


##
flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java:
##
@@ -219,27 +219,27 @@ public void testPrimitiveColumns() throws Exception {
 
 Row binaryCol =
 Row.of(
-59L,

Review Comment:
   I probably should have originally picked these up from the file itself 
instead of hard-coding them, but that's a suggestion for another PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Data: Add GenericFileWriterFactory [iceberg]

2023-12-11 Thread via GitHub


szehon-ho commented on code in PR #9267:
URL: https://github.com/apache/iceberg/pull/9267#discussion_r1423408972


##
flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestMetadataTableReadableMetrics.java:
##
@@ -219,27 +219,27 @@ public void testPrimitiveColumns() throws Exception {
 
 Row binaryCol =
 Row.of(
-59L,

Review Comment:
   I probably should have originally picked these up from the file itself, but 
that's a suggestion for another PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Hive: Use jUnit5 based HiveMetastoreExtension with Hive catalog tests [iceberg]

2023-12-11 Thread via GitHub


nk1506 opened a new pull request, #9282:
URL: https://github.com/apache/iceberg/pull/9282

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]

2023-12-11 Thread via GitHub


nk1506 commented on code in PR #8918:
URL: https://github.com/apache/iceberg/pull/8918#discussion_r1423500567


##
hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java:
##
@@ -31,6 +31,10 @@
 import org.junit.jupiter.api.AfterAll;
 import org.junit.jupiter.api.BeforeAll;
 
+/*
+ * This meta-setup has been deprecated; use {@link HiveMetastoreExtension} 
instead.
+ * */
+@Deprecated

Review Comment:
   Created a [PR](https://github.com/apache/iceberg/pull/9282)  with jUnit5. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Set HadoopFileIO as default for HadoopCatalog [iceberg]

2023-12-11 Thread via GitHub


agrawalreetika commented on PR #9283:
URL: https://github.com/apache/iceberg/pull/9283#issuecomment-1851415341

   @nastra, could you please help review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org


