[PR] Spark 3.5: Fix testReplacePartitionField for Rewrite Manifests [iceberg]

2023-12-08 Thread via GitHub


bknbkn opened a new pull request, #9250:
URL: https://github.com/apache/iceberg/pull/9250

   After the change in https://github.com/apache/iceberg/pull/6695/files, the 
rewrite manifests operation no longer executes in testReplacePartitionField, 
because the table contains only one record. As a result, the test no longer 
properly covers the behavior introduced in 
https://github.com/apache/iceberg/pull/5691.
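
   Not part of the PR, just a minimal sketch of the kind of check being discussed, assuming 
an Iceberg `Table` handle named `table` that already has several data files committed and 
the Spark actions API on the classpath (all names and setup here are assumptions):
   ```
   import static org.assertj.core.api.Assertions.assertThat;

   import org.apache.iceberg.Table;
   import org.apache.iceberg.actions.RewriteManifests;
   import org.apache.iceberg.spark.actions.SparkActions;

   class RewriteManifestsCheck {
     // With only one record (one data file, one manifest) the action may have nothing
     // to rewrite; committing several files first makes this assertion meaningful.
     static void assertManifestsWereRewritten(Table table) {
       RewriteManifests.Result result = SparkActions.get().rewriteManifests(table).execute();
       assertThat(result.rewrittenManifests()).isNotEmpty();
     }
   }
   ```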





Re: [I] org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore at [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #9030:
URL: https://github.com/apache/iceberg/issues/9030#issuecomment-1846738796

   @whymed 
   Hi,
   Thank you so much.
   Could you please explain this part in more detail?





Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1846764267

   > How do you determine that the SystemFunctions are not pushed down?
   > 
   > Spark will push down predicates (which includes predicates containing system 
   > functions) through joins (except for full outer joins), see: 
   > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1912
   > So I don't think you need to handle joins specifically in ReplaceStaticInvoke.
   
   Thanks @advancedxy, that explains a lot of what I was observing in my project.
   During a MERGE (which runs 2 joins, one LeftSemi + one FullOuter) I saw that 
the first join was correctly pruning partitions, while the second one was not. 
Adding this patch still helps prune more partitions: the batch scan on the 
target table cannot prune partitions because the file names (collected as a 
result of the first join) are not known when performing physical planning. I 
think we should limit the replacement to the "full outer" case, what do you 
think?
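
   For anyone following along, one way to see whether a system-function predicate actually 
reaches the Iceberg scan is to inspect the plan. A minimal sketch, assuming an 
Iceberg-enabled `spark` session whose current catalog is the Iceberg catalog (so 
`system.bucket` resolves) and made-up tables `db.target` / `db.src` bucketed on `id`:
   ```
   import org.apache.spark.sql.SparkSession;

   class PushdownPlanCheck {
     // explain(true) prints the parsed/analyzed/optimized/physical plans, so you can see
     // whether the bucket predicate lands on the scan of db.target or only in a Filter
     // node above the join.
     static void explainJoin(SparkSession spark) {
       spark
           .sql(
               "SELECT * FROM db.target t JOIN db.src s ON t.id = s.id "
                   + "WHERE system.bucket(16, t.id) = 3")
           .explain(true);
     }
   }
   ```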





Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on code in PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#discussion_r1420130446


##
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSystemFunctionPushDownDQL.java:
##
@@ -224,6 +226,35 @@ private void testBucketLongFunction(boolean partitioned) {
 Assertions.assertThat(actual.size()).isEqualTo(5);
   }
 
+  @Test
+  public void testBucketLongFunctionInClauseOnUnpartitionedTable() {

Review Comment:
   yes, that was the idea: let's focus on 3.5, and once we reach consensus on 
that I'll back-port it to 3.4






Re: [PR] Flink: switch to use SortKey for data statistics [iceberg]

2023-12-08 Thread via GitHub


pvary commented on code in PR #9212:
URL: https://github.com/apache/iceberg/pull/9212#discussion_r1420131007


##
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsOperator.java:
##
@@ -119,9 +121,9 @@ public void testProcessElement() throws Exception {
 testHarness = createHarness(this.operator)) {
   StateInitializationContext stateContext = getStateContext();
   operator.initializeState(stateContext);
-  operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("a"))));
-  operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("a"))));
-  operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("b"))));
+  operator.processElement(new StreamRecord<>(genericRowDataA));
+  operator.processElement(new StreamRecord<>(genericRowDataA));

Review Comment:
   Sorry - my midnight review was not clear enough. 😢 
   
   I was trying to suggest processing a different `RowData` object with the 
same key, like:
   ```
   private final GenericRowData genericRowDataA_1 = GenericRowData.of(StringData.fromString("a"), 1);
   private final GenericRowData genericRowDataA_2 = GenericRowData.of(StringData.fromString("a"), 2);
   private final GenericRowData genericRowDataB = GenericRowData.of(StringData.fromString("b"), 3);
   [..]
   operator.processElement(new StreamRecord<>(genericRowDataA_1));
   operator.processElement(new StreamRecord<>(genericRowDataA_2));
   operator.processElement(new StreamRecord<>(genericRowDataB));
   ```
   
   I know that we have individual tests for the correct grouping, but I consider 
this an e2e test, and it would be nice to cover this case as well.








Re: [PR] Flink: Fix IcebergSource tableloader lifecycle management in batch mode [iceberg]

2023-12-08 Thread via GitHub


pvary commented on code in PR #9173:
URL: https://github.com/apache/iceberg/pull/9173#discussion_r1420143541


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##
@@ -105,12 +107,12 @@ public class IcebergSource implements Source

Review Comment:
   https://github.com/apache/iceberg/blob/820fc3ceda386149f42db8b54e6db9171d1a3a6d/flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java#L481-L489






[I] Documentation [iceberg-rust]

2023-12-08 Thread via GitHub


Fokko opened a new issue, #114:
URL: https://github.com/apache/iceberg-rust/issues/114

   It would be great to have a minimal set of docs. I think we can get a lot of 
inspiration from the PyIceberg project where we use mkdocs: 
https://github.com/apache/iceberg-python/tree/main/mkdocs
   
   We can copy the structure and the CI to deploy it. 
   
   With the [`CNAME` 
file](https://github.com/apache/iceberg-python/blob/gh-pages/CNAME) we can ask 
the ASF to set up a DNS entry for the docs. We could go with 
`https://rs.iceberg.apache.org/` or `https://rust.iceberg.apache.org/`, but 
I'll leave that up to the project.





Re: [I] Documentation [iceberg-rust]

2023-12-08 Thread via GitHub


Xuanwo commented on issue #114:
URL: https://github.com/apache/iceberg-rust/issues/114#issuecomment-1846826826

   The name `https://rust.iceberg.apache.org/` seems appropriate given that our 
repository is named `iceberg-rust`.





Re: [PR] Spark 3.5: Rework DeleteFileIndexBenchmark [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi commented on code in PR #9165:
URL: https://github.com/apache/iceberg/pull/9165#discussion_r1420161446


##
core/src/test/java/org/apache/iceberg/FileGenerationUtil.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.nio.ByteBuffer;
+import java.util.Map;
+import java.util.Random;
+import java.util.UUID;
+import org.apache.iceberg.io.DeleteSchemaUtil;
+import org.apache.iceberg.io.LocationProvider;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Conversions;
+import org.apache.iceberg.types.Types;
+
+public class FileGenerationUtil {
+
+  private static final Random RANDOM = new Random();
+
+  private FileGenerationUtil() {}
+
+  public static DataFile generateDataFile(Table table, StructLike partition) {
+Schema schema = table.schema();
+PartitionSpec spec = table.spec();
+LocationProvider locations = table.locationProvider();
+String path = locations.newDataLocation(spec, partition, 
generateFileName());
+long fileSize = generateFileSize();
+Metrics metrics = generateRandomMetrics(schema);
+return DataFiles.builder(spec)
+.withPath(path)
+.withPartition(partition)
+.withFileSizeInBytes(fileSize)
+.withFormat(FileFormat.PARQUET)
+.withMetrics(metrics)
+.build();
+  }
+
+  public static DeleteFile generatePositionDeleteFile(Table table, StructLike 
partition) {
+PartitionSpec spec = table.spec();
+LocationProvider locations = table.locationProvider();
+String path = locations.newDataLocation(spec, partition, 
generateFileName());
+long fileSize = generateFileSize();
+Metrics metrics = generatePositionDeleteMetrics();
+return FileMetadata.deleteFileBuilder(table.spec())
+.ofPositionDeletes()
+.withPath(path)
+.withPartition(partition)
+.withFileSizeInBytes(fileSize)
+.withFormat(FileFormat.PARQUET)
+.withMetrics(metrics)
+.build();
+  }
+
+  public static DeleteFile generatePositionDeleteFile(Table table, DataFile 
dataFile) {
+PartitionSpec spec = table.spec();
+StructLike partition = dataFile.partition();
+LocationProvider locations = table.locationProvider();
+String path = locations.newDataLocation(spec, partition, 
generateFileName());
+long fileSize = generateFileSize();
+Metrics metrics = generatePositionDeleteMetrics(dataFile);
+return FileMetadata.deleteFileBuilder(table.spec())
+.ofPositionDeletes()
+.withPath(path)
+.withPartition(partition)
+.withFileSizeInBytes(fileSize)
+.withFormat(FileFormat.PARQUET)
+.withMetrics(metrics)
+.build();
+  }
+
+  public static String generateFileName() {
+int partitionId = RANDOM.nextInt(100_000);
+int taskId = RANDOM.nextInt(100);
+UUID operationId = UUID.randomUUID();

Review Comment:
   Added a comment indicating that this code replicates `OutputFileFactory`. 






Re: [PR] Spark 3.5: Rework DeleteFileIndexBenchmark [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi commented on code in PR #9165:
URL: https://github.com/apache/iceberg/pull/9165#discussion_r1420161870


##
core/src/test/java/org/apache/iceberg/FileGenerationUtil.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.nio.ByteBuffer;
+import java.util.Map;
+import java.util.Random;
+import java.util.UUID;
+import org.apache.iceberg.io.DeleteSchemaUtil;
+import org.apache.iceberg.io.LocationProvider;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Conversions;
+import org.apache.iceberg.types.Types;
+
+public class FileGenerationUtil {
+
+  private static final Random RANDOM = new Random();
+
+  private FileGenerationUtil() {}
+
+  public static DataFile generateDataFile(Table table, StructLike partition) {
+Schema schema = table.schema();
+PartitionSpec spec = table.spec();
+LocationProvider locations = table.locationProvider();
+String path = locations.newDataLocation(spec, partition, 
generateFileName());
+long fileSize = generateFileSize();
+Metrics metrics = generateRandomMetrics(schema);
+return DataFiles.builder(spec)
+.withPath(path)
+.withPartition(partition)
+.withFileSizeInBytes(fileSize)
+.withFormat(FileFormat.PARQUET)
+.withMetrics(metrics)
+.build();
+  }
+
+  public static DeleteFile generatePositionDeleteFile(Table table, StructLike 
partition) {
+PartitionSpec spec = table.spec();
+LocationProvider locations = table.locationProvider();
+String path = locations.newDataLocation(spec, partition, 
generateFileName());
+long fileSize = generateFileSize();
+Metrics metrics = generatePositionDeleteMetrics();
+return FileMetadata.deleteFileBuilder(table.spec())
+.ofPositionDeletes()
+.withPath(path)
+.withPartition(partition)
+.withFileSizeInBytes(fileSize)
+.withFormat(FileFormat.PARQUET)
+.withMetrics(metrics)
+.build();
+  }
+
+  public static DeleteFile generatePositionDeleteFile(Table table, DataFile 
dataFile) {
+PartitionSpec spec = table.spec();
+StructLike partition = dataFile.partition();
+LocationProvider locations = table.locationProvider();
+String path = locations.newDataLocation(spec, partition, 
generateFileName());
+long fileSize = generateFileSize();
+Metrics metrics = generatePositionDeleteMetrics(dataFile);
+return FileMetadata.deleteFileBuilder(table.spec())
+.ofPositionDeletes()
+.withPath(path)
+.withPartition(partition)
+.withFileSizeInBytes(fileSize)
+.withFormat(FileFormat.PARQUET)
+.withMetrics(metrics)
+.build();
+  }
+
+  public static String generateFileName() {
+int partitionId = RANDOM.nextInt(100_000);

Review Comment:
   It mostly represents the Spark write partition ID, to mimic real file names.




Re: [PR] Spark 3.5: Rework DeleteFileIndexBenchmark [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi merged PR #9165:
URL: https://github.com/apache/iceberg/pull/9165





Re: [PR] Spark 3.5: Rework DeleteFileIndexBenchmark [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi commented on PR #9165:
URL: https://github.com/apache/iceberg/pull/9165#issuecomment-1846842066

   Thanks for reviewing, @flyrain!





Re: [PR] Spark 3.5: Fix testReplacePartitionField for Rewrite Manifests [iceberg]

2023-12-08 Thread via GitHub


bknbkn commented on PR #9250:
URL: https://github.com/apache/iceberg/pull/9250#issuecomment-1846867489

   cc @rdblue @ajantha-bhat 





Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420174672


##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -176,8 +178,10 @@ public void testCreateDropTableToCatalog() throws 
IOException {
 HadoopCatalog catalog = new CustomHadoopCatalog(conf, warehouseLocation);
 Table table = catalog.loadTable(identifier);
 
-Assert.assertEquals(SchemaParser.toJson(SCHEMA), 
SchemaParser.toJson(table.schema()));
-Assert.assertEquals(PartitionSpecParser.toJson(SPEC), 
PartitionSpecParser.toJson(table.spec()));
+Assertions.assertThat(SchemaParser.toJson(table.schema()))
+.isEqualTo(SchemaParser.toJson(SCHEMA));
+Assertions.assertThat(PartitionSpecParser.toJson(table.spec()))

Review Comment:
   please use the statically imported `assertThat()` method here and everywhere 
else



##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -54,9 +53,9 @@ public class TestCatalogs {
 
   private Configuration conf;
 
-  @Rule public TemporaryFolder temp = new TemporaryFolder();
+  @TempDir public Path temp;

Review Comment:
   ```suggestion
 @TempDir private Path temp;
   ```



##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -198,11 +202,11 @@ public void testCreateDropTableToCatalog() throws 
IOException {
   public void testLoadCatalogDefault() {
 String catalogName = "barCatalog";
 Optional defaultCatalog = Catalogs.loadCatalog(conf, catalogName);
-Assert.assertTrue(defaultCatalog.isPresent());
+Assertions.assertThat(defaultCatalog.isPresent()).isTrue();

Review Comment:
   ```suggestion
   assertThat(defaultCatalog).isPresent();
   ```



##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -212,11 +216,11 @@ public void testLoadCatalogHive() {
 InputFormatConfig.catalogPropertyConfigKey(catalogName, 
CatalogUtil.ICEBERG_CATALOG_TYPE),
 CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
 Optional hiveCatalog = Catalogs.loadCatalog(conf, catalogName);
-Assert.assertTrue(hiveCatalog.isPresent());
+Assertions.assertThat(hiveCatalog.isPresent()).isTrue();

Review Comment:
   same as above



##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -218,13 +218,12 @@ public static void assertEquals(Record expected, Record 
actual) {
 for (int i = 0; i < expected.size(); ++i) {
   if (expected.get(i) instanceof OffsetDateTime) {
 // For OffsetDateTime we just compare the actual instant
-Assert.assertEquals(
-((OffsetDateTime) expected.get(i)).toInstant(),
-((OffsetDateTime) actual.get(i)).toInstant());
+Assertions.assertThat(((OffsetDateTime) actual.get(i)).toInstant())
+.isEqualTo(((OffsetDateTime) expected.get(i)).toInstant());
   } else if (expected.get(i) instanceof byte[]) {
-Assert.assertArrayEquals((byte[]) expected.get(i), (byte[]) 
actual.get(i));
+Assertions.assertThat((byte[]) actual.get(i)).isEqualTo((byte[]) 
expected.get(i));

Review Comment:
   we can remove this line and `else if (expected.get(i) instanceof byte[]) {` 
because AssertJ can properly handle byte array comparisons
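
   For reference, a tiny self-contained sketch of why the `byte[]` branch can go away: 
AssertJ's array assertions compare contents, not references (the class name and values 
here are made up):
   ```
   import static org.assertj.core.api.Assertions.assertThat;

   public class ByteArrayAssertDemo {
     public static void main(String[] args) {
       byte[] expected = {1, 2, 3};
       byte[] actual = {1, 2, 3};
       // Passes: assertThat(byte[]).isEqualTo(byte[]) does an element-wise comparison,
       // so the explicit `instanceof byte[]` special case is no longer needed.
       assertThat(actual).isEqualTo(expected);
     }
   }
   ```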



##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -230,13 +234,13 @@ public void testLoadCatalogHadoop() {
 catalogName, CatalogProperties.WAREHOUSE_LOCATION),
 "/tmp/mylocation");
 Optional hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-Assert.assertTrue(hadoopCatalog.isPresent());
+Assertions.assertThat(hadoopCatalog.isPresent()).isTrue();

Review Comment:
   same as above



##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -250,16 +254,18 @@ public void testLoadCatalogCustom() {
 catalogName, CatalogProperties.WAREHOUSE_LOCATION),
 "/tmp/mylocation");
 Optional customHadoopCatalog = Catalogs.loadCatalog(conf, 
catalogName);
-Assert.assertTrue(customHadoopCatalog.isPresent());
+Assertions.assertThat(customHadoopCatalog.isPresent()).isTrue();
 
Assertions.assertThat(customHadoopCatalog.get()).isInstanceOf(CustomHadoopCatalog.class);
 Properties properties = new Properties();
 properties.put(InputFormatConfig.CATALOG_NAME, catalogName);
-Assert.assertFalse(Catalogs.hiveCatalog(conf, properties));
+Assertions.assertThat(Catalogs.hiveCatalog(conf, properties)).isFalse();
   }
 
   @Test
   public void testLoadCatalogLocation() {
-Assert.assertFalse(Catalogs.loadCatalog(conf, 
Catalogs.ICEBERG_HADOOP_TABLE_NAME).isPresent());
+Assertions.assertThat(
+Catalogs.loadCatalog(conf, 
Catalogs.ICEBERG_HADOOP_TABLE_NAME).isPresent())

Review Comment:
   same as above



##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -288,9 +287,10 @@ public static 

Re: [I] Type Promotion: Int/Long to String [iceberg]

2023-12-08 Thread via GitHub


zhongyujiang commented on issue #9064:
URL: https://github.com/apache/iceberg/issues/9064#issuecomment-1846875009

   Hi @danielcweeks, thank you for opening this. We have had this need in 
real use cases, so I think this can be really helpful! 
   In addition, we have also encountered cases where we need to convert a 
column from nullable to not-null (when the column contains no nulls). So I wonder 
whether we could also consider supporting promoting a column from nullable to 
not-null when we can make sure that the column has no nulls.





Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #9185:
URL: https://github.com/apache/iceberg/pull/9185#discussion_r1420194979


##
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkScan.java:
##
@@ -49,37 +51,28 @@
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.util.DateTimeUtil;
 import org.assertj.core.api.Assertions;
-import org.junit.Assert;
-import org.junit.ClassRule;
-import org.junit.Rule;
-import org.junit.Test;
-import org.junit.rules.TemporaryFolder;
-import org.junit.runner.RunWith;
-import org.junit.runners.Parameterized;
-
-@RunWith(Parameterized.class)

Review Comment:
   I brought this up on the [DEV 
list](https://lists.apache.org/thread/5c4xs9w4jrot57dknq6vvwyg4o069kmo). Let's 
give people a few days and then we can come back with the right approach to 
properly migrate this.






Re: [PR] API, Core: Add sqlFor API to views to handle basic resolution of dialect [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #9247:
URL: https://github.com/apache/iceberg/pull/9247#discussion_r1420208010


##
core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java:
##
@@ -200,6 +200,13 @@ public void completeCreateView() {
 .build())
 .build());
 
+assertThat(((SQLViewRepresentation) view.sqlFor("spark")).sql())
+.isEqualTo("select * from ns.tbl");
+assertThat(((SQLViewRepresentation) view.sqlFor("trino")).sql())
+.isEqualTo("select * from ns.tbl using X");
+assertThat(((SQLViewRepresentation) view.sqlFor("unknown-dialect")).sql())
+.isEqualTo("select * from ns.tbl using X");

Review Comment:
   I would probably move this out into a separate test method entirely
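
   A rough sketch of what pulling these checks into their own place could look like; 
everything here is illustrative (the class and helper names are made up, and `view` is 
assumed to be built with the same three representations as in `completeCreateView()`):
   ```
   import static org.assertj.core.api.Assertions.assertThat;

   import org.apache.iceberg.view.SQLViewRepresentation;
   import org.apache.iceberg.view.View;

   class SqlForAssertions {
     // Mirrors the assertions quoted above, just isolated from completeCreateView().
     static void assertSqlForDialects(View view) {
       assertThat(((SQLViewRepresentation) view.sqlFor("spark")).sql())
           .isEqualTo("select * from ns.tbl");
       assertThat(((SQLViewRepresentation) view.sqlFor("trino")).sql())
           .isEqualTo("select * from ns.tbl using X");
       // per the assertions above, an unknown dialect falls back to an existing representation
       assertThat(((SQLViewRepresentation) view.sqlFor("unknown-dialect")).sql())
           .isEqualTo("select * from ns.tbl using X");
     }
   }
   ```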






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420210944


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.View;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.CommitMeta;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.ImmutableTableReference;
+import org.projectnessie.model.LogResponse.LogEntry;
+
+public class TestNessieView extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-view-test";
+
+  private static final String DB_NAME = "db";
+  private static final String VIEW_NAME = "view";
+  private static final TableIdentifier VIEW_IDENTIFIER = 
TableIdentifier.of(DB_NAME, VIEW_NAME);
+  private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME);
+  private static final Schema SCHEMA =
+  new Schema(Types.StructType.of(required(1, "id", 
Types.LongType.get())).fields());
+  private static final Schema ALTERED =
+  new Schema(
+  Types.StructType.of(
+  required(1, "id", Types.LongType.get()),
+  optional(2, "data", Types.LongType.get()))
+  .fields());
+
+  private String viewLocation;
+
+  public TestNessieView() {
+super(BRANCH);
+  }
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri 
URI nessieUri)
+  throws IOException {
+super.beforeEach(clientFactory, nessieUri);
+this.viewLocation =
+createView(catalog, VIEW_IDENTIFIER, 
SCHEMA).location().replaceFirst("file:", "");
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+// drop the view data
+if (viewLocation != null) {
+  try (Stream walk = Files.walk(Paths.get(viewLocation))) {
+
walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+  }
+  catalog.dropView(VIEW_IDENTIFIER);
+}
+
+super.afterEach();
+  }
+
+  private IcebergView getView(ContentKey key) throws NessieNotFoundException {
+return getView(BRANCH, key);
+  }
+
+  private IcebergView getView(String ref, ContentKey key) throws 
NessieNotFoundException {
+return 
api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get();
+  }
+
+  /** Verify that Nessie always returns the globally-current global-content w/ 
only DMLs. */
+  @Test
+  public void verifyStateMovesForDML() throws Exception {
+//  1. initialize view
+View icebergView = catalog.loadView(VIEW_IDENTIFIER);
+icebergView
+.replaceVersion()
+.withQuery("spark", "some query")
+.withSchema(SCHEMA)
+.withDefaultNamespace(VIEW_IDENTIFIER.namespace())
+.commit();
+
+//  2. create 2nd branch
+String testCaseBranch = "verify-global-moving";
+api.createReference()
+.sourceRefName(BRANCH)
+.reference(Branch.of(testCaseBranch, catalog.currentHash()))
+.create();
+IcebergView contentInitialMain = getView(BRANCH, KEY);
+IcebergView contentInitia

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420212153


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -142,15 +148,27 @@ private UpdateableReference loadReference(String 
requestedRef, String hash) {
   }
 
   public List listTables(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_TABLE);
+  }
+
+  public List listViews(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_VIEW);
+  }
+
+  /** Lists Iceberg table or view from the given namespace */
+  protected List listContents(Namespace namespace, 
Content.Type type) {

Review Comment:
   is there any value in keeping this (and others) protected? I would have 
thought it should be ok to make this private, since all the other places would 
then use the table/view-specific methods






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#issuecomment-1846896443

   @nastra: I have addressed the comments. Thanks for the review.  





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420215532


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##
@@ -190,4 +203,118 @@ public static Optional extractSingleConflict(
 Conflict conflict = conflicts.get(0);
 return Optional.of(conflict);
   }
+
+  public static ViewMetadata loadViewMetadata(
+  ViewMetadata metadata, String metadataLocation, Reference reference) {
+Map newProperties = Maps.newHashMap(metadata.properties());
+newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
+return ViewMetadata.buildFrom(
+
ViewMetadata.buildFrom(metadata).setProperties(newProperties).build())
+.setMetadataLocation(metadataLocation)
+.build();
+  }
+
+  static void handleExceptionsForCommits(Exception exception, String refName, 
Content.Type type) {

Review Comment:
   should this maybe also return `Optional<..>`?






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r142062


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -142,15 +148,27 @@ private UpdateableReference loadReference(String 
requestedRef, String hash) {
   }
 
   public List listTables(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_TABLE);
+  }
+
+  public List listViews(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_VIEW);
+  }
+
+  /** Lists Iceberg table or view from the given namespace */
+  protected List listContents(Namespace namespace, 
Content.Type type) {

Review Comment:
   In the `NessieCatalog` itself I have some common code for content which 
calls these protected methods. 
   
   If I don't have common code in `NessieCatalog`, there will be code 
duplication. 
   For example, see `NessieCatalog.renameContent()`.






Re: [PR] shutdown scheduler [iceberg]

2023-12-08 Thread via GitHub


nastra commented on PR #9150:
URL: https://github.com/apache/iceberg/pull/9150#issuecomment-1846903866

   @gabrywu can you fix CI failures please?





Re: [PR] shutdown scheduler [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #9150:
URL: https://github.com/apache/iceberg/pull/9150#discussion_r1420224873


##
core/src/main/java/org/apache/iceberg/util/LockManagers.java:
##
@@ -153,6 +153,14 @@ public void initialize(Map properties) {
   CatalogProperties.LOCK_HEARTBEAT_THREADS,
   CatalogProperties.LOCK_HEARTBEAT_THREADS_DEFAULT);
 }
+
+@Override
+public void close() {

Review Comment:
   To fix CI, I think this needs to declare that it `throws Exception`.
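
   For context, a minimal self-contained sketch of the suggested signature (the class and 
scheduler here are made up, not the LockManagers code): `AutoCloseable.close()` is declared 
`throws Exception`, so the override may declare it too and let checked shutdown failures 
propagate.
   ```
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   class HeartbeatScheduler implements AutoCloseable {
     private final ScheduledExecutorService scheduler =
         Executors.newSingleThreadScheduledExecutor();

     @Override
     public void close() throws Exception {
       // stop accepting new heartbeat tasks and wait briefly for in-flight ones;
       // awaitTermination throws the checked InterruptedException, hence `throws Exception`
       scheduler.shutdown();
       if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) {
         scheduler.shutdownNow();
       }
     }
   }
   ```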






Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-08 Thread via GitHub


advancedxy commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1846909628

   > Adding this patch though helps pruning more partitions, this is because 
the batch scan on the target table cannot prune partitions because the file 
names (collected as a result of the first join) are not known when performing 
physical planning. I think we should limit the replacement to the "full outer" 
case, what do you think?
   
   Could you elaborate a bit more? The plan tree string/DAG of the Spark SQL 
would be helpful. 
   If the join type is full outer, the predicate cannot be pushed down, so 
partition pruning is unlikely to be performed. 





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420226948


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##
@@ -190,4 +203,118 @@ public static Optional extractSingleConflict(
 Conflict conflict = conflicts.get(0);
 return Optional.of(conflict);
   }
+
+  public static ViewMetadata loadViewMetadata(
+  ViewMetadata metadata, String metadataLocation, Reference reference) {
+Map newProperties = Maps.newHashMap(metadata.properties());
+newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
+return ViewMetadata.buildFrom(
+
ViewMetadata.buildFrom(metadata).setProperties(newProperties).build())
+.setMetadataLocation(metadataLocation)
+.build();
+  }
+
+  static void handleExceptionsForCommits(Exception exception, String refName, 
Content.Type type) {

Review Comment:
   We always throw an exception from this method, as it is called only for the 3 
types of exception that are filtered. 
   So, I think it is not required here. 






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420229851


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -142,15 +148,27 @@ private UpdateableReference loadReference(String 
requestedRef, String hash) {
   }
 
   public List listTables(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_TABLE);
+  }
+
+  public List listViews(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_VIEW);
+  }
+
+  /** Lists Iceberg table or view from the given namespace */
+  protected List listContents(Namespace namespace, 
Content.Type type) {

Review Comment:
   `NessieCatalog.table --> NessieCatalog.content --> NessieClient.content`: 
this way there is less code duplication. 
   
   `NessieCatalog.table --> NessieClient.table --> NessieClient.content` would 
cause duplicate lines of code in `NessieCatalog`.






Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-08 Thread via GitHub


advancedxy commented on code in PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#discussion_r1420230467


##
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSystemFunctionPushDownDQL.java:
##
@@ -224,6 +226,35 @@ private void testBucketLongFunction(boolean partitioned) {
 Assertions.assertThat(actual.size()).isEqualTo(5);
   }
 
+  @Test
+  public void testBucketLongFunctionInClauseOnUnpartitionedTable() {

Review Comment:
   Then the modifications to this file should be reverted?






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420230637


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -142,15 +148,27 @@ private UpdateableReference loadReference(String 
requestedRef, String hash) {
   }
 
   public List listTables(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_TABLE);
+  }
+
+  public List listViews(Namespace namespace) {
+return listContents(namespace, Content.Type.ICEBERG_VIEW);
+  }
+
+  /** Lists Iceberg table or view from the given namespace */
+  protected List listContents(Namespace namespace, 
Content.Type type) {

Review Comment:
   why can't those places just call `renameTable()` / `renameView()` directly?






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420232303


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##
@@ -190,4 +203,118 @@ public static Optional extractSingleConflict(
 Conflict conflict = conflicts.get(0);
 return Optional.of(conflict);
   }
+
+  public static ViewMetadata loadViewMetadata(
+  ViewMetadata metadata, String metadataLocation, Reference reference) {
+Map newProperties = Maps.newHashMap(metadata.properties());
+newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
+return ViewMetadata.buildFrom(
+
ViewMetadata.buildFrom(metadata).setProperties(newProperties).build())
+.setMetadataLocation(metadataLocation)
+.build();
+  }
+
+  static void handleExceptionsForCommits(Exception exception, String refName, 
Content.Type type) {

Review Comment:
   It is more about how this is handled in the exception block where this 
method is called. Having an explicit `throw` at the call site is more 
readable IMO.






Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1846982769

   Thanks for the review, I addressed your concerns. If I get the green light, 
I'll proceed to copy it over to 3.4.





Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on code in PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#discussion_r1420295367


##
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSystemFunctionPushDownDQL.java:
##
@@ -224,6 +226,35 @@ private void testBucketLongFunction(boolean partitioned) {
 Assertions.assertThat(actual.size()).isEqualTo(5);
   }
 
+  @Test
+  public void testBucketLongFunctionInClauseOnUnpartitionedTable() {

Review Comment:
   yup






Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on code in PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#discussion_r1420295733


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala:
##
@@ -40,14 +37,20 @@ import org.apache.spark.sql.types.StructType
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan =
-plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, 
FILTER)) {
+plan.transformWithPruning (_.containsPattern(FILTER)) {
   case filter @ Filter(condition, _) =>
-val newCondition = 
condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
+val newCondition = 
condition.transformWithPruning(_.containsAnyPattern(BINARY_COMPARISON, IN, 
INSET)) {
   case c @ BinaryComparison(left: StaticInvoke, right) if 
canReplace(left) && right.foldable =>
 c.withNewChildren(Seq(replaceStaticInvoke(left), right))
 
   case c @ BinaryComparison(left, right: StaticInvoke) if 
canReplace(right) && left.foldable =>
 c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+
+  case in @ In(value: StaticInvoke, _) if canReplace(value) =>

Review Comment:
   done as suggested






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420317210


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##
@@ -190,4 +203,118 @@ public static Optional extractSingleConflict(
 Conflict conflict = conflicts.get(0);
 return Optional.of(conflict);
   }
+
+  public static ViewMetadata loadViewMetadata(
+  ViewMetadata metadata, String metadataLocation, Reference reference) {
+Map newProperties = Maps.newHashMap(metadata.properties());
+newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
+return ViewMetadata.buildFrom(
+
ViewMetadata.buildFrom(metadata).setProperties(newProperties).build())
+.setMetadataLocation(metadataLocation)
+.build();
+  }
+
+  static void handleExceptionsForCommits(Exception exception, String refName, 
Content.Type type) {

Review Comment:
   in that case wouldn't it make more sense to return an exception from this 
method, so that the calling side can actually do a `throw 
handleExceptionsForCommits(..)`?
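
   A small illustration of the pattern being discussed (all names here are invented, not 
the NessieUtil API): the helper builds and returns the exception so the call site keeps 
an explicit `throw`, which also makes it obvious to the reader and the compiler that the 
branch terminates.
   ```
   class CommitErrors {
     // Returns instead of throwing, so callers can write `throw commitFailure(...)`.
     static RuntimeException commitFailure(String refName, Exception cause) {
       return new RuntimeException("Cannot commit to " + refName, cause);
     }

     static void commitOrFail(String refName, Runnable commit) {
       try {
         commit.run();
       } catch (RuntimeException e) {
         // the explicit throw at the call site keeps the control flow visible
         throw commitFailure(refName, e);
       }
     }
   }
   ```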






Re: [I] spark3 can't query iceberg: failed to connect to Hive Metastore [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #2359:
URL: https://github.com/apache/iceberg/issues/2359#issuecomment-1847038152

@coolderli 
   @RussellSpitzer 
   @hunter-cloud09 
   @dixingxing0 
   @pvary 
   
   Hello everyone.
   
   I am using Hive Catalog to create Iceberg tables with Spark as the execution 
engine:
   
   ```
   conf = (
       pyspark.SparkConf()
       .setAppName('app_name')
       # packages
       .set('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
       # SQL Extensions
       .set('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
       # Configuring Catalog
       .set('spark.sql.catalog.catalog_hive', 'org.apache.iceberg.spark.SparkCatalog')
       .set('spark.sql.catalog.catalog_hive.type', 'hive')
       .set('spark.sql.catalog.catalog_hive.uri', HIVE_URI)
       .set('spark.sql.catalog.catalog_hive.warehouse', WAREHOUSE)
       .set('spark.sql.catalog.catalog_hive.endpoint', AWS_S3_ENDPOINT)
       .set('spark.sql.catalog.catalog_hive.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
       .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
       .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
   )
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   spark.sql("CREATE TABLE catalog_hive.default.table (name STRING) USING iceberg;").show()
   ```
   
   When I try to run the createTable command, it gives me an exception:
   
   Spark Running
   23/12/07 16:47:16 WARN metastore: Failed to connect to the MetaStore 
Server...
   23/12/07 16:47:17 WARN metastore: Failed to connect to the MetaStore 
Server...
   23/12/07 16:47:18 WARN metastore: Failed to connect to the MetaStore 
Server...
   
   Py4JJavaError Traceback (most recent call last)
   Cell In[2], line 36
   34 print("Spark Running")
   35 ## Create a Table
   ---> 36 spark.sql("CREATE TABLE catalog_hive.default.table (name STRING) 
USING iceberg;").show()
   37 ## Insert Some Data
   38 spark.sql("INSERT INTO catalog_hive.default.table VALUES ('Alex Merced'), 
('Dipankar Mazumdar'), ('Jason Hughes')").show()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/session.py:1034, in 
SparkSession.sql(self, sqlQuery, **kwargs)
   1032 sqlQuery = formatter.format(sqlQuery, **kwargs)
   1033 try:
   -> 1034 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1035 finally:
   1036 if len(kwargs) > 0:
   
   File ~/.local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in 
JavaMember.call(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +
   1316 self.command_header +
   1317 args_command +
   1318 proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
   -> 1321 return_value = get_return_value(
   1322 answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325 temp_arg._detach()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/utils.py:190, in 
capture_sql_exception..deco(*a, **kw)
   188 def deco(*a: Any, **kw: Any) -> Any:
   189 try:
   --> 190 return f(*a, **kw)
   191 except Py4JJavaError as e:
   192 converted = convert_exception(e.java_exception)
   
   File ~/.local/lib/python3.10/site-packages/py4j/protocol.py:326, in 
get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
   --> 326 raise Py4JJavaError(
   327 "An error occurred while calling {0}{1}{2}.\n".
   328 format(target_id, ".", name), value)
   329 else:
   330 raise Py4JError(
   331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332 format(target_id, ".", name, value))
   
   Py4JJavaError: An error occurred while calling o47.sql.
   : org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive 
Metastore
   at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
   at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
   at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
   at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
   at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
   at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:82)
   at 
org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:205)
   at 
org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:95)
   at 
org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:78)
   at 
org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:43)
   at 
org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
   at 
java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
   at 
org.apache.iceberg.shaded

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1420339501


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -232,17 +232,32 @@ protected String defaultWarehouseLocation(TableIdentifier 
table) {
 
   @Override
   public List listTables(Namespace namespace) {
-return client.listContents(namespace, Content.Type.ICEBERG_TABLE);
+return client.listTables(namespace);
   }
 
   @Override
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
-return dropContent(identifier, Content.Type.ICEBERG_TABLE);
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropTable(identifierWithoutTableReference(identifier, 
tableReference), false);
   }
 
   @Override
   public void renameTable(TableIdentifier from, TableIdentifier to) {
-renameContent(from, to, Content.Type.ICEBERG_TABLE);
+TableReference fromTableReference = parseTableReference(from);

Review Comment:
   Note: I'm not moving these checks to a private common method, as I recently got a comment that we should not diverge from
   catalog.table -> catalog.content -> client.table -> client.content
   
   Now it looks like: catalog.table -> catalog.table -> client.content



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-08 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r142032


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:
##
@@ -232,17 +232,32 @@ protected String defaultWarehouseLocation(TableIdentifier 
table) {
 
   @Override
   public List listTables(Namespace namespace) {
-return client.listContents(namespace, Content.Type.ICEBERG_TABLE);
+return client.listTables(namespace);
   }
 
   @Override
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
-return dropContent(identifier, Content.Type.ICEBERG_TABLE);
+TableReference tableReference = parseTableReference(identifier);
+return client
+.withReference(tableReference.getReference(), tableReference.getHash())
+.dropTable(identifierWithoutTableReference(identifier, 
tableReference), false);
   }
 
   @Override
   public void renameTable(TableIdentifier from, TableIdentifier to) {
-renameContent(from, to, Content.Type.ICEBERG_TABLE);
+TableReference fromTableReference = parseTableReference(from);

Review Comment:
   This is that comment
   https://github.com/apache/iceberg/pull/8909#discussion_r1410794852



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] spark3 can't query iceberg: failed to connect to Hive Metastore [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #2359:
URL: https://github.com/apache/iceberg/issues/2359#issuecomment-1847109573

   @RussellSpitzer 
   First of all, thank you very much for your answer. As shown in the screenshots below:
   
   1. I configured hive metastore with the address: thrift://hive-metastore:9083
   2. hive metastore service is running
   3. and for the connection with spark I defined the .env file which contains 
the hive metastore address
   
   
![image](https://github.com/apache/iceberg/assets/149940691/f7107f07-5a7d-4b5e-896a-cd19581daf80)
   
   
   
![image](https://github.com/apache/iceberg/assets/149940691/2b4bd4f4-ec0f-4536-8371-49baa60e175c)
   
   
   
   
![image](https://github.com/apache/iceberg/assets/149940691/778c82a8-fee2-4c08-a7da-5cea505a6235)
   
   Is there any configuration missing or to be added in the script?
   
   Sincerely,
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847179494

   Sure, let me add a bit of context:
   I have two tables with the exact same schema/layout, partitioned on 3 columns:
   - identity(MEAS_YM)
   - identity(MEAS_DD)
   - bucket(POD, 4)
   The source table (small one) has strictly a subset of partitions w/r/t the 
target table (big one).
   In this example I will talk about a local reproducer but keep in mind we are 
talking about a 65TB table with 400k partitions, so every 1% improvement 
actually means a lot.
   
   I started running a merge statement as follows, taking advantage of SPJ:
   ```
   MERGE INTO target USING (SELECT * FROM source)
   ON target.MEAS_YM = source.MEAS_YM AND target. MEAS_DD = source. MEAS_DD AND 
target.POD = source.POD
   WHEN MATCHED THEN UPDATE SET ...
   ```
   
   This results in the following physical plan:
   ```
   == Physical Plan ==
   ReplaceData (13)
   +- * Sort (12)
  +- * Project (11)
 +- MergeRows (10)
+- SortMergeJoin FullOuter (9)
   :- * Sort (4)
   :  +- * Project (3)
   : +- * ColumnarToRow (2)
   :+- BatchScan target (1)
   +- * Sort (8)
  +- * Project (7)
 +- * ColumnarToRow (6)
+- BatchScan source (5)
   = Subqueries =
   
   Subquery:1 Hosting operator id = 1 Hosting Expression = _file#2274 IN 
subquery#2672
   * HashAggregate (26)
   +- Exchange (25)
  +- * HashAggregate (24)
 +- * Project (23)
+- * SortMergeJoin LeftSemi (22)
   :- * Sort (17)
   :  +- * Filter (16)
   : +- * ColumnarToRow (15)
   :+- BatchScan target (14)
   +- * Sort (21)
  +- * Filter (20)
 +- * ColumnarToRow (19)
+- BatchScan source (18)
   ```
   
   with
   
   ```
   (1) BatchScan target
   Output [60]: [..., _file#2274]
   target (branch=null) [filters=, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (5) BatchScan source
   Output [60]: [...]
   source (branch=null) [filters=, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (14) BatchScan target
   Output [8]: [..., _file#2590]
   target (branch=null) [filters=POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD 
IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (18) BatchScan source
   Output [7]: [...]
   source (branch=null) [filters=POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD 
IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   ```
   
   This was creating 33 (+10 to exchange the file names) tasks for the subquery 
and 33 tasks for the second join.
   In practice I know for sure that I hit only 25 partitions, not 33 (i.e. some files were still read even though we know upfront that they are not needed), and the `_file IN (subquery)` predicate can't prune any file because it's dynamic. On top of that, I observed that even when files should have been excluded by Spark's post-scan filter, the task execution was not as fast as I expected (i.e. not close to 0 ms).
   
   Therefore, knowing exactly which partitions I hit beforehand, I tried to help Iceberg/Spark a little by enumerating the partition values that are actually hit:
   
   ```
   MERGE INTO target USING (SELECT * FROM source)
   ON target.`POD` = source.`POD` AND target.`MEAS_YM` = source.`MEAS_YM` AND 
target.`MEAS_DD` = source.`MEAS_DD` AND (
 (target.`meas_ym` = '202306' AND target.`meas_dd` = '02' AND 
system.bucket(4, target.`pod`) IN (0,2,3)) OR
 (target.`meas_ym` = '202306' AND target.`meas_dd` = '01') OR 
 (target.`meas_ym` = '202307' AND target.`meas_dd` = '02' AND 
system.bucket(4, target.`pod`) IN (1,3)) OR 
 (target.`meas_ym` = '202306' AND target.`meas_dd` = '03') OR 
 (target.`meas_ym` = '202308' AND target.`meas_dd` = '01' AND 
system.bucket(4, target.`pod`) IN (0,1,2)) OR 
 (target.`meas_ym` = '202307' AND target.`meas_dd` = '03' AND 
system.bucket(4, target.`pod`) IN (0,1,2)) OR 
 (target.`meas_ym` = '202308' AND target.`meas_dd` = '03' AND 
system.bucket(4, target.`pod`) IN (0,3)) OR 
 (target.`meas_ym` = '202307' AND target.`meas_dd` = '01' AND 
system.bucket(4, target.`pod`) IN (0,1,2)) OR 
 (target.`meas_ym` = '202308' AND target.`meas_dd` = '02' AND 
system.bucket(4, target.`pod`) IN (3)))
   WHEN MATCHED THEN UPDATE SET ...
   ```
   
   To my surprise the plan was exactly the same...
   
   Then I fixed this issue and also #9191 locally (adding an optimiser to my 
spark session) and the scans actually changed:
   
   ```
   (1) BatchScan target
   Output [60]: [..., _file#2279]
   target (branch=null) [filters=MEAS_YM = '202306' AND ((MEAS_DD = '02' 
AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' 
AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND 
MEAS_DD = '03') OR (

Re: [PR] Flink: switch to use SortKey for data statistics [iceberg]

2023-12-08 Thread via GitHub


stevenzwu commented on code in PR #9212:
URL: https://github.com/apache/iceberg/pull/9212#discussion_r1420622934


##
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestDataStatisticsOperator.java:
##
@@ -119,9 +121,9 @@ public void testProcessElement() throws Exception {
 testHarness = createHarness(this.operator)) {
   StateInitializationContext stateContext = getStateContext();
   operator.initializeState(stateContext);
-  operator.processElement(new 
StreamRecord<>(GenericRowData.of(StringData.fromString("a";
-  operator.processElement(new 
StreamRecord<>(GenericRowData.of(StringData.fromString("a";
-  operator.processElement(new 
StreamRecord<>(GenericRowData.of(StringData.fromString("b";
+  operator.processElement(new StreamRecord<>(genericRowDataA));
+  operator.processElement(new StreamRecord<>(genericRowDataA));

Review Comment:
   Got it. For readability, I would just get rid of the pre-constructed objects and always construct them on the fly then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] spark3 can't query iceberg: failed to connect to Hive Metastore [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #2359:
URL: https://github.com/apache/iceberg/issues/2359#issuecomment-1847344053

   @pvary 
   Thank you very much for your answer
   
   In fact, I am using the "catalog_hive" catalog to create Iceberg tables with Spark as the execution engine:
   import pyspark
   from pyspark.sql import SparkSession
   import os
   
   ## DEFINE SENSITIVE VARIABLES
   HIVE_URI = os.environ.get("HIVE_URI","thrift://hive-metastore:9083") ## 
Nessie Server URI
   WAREHOUSE = os.environ.get("WAREHOUSE", "s3a://warehouse/")
   AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY", "OQm04sIzCakGYugOqBOV")
   AWS_SECRET_KEY = os.environ.get("AWS_SECRET_KEY", 
"PPlNPddfFWnUdxWdKq5BKoNfkjuRz8fjCQLi4b4I")
   AWS_S3_ENDPOINT = os.environ.get("AWS_S3_ENDPOINT", 
"http://minioserver:9000";)
   
   print(AWS_S3_ENDPOINT)
   print(HIVE_URI)
   print(WAREHOUSE)
   conf = (
   pyspark.SparkConf()
   .setAppName('app_name')
#packages
   .set('spark.jars.packages', 
'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
#SQL Extensions
   .set('spark.sql.extensions', 
'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
#Configuring Catalog
   .set('spark.sql.catalog.catalog_hive', 
'org.apache.iceberg.spark.SparkCatalog')
   .set('spark.sql.catalog.catalog_hive.type', 'hive')
   .set('spark.sql.catalog.catalog_hive.uri', HIVE_URI)
   .set('spark.sql.catalog.catalog_hive.warehouse.dir', WAREHOUSE) 
   .set('spark.sql.catalog.catalog_hive.endpoint', AWS_S3_ENDPOINT)
   .set('spark.sql.catalog.catalog_hive.io-impl', 
'org.apache.iceberg.aws.s3.S3FileIO')
   .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
   .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
   .set("spark.hadoop.fs.s3a.path.style.access", "true")
   )
   ## Start Spark Session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   ## Create a Table
   spark.sql("CREATE TABLE catalog_hive.defaul.tmy_table (name STRING) USING 
iceberg;").show()
   ## Insert Some Data
   spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('Alex Merced'), 
('Dipankar Mazumdar'), ('Jason Hughes')").show()
   ## Query the Data
   spark.sql("SELECT * FROM catalog_hive.default.my_table;").show()
   
   When I run the script again, I get this error:
   
   http://minioserver:9000/
   thrift://hive-metastore:9083
   s3a://warehouse/
   23/12/08 14:52:06 WARN SparkSession: Using an existing Spark session; only 
runtime SQL configurations will take effect.
   23/12/08 14:52:06 DEBUG SparkSession: Ignored static SQL configurations:
 spark.sql.catalogImplementation=hive
 spark.sql.warehouse.dir=s3a://warehouse/
   23/12/08 14:52:06 DEBUG SparkSession: Configurations that might not take 
effect:
 spark.sql.catalog.catalog_hive.uri=thrift://hive-metastore:9083
   Spark Running
   23/12/08 14:52:06 DEBUG SparkSqlParser: Parsing command: CREATE TABLE 
catalog_hive.defaul.tmy_table (name STRING) USING iceberg;
   23/12/08 14:52:06 INFO metastore: Trying to connect to metastore with URI 
http://nessie:19120/api/v1
   23/12/08 14:52:06 INFO metastore: Closed a connection to metastore, current 
connections: 0
   23/12/08 14:52:14 WARN metastore: Failed to connect to the MetaStore 
Server...
   org.apache.thrift.transport.TTransportException: 
java.net.UnknownHostException: nessie
at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:478)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:245)
at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1740)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:97)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.inv

Re: [PR] [spark <3.5] [backport from spark 3.5] Support specifying spec_id in RewriteManifestProcedure [iceberg]

2023-12-08 Thread via GitHub


RussellSpitzer merged PR #9243:
URL: https://github.com/apache/iceberg/pull/9243


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] [spark <3.5] [backport from spark 3.5] Support specifying spec_id in RewriteManifestProcedure [iceberg]

2023-12-08 Thread via GitHub


RussellSpitzer commented on PR #9243:
URL: https://github.com/apache/iceberg/pull/9243#issuecomment-1847363462

   Thanks for back-porting. If you can, please follow up with a doc PR for the new arg.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] test: Add integration tests for rest catalog. [iceberg-rust]

2023-12-08 Thread via GitHub


liurenjie1024 commented on PR #109:
URL: https://github.com/apache/iceberg-rust/pull/109#issuecomment-1847376669

   cc @Fokko I've resolved conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Documentation [iceberg-rust]

2023-12-08 Thread via GitHub


liurenjie1024 commented on issue #114:
URL: https://github.com/apache/iceberg-rust/issues/114#issuecomment-1847387741

   Just curious, what would be the relationship with crates on docs.rs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] test: Add integration tests for rest catalog. [iceberg-rust]

2023-12-08 Thread via GitHub


Fokko merged PR #109:
URL: https://github.com/apache/iceberg-rust/pull/109


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] test: Rest catalog integration test. [iceberg-rust]

2023-12-08 Thread via GitHub


Fokko closed issue #100: test: Rest catalog integration test.
URL: https://github.com/apache/iceberg-rust/issues/100


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-08 Thread via GitHub


advancedxy commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847450030

   >
   ```
   == Physical Plan ==
   ReplaceData (13)
   +- * Sort (12)
  +- * Project (11)
 +- MergeRows (10)
+- SortMergeJoin FullOuter (9)  < Full Outer here
   ```
   
   If the join type is full outer, it means that there are notMatchedActions. So your MERGE INTO command should have a `when not matched` clause, is that correct?
   
   >
   ```
   (1) BatchScan target
   Output [60]: [..., _file#2279]
   target (branch=null) [filters=MEAS_YM = '202306' AND ((MEAS_DD = '02' 
AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' 
AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND 
MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD) 
IN (0, 1, 2 OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD) 
IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3 OR 
(((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR 
((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3, 
groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (5) BatchScan source
   Output [60]: [...]
   source (branch=null) [filters=, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (14) BatchScan target
   Output [8]: [..., _file#2590]
   target (branch=null) [filters=MEAS_YM = '202306' AND ((MEAS_DD = '02' 
AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307' 
AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND 
MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD) 
IN (0, 1, 2 OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD) 
IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3 OR 
(((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR 
((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3, POD IS NOT 
NULL, MEAS_YM IS NOT NULL, MEAS_DD IS NOT NULL, MAGNITUDE IS NOT NULL, 
METER_KEY IS NOT NULL, REC_ID IS NOT NULL, COLLECT_ID IS NOT NULL, 
groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   
   (18) BatchScan source
   Output [7]: [...]
   source (branch=null) [filters=POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD 
IS NOT NULL, MAGNITUDE IS NOT NULL, METER_KEY IS NOT NULL, REC_ID IS NOT NULL, 
COLLECT_ID IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
   ```
   Could you give the full plan tree or DAG for this changed plan? Is the join type still full outer? This is quite strange; I'm not sure why a Filter would be pushed down to the data source for a full outer join. You may set `spark.sql.planChangeLog.level` to `INFO` to see which rule changes the plan, and post the related plan changes in a gist; that would help clarify the problem.
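   For reference, a minimal sketch of how to enable that (assuming a session named `spark`; the same can be done per-session with `SET spark.sql.planChangeLog.level=INFO` in Spark SQL):
   
   ```java
   // Runtime SQL conf in Spark 3.x; logs every rule that changes the plan.
   spark.conf().set("spark.sql.planChangeLog.level", "INFO");
   // Re-run the MERGE and look in the driver log for entries such as
   // "=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===".
   ```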


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847513665

   Yes, sorry, there's also a `when not matched` statement. I can't attach the plan, but I'll push a reproducer soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]

2023-12-08 Thread via GitHub


rodmeneses commented on code in PR #9185:
URL: https://github.com/apache/iceberg/pull/9185#discussion_r1420794807


##
data/src/test/java/org/apache/iceberg/data/GenAppenderHelper.java:
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.data;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Path;
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Files;
+import org.apache.iceberg.SnapshotRef;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.io.FileAppender;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.assertj.core.api.Assertions;
+
+/** Helper for appending {@link DataFile} to a table or appending {@link 
Record}s to a table. */
+public class GenAppenderHelper {
+
+  private static final String ORC_CONFIG_PREFIX = "^orc.*";
+  private static final String PARQUET_CONFIG_PATTERN = ".*parquet.*";
+
+  private static final String TEMP_FILE = "temp";
+
+  private final Table table;
+  private final FileFormat fileFormat;
+  private final Path tmp;
+  private final Configuration conf;
+
+  public GenAppenderHelper(Table table, FileFormat fileFormat, Path tmp, 
Configuration conf) {
+this.table = table;
+this.fileFormat = fileFormat;
+this.tmp = tmp;
+this.conf = conf;
+  }
+
+  public GenAppenderHelper(Table table, FileFormat fileFormat, Path tmp) {
+this(table, fileFormat, tmp, null);
+  }
+
+  public void appendToTable(String branch, DataFile... dataFiles) {
+Preconditions.checkNotNull(table, "table not set");
+
+AppendFiles append =
+table.newAppend().toBranch(branch != null ? branch : 
SnapshotRef.MAIN_BRANCH);
+
+for (DataFile dataFile : dataFiles) {
+  append = append.appendFile(dataFile);
+}
+
+append.commit();
+  }
+
+  public void appendToTable(DataFile... dataFiles) {
+appendToTable(null, dataFiles);
+  }
+
+  public void appendToTable(List records) throws IOException {
+appendToTable(null, null, records);
+  }
+
+  public void appendToTable(String branch, List records) throws 
IOException {
+appendToTable(null, branch, records);
+  }
+
+  public void appendToTable(StructLike partition, String branch, List 
records)
+  throws IOException {
+appendToTable(branch, writeFile(partition, records));
+  }
+
+  public void appendToTable(StructLike partition, List records) throws 
IOException {
+appendToTable(writeFile(partition, records));
+  }
+
+  public DataFile writeFile(List records) throws IOException {
+Preconditions.checkNotNull(table, "table not set");
+File file = File.createTempFile(TEMP_FILE, null);
+Assertions.assertThat(file.delete()).isTrue();
+return appendToLocalFile(table, file, fileFormat, null, records, conf);
+  }
+
+  public DataFile writeFile(StructLike partition, List records) throws 
IOException {
+Preconditions.checkNotNull(table, "table not set");
+File file = File.createTempFile(TEMP_FILE, null);
+Assertions.assertThat(file.delete()).isTrue();
+return appendToLocalFile(table, file, fileFormat, partition, records, 
conf);
+  }
+
+  private static DataFile appendToLocalFile(
+  Table table,
+  File file,
+  FileFormat format,
+  StructLike partition,
+  List records,
+  Configuration conf)
+  throws IOException {
+GenericAppenderFactory appenderFactory = new 
GenericAppenderFactory(table.schema());
+
+// Push down ORC related settings to appender if there are any
+if (FileFormat.ORC.equals(format) && conf != null) {
+  appenderFactory.setAll(conf.getValByRegex(ORC_CONFIG_PREFIX));
+}
+
+if (FileFormat.PARQUET.equals(format) && conf != null) {
+  appenderFactory.setAll(conf.getValByRegex(PARQUET_CONFIG_PATTERN));
+}
+
+FileAppender appender = 
appenderFactory.newAppender(Files.localOutput(file), format);
+try (FileAppender fileAppender = appender) {
+  fileAppender.addAl

Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]

2023-12-08 Thread via GitHub


rodmeneses commented on code in PR #9185:
URL: https://github.com/apache/iceberg/pull/9185#discussion_r1420797428


##
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java:
##
@@ -193,109 +192,106 @@ private static void assertEquals(
   return;
 }
 
-Assert.assertTrue(
-"expected and actual should be both null or not null", expected != 
null && actual != null);
+assertThat(expected).isNotNull();
+assertThat(actual).isNotNull();
 

Review Comment:
   From the previous assertion I can read `"expected and actual should be both null or not null"`; however, this block
   ```java
   assertThat(expected).isNotNull();
   assertThat(actual).isNotNull();
   ```
   is only asserting that both of them are not null
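   One possible way to keep the original contract, as a sketch only (not necessarily what the PR should do):
   
   ```java
   assertThat(expected == null)
       .as("expected and actual should be both null or not null")
       .isEqualTo(actual == null);
   ```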
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Fix null partitions in PartitionSet [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi closed pull request #9248: Core: Fix null partitions in PartitionSet
URL: https://github.com/apache/iceberg/pull/9248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Fix testReplacePartitionField for Rewrite Manifests [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi commented on PR #9250:
URL: https://github.com/apache/iceberg/pull/9250#issuecomment-1847552754

   Thanks, @bknbkn!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.5: Fix testReplacePartitionField for Rewrite Manifests [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi merged PR #9250:
URL: https://github.com/apache/iceberg/pull/9250


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]

2023-12-08 Thread via GitHub


rodmeneses commented on code in PR #9185:
URL: https://github.com/apache/iceberg/pull/9185#discussion_r1420810926


##
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java:
##
@@ -523,89 +510,103 @@ public static void assertEquals(ManifestFile expected, 
ManifestFile actual) {
 if (expected == actual) {
   return;
 }
-Assert.assertTrue("Should not be null.", expected != null && actual != 
null);
-Assert.assertEquals("Path must match", expected.path(), actual.path());
-Assert.assertEquals("Length must match", expected.length(), 
actual.length());
-Assert.assertEquals("Spec id must match", expected.partitionSpecId(), 
actual.partitionSpecId());
-Assert.assertEquals("ManifestContent must match", expected.content(), 
actual.content());
-Assert.assertEquals(
-"SequenceNumber must match", expected.sequenceNumber(), 
actual.sequenceNumber());
-Assert.assertEquals(
-"MinSequenceNumber must match", expected.minSequenceNumber(), 
actual.minSequenceNumber());
-Assert.assertEquals("Snapshot id must match", expected.snapshotId(), 
actual.snapshotId());
-Assert.assertEquals(
-"Added files flag must match", expected.hasAddedFiles(), 
actual.hasAddedFiles());
-Assert.assertEquals(
-"Added files count must match", expected.addedFilesCount(), 
actual.addedFilesCount());
-Assert.assertEquals(
-"Added rows count must match", expected.addedRowsCount(), 
actual.addedRowsCount());
-Assert.assertEquals(
-"Existing files flag must match", expected.hasExistingFiles(), 
actual.hasExistingFiles());
-Assert.assertEquals(
-"Existing files count must match",
-expected.existingFilesCount(),
-actual.existingFilesCount());
-Assert.assertEquals(
-"Existing rows count must match", expected.existingRowsCount(), 
actual.existingRowsCount());
-Assert.assertEquals(
-"Deleted files flag must match", expected.hasDeletedFiles(), 
actual.hasDeletedFiles());
-Assert.assertEquals(
-"Deleted files count must match", expected.deletedFilesCount(), 
actual.deletedFilesCount());
-Assert.assertEquals(
-"Deleted rows count must match", expected.deletedRowsCount(), 
actual.deletedRowsCount());
+assertThat(expected).isNotNull();
+assertThat(actual).isNotNull();
+assertThat(actual.path()).as("Path must match").isEqualTo(expected.path());
+assertThat(actual.length()).as("Length must 
match").isEqualTo(expected.length());
+assertThat(actual.partitionSpecId())
+.as("Spec id must match")
+.isEqualTo(expected.partitionSpecId());
+assertThat(actual.content()).as("ManifestContent must 
match").isEqualTo(expected.content());
+assertThat(actual.sequenceNumber())
+.as("SequenceNumber must match")
+.isEqualTo(expected.sequenceNumber());
+assertThat(actual.minSequenceNumber())
+.as("MinSequenceNumber must match")
+.isEqualTo(expected.minSequenceNumber());
+assertThat(actual.snapshotId()).as("Snapshot id must 
match").isEqualTo(expected.snapshotId());
+assertThat(actual.hasAddedFiles())
+.as("Added files flag must match")
+.isEqualTo(expected.hasAddedFiles());
+assertThat(actual.addedFilesCount())
+.as("Added files count must match")
+.isEqualTo(expected.addedFilesCount());
+assertThat(actual.addedRowsCount())
+.as("Added rows count must match")
+.isEqualTo(expected.addedRowsCount());
+assertThat(actual.hasExistingFiles())
+.as("Existing files flag must match")
+.isEqualTo(expected.hasExistingFiles());
+assertThat(actual.existingFilesCount())
+.as("Existing files count must match")
+.isEqualTo(expected.existingFilesCount());
+assertThat(actual.existingRowsCount())
+.as("Existing rows count must match")
+.isEqualTo(expected.existingRowsCount());
+assertThat(actual.hasDeletedFiles())
+.as("Deleted files flag must match")
+.isEqualTo(expected.hasDeletedFiles());
+assertThat(actual.deletedFilesCount())
+.as("Deleted files count must match")
+.isEqualTo(expected.deletedFilesCount());
+assertThat(actual.deletedRowsCount())
+.as("Deleted rows count must match")
+.isEqualTo(expected.deletedRowsCount());
 
 List expectedSummaries = 
expected.partitions();
 List actualSummaries = 
actual.partitions();
-Assert.assertEquals(
-"PartitionFieldSummary size does not match",
-expectedSummaries.size(),
-actualSummaries.size());
+assertThat(actualSummaries)
+.as("PartitionFieldSummary size does not match")
+.hasSameSizeAs(expectedSummaries.size());

Review Comment:
   I don't think this is the right way to use the `hasSameSizeAs` assertion: the parameter of `.hasSameSizeAs` shouldn't be a number, but another collection. Please
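   For reference, a sketch of the collection-based form (assuming AssertJ's `hasSameSizeAs(Iterable)` overload):
   
   ```java
   assertThat(actualSummaries)
       .as("PartitionFieldSummary size does not match")
       .hasSameSizeAs(expectedSummaries);
   ```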

Re: [PR] Spec: Clarify time travel implementation in Iceberg [iceberg]

2023-12-08 Thread via GitHub


emkornfield commented on PR #8982:
URL: https://github.com/apache/iceberg/pull/8982#issuecomment-1847572861

   @aokolnychyi did my changes address your feedback properly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spec: Clarify which columns can be used for equality delete files. [iceberg]

2023-12-08 Thread via GitHub


emkornfield commented on PR #8981:
URL: https://github.com/apache/iceberg/pull/8981#issuecomment-1847573238

   @Fokko or @rdblue would you mind taking a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420849913


##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -176,8 +178,10 @@ public void testCreateDropTableToCatalog() throws 
IOException {
 HadoopCatalog catalog = new CustomHadoopCatalog(conf, warehouseLocation);
 Table table = catalog.loadTable(identifier);
 
-Assert.assertEquals(SchemaParser.toJson(SCHEMA), 
SchemaParser.toJson(table.schema()));
-Assert.assertEquals(PartitionSpecParser.toJson(SPEC), 
PartitionSpecParser.toJson(table.spec()));
+Assertions.assertThat(SchemaParser.toJson(table.schema()))
+.isEqualTo(SchemaParser.toJson(SCHEMA));
+Assertions.assertThat(PartitionSpecParser.toJson(table.spec()))

Review Comment:
   Got it. Fixed this by using static imports in all changed files.
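   For example (a sketch of the pattern, not the exact diff):
   
   ```java
   import static org.assertj.core.api.Assertions.assertThat;
   
   // ...
   assertThat(SchemaParser.toJson(table.schema())).isEqualTo(SchemaParser.toJson(SCHEMA));
   ```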



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420849452


##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -54,9 +53,9 @@ public class TestCatalogs {
 
   private Configuration conf;
 
-  @Rule public TemporaryFolder temp = new TemporaryFolder();
+  @TempDir public Path temp;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420851007


##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -212,11 +216,11 @@ public void testLoadCatalogHive() {
 InputFormatConfig.catalogPropertyConfigKey(catalogName, 
CatalogUtil.ICEBERG_CATALOG_TYPE),
 CatalogUtil.ICEBERG_CATALOG_TYPE_HIVE);
 Optional hiveCatalog = Catalogs.loadCatalog(conf, catalogName);
-Assert.assertTrue(hiveCatalog.isPresent());
+Assertions.assertThat(hiveCatalog.isPresent()).isTrue();

Review Comment:
   done
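   Presumably along these lines, using AssertJ's Optional assertion directly (illustrative sketch only):
   
   ```java
   assertThat(hiveCatalog).isPresent();
   ```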



##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -230,13 +234,13 @@ public void testLoadCatalogHadoop() {
 catalogName, CatalogProperties.WAREHOUSE_LOCATION),
 "/tmp/mylocation");
 Optional hadoopCatalog = Catalogs.loadCatalog(conf, catalogName);
-Assert.assertTrue(hadoopCatalog.isPresent());
+Assertions.assertThat(hadoopCatalog.isPresent()).isTrue();

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420851226


##
mr/src/test/java/org/apache/iceberg/mr/hive/TestDeserializer.java:
##
@@ -35,9 +35,9 @@
 import org.apache.iceberg.hive.HiveVersion;
 import org.apache.iceberg.mr.hive.serde.objectinspector.IcebergObjectInspector;
 import org.apache.iceberg.types.Types;
-import org.junit.Assert;
-import org.junit.Assume;
-import org.junit.Test;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.Assumptions;

Review Comment:
   Okay. done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420851466


##
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java:
##
@@ -82,10 +80,12 @@ public void testNotEqualsOperand() {
 UnboundPredicate childExpressionActual = (UnboundPredicate) actual.child();
 UnboundPredicate childExpressionExpected = Expressions.equal("salary", 
3000L);
 
-assertEquals(actual.op(), expected.op());
-assertEquals(actual.child().op(), expected.child().op());
-assertEquals(childExpressionActual.ref().name(), 
childExpressionExpected.ref().name());
-assertEquals(childExpressionActual.literal(), 
childExpressionExpected.literal());
+Assertions.assertThat(expected.op()).isEqualTo(actual.op());

Review Comment:
   Yes, got it.
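   Presumably the fix puts the actual value under test, e.g. (sketch only):
   
   ```java
   assertThat(actual.op()).isEqualTo(expected.op());
   assertThat(actual.child().op()).isEqualTo(expected.child().op());
   ```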



##
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergOutputCommitter.java:
##
@@ -80,7 +79,7 @@ public class TestHiveIcebergOutputCommitter {
   private static final PartitionSpec PARTITIONED_SPEC =
   PartitionSpec.builderFor(CUSTOMER_SCHEMA).bucket("customer_id", 
3).build();
 
-  @Rule public TemporaryFolder temp = new TemporaryFolder();
+  @TempDir public Path temp;

Review Comment:
   Got it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420851699


##
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergOutputCommitter.java:
##
@@ -80,7 +79,7 @@ public class TestHiveIcebergOutputCommitter {
   private static final PartitionSpec PARTITIONED_SPEC =
   PartitionSpec.builderFor(CUSTOMER_SCHEMA).bucket("customer_id", 
3).build();
 
-  @Rule public TemporaryFolder temp = new TemporaryFolder();
+  @TempDir public Path temp;

Review Comment:
   fixed this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420850601


##
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java:
##
@@ -198,11 +202,11 @@ public void testCreateDropTableToCatalog() throws 
IOException {
   public void testLoadCatalogDefault() {
 String catalogName = "barCatalog";
 Optional defaultCatalog = Catalogs.loadCatalog(conf, catalogName);
-Assert.assertTrue(defaultCatalog.isPresent());
+Assertions.assertThat(defaultCatalog.isPresent()).isTrue();

Review Comment:
   Fixed this in latest commit 
https://github.com/apache/iceberg/pull/9241/commits/10a0099520665385bdb837372f27f17f1a41c50f



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420852196


##
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergOutputCommitter.java:
##
@@ -201,22 +203,22 @@ public void writerIsClosedAfterTaskCommitFailure() throws 
IOException {
 .when(failingCommitter)
 .commitTask(argumentCaptor.capture());
 
-Table table = table(temp.getRoot().getPath(), false);
+Table table = table(temp.toFile().getPath(), false);
 JobConf conf = jobConf(table, 1);
 
 Assertions.assertThatThrownBy(
 () -> writeRecords(table.name(), 1, 0, true, false, conf, 
failingCommitter))
 .isInstanceOf(RuntimeException.class)
 .hasMessage(exceptionMessage);
 
-Assert.assertEquals(1, argumentCaptor.getAllValues().size());
+Assertions.assertThat(argumentCaptor.getAllValues().size()).isEqualTo(1);

Review Comment:
   Fixed this in 
https://github.com/apache/iceberg/pull/9241/commits/10a0099520665385bdb837372f27f17f1a41c50f
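   Presumably with a collection assertion along these lines (sketch only):
   
   ```java
   assertThat(argumentCaptor.getAllValues()).hasSize(1);
   ```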



##
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergSerDe.java:
##
@@ -34,22 +35,21 @@
 import org.apache.iceberg.mr.hive.serde.objectinspector.IcebergObjectInspector;
 import org.apache.iceberg.mr.mapred.Container;
 import org.apache.iceberg.types.Types;
-import org.junit.Assert;
-import org.junit.Rule;
-import org.junit.Test;
-import org.junit.rules.TemporaryFolder;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
 
 public class TestHiveIcebergSerDe {
 
   private static final Schema schema =
   new Schema(required(1, "string_field", Types.StringType.get()));
 
-  @Rule public TemporaryFolder tmp = new TemporaryFolder();
+  @TempDir public Path tmp;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Add support for CreateScan and GetScanTasks in RESTCatalog [iceberg]

2023-12-08 Thread via GitHub


rahil-c opened a new pull request, #9252:
URL: https://github.com/apache/iceberg/pull/9252

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420858188


##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -63,7 +63,7 @@
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.types.Types;
 import org.apache.iceberg.util.ByteBuffers;
-import org.junit.Assert;
+import org.assertj.core.api.Assertions;

Review Comment:
   done



##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -218,13 +218,12 @@ public static void assertEquals(Record expected, Record 
actual) {
 for (int i = 0; i < expected.size(); ++i) {
   if (expected.get(i) instanceof OffsetDateTime) {
 // For OffsetDateTime we just compare the actual instant
-Assert.assertEquals(
-((OffsetDateTime) expected.get(i)).toInstant(),
-((OffsetDateTime) actual.get(i)).toInstant());
+Assertions.assertThat(((OffsetDateTime) actual.get(i)).toInstant())
+.isEqualTo(((OffsetDateTime) expected.get(i)).toInstant());
   } else if (expected.get(i) instanceof byte[]) {
-Assert.assertArrayEquals((byte[]) expected.get(i), (byte[]) 
actual.get(i));
+Assertions.assertThat((byte[]) actual.get(i)).isEqualTo((byte[]) 
expected.get(i));

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420860637


##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -288,9 +287,10 @@ public static void validateFiles(Table table, 
Configuration conf, JobID jobId, i
 .filter(path -> !path.getFileName().toString().startsWith("."))
 .collect(Collectors.toList());
 
-Assert.assertEquals(dataFileNum, dataFiles.size());
-Assert.assertFalse(
-new 
File(HiveIcebergOutputCommitter.generateJobLocation(table.location(), conf, 
jobId))
-.exists());
+Assertions.assertThat(dataFiles.size()).isEqualTo(dataFileNum);

Review Comment:
   done



##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -288,9 +287,10 @@ public static void validateFiles(Table table, 
Configuration conf, JobID jobId, i
 .filter(path -> !path.getFileName().toString().startsWith("."))
 .collect(Collectors.toList());
 
-Assert.assertEquals(dataFileNum, dataFiles.size());
-Assert.assertFalse(
-new 
File(HiveIcebergOutputCommitter.generateJobLocation(table.location(), conf, 
jobId))
-.exists());
+Assertions.assertThat(dataFiles.size()).isEqualTo(dataFileNum);
+Assertions.assertThat(

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Switch to junit5 for mr [iceberg]

2023-12-08 Thread via GitHub


lschetanrao commented on code in PR #9241:
URL: https://github.com/apache/iceberg/pull/9241#discussion_r1420862455


##
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergTestUtils.java:
##
@@ -265,7 +264,7 @@ public static void validateData(List expected, 
List actual, int
 sortedExpected.sort(Comparator.comparingLong(record -> (Long) 
record.get(sortBy)));
 sortedActual.sort(Comparator.comparingLong(record -> (Long) 
record.get(sortBy)));
 
-Assert.assertEquals(sortedExpected.size(), sortedActual.size());
+
Assertions.assertThat(sortedActual.size()).isEqualTo(sortedExpected.size());

Review Comment:
   Yes, I did it in all places where size was compared.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Add doc for rewriting manifest with spec id [iceberg]

2023-12-08 Thread via GitHub


puchengy opened a new pull request, #9253:
URL: https://github.com/apache/iceberg/pull/9253

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add doc for rewriting manifest with spec id [iceberg]

2023-12-08 Thread via GitHub


puchengy commented on PR #9253:
URL: https://github.com/apache/iceberg/pull/9253#issuecomment-1847647392

   @RussellSpitzer Do you know how to format the doc and how to test it locally? 
I can't find documentation on this. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Fix IcebergSource tableloader lifecycle management in batch mode [iceberg]

2023-12-08 Thread via GitHub


stevenzwu commented on code in PR #9173:
URL: https://github.com/apache/iceberg/pull/9173#discussion_r1420907269


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##
@@ -105,12 +107,12 @@ public class IcebergSource implements Source

Re: [PR] Flink: Fix IcebergSource tableloader lifecycle management in batch mode [iceberg]

2023-12-08 Thread via GitHub


stevenzwu commented on code in PR #9173:
URL: https://github.com/apache/iceberg/pull/9173#discussion_r1420908320


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##
@@ -105,12 +107,12 @@ public class IcebergSource implements Source

Re: [PR] Add support for CreateScan and GetScanTasks in RESTCatalog [iceberg]

2023-12-08 Thread via GitHub


jackye1995 commented on code in PR #9252:
URL: https://github.com/apache/iceberg/pull/9252#discussion_r1420933770


##
open-api/rest-catalog-open-api.yaml:
##
@@ -530,6 +530,111 @@ paths:
 5XX:
   $ref: '#/components/responses/ServerErrorResponse'
 
+  /v1/{prefix}/namespaces/{namespace}/tables/{table}/scans:
+parameters:
+  - $ref: '#/components/parameters/prefix'
+  - $ref: '#/components/parameters/namespace'
+  - $ref: '#/components/parameters/table'
+post:
+  tags:
+- Catalog API
+  summary: Create a table scan for the table
+  description:
+Creates a table scan for the table.
+  operationId: createScan
+  requestBody:
+content:
+  application/json:
+schema:
+  $ref: '#/components/schemas/CreateScanRequest'
+  responses:
+200:
+  $ref: '#/components/responses/CreateScanResponse'
+400:
+  $ref: '#/components/responses/BadRequestErrorResponse'
+401:
+  $ref: '#/components/responses/UnauthorizedResponse'
+403:
+  $ref: '#/components/responses/ForbiddenResponse'
+404:
+  description:
+Not Found
+- NoSuchTableException, the table does not exist
+- NoSuchNamespaceException, the namespace does not exist
+  content:
+application/json:
+  schema:
+$ref: '#/components/schemas/ErrorModel'
+  examples:
+TableDoesNotExist:
+  $ref: '#/components/examples/NoSuchTableError'
+NamespaceDoesNotExist:
+  $ref: '#/components/examples/NoSuchNamespaceError'
+419:
+  $ref: '#/components/responses/AuthenticationTimeoutResponse'
+503:
+  $ref: '#/components/responses/ServiceUnavailableResponse'
+5XX:
+  $ref: '#/components/responses/ServerErrorResponse'
+
+
+
+  /v1/{prefix}/namespaces/{namespace}/tables/{table}/scans/{scan}:
+parameters:
+  - $ref: '#/components/parameters/prefix'
+  - $ref: '#/components/parameters/namespace'
+  - $ref: '#/components/parameters/table'
+  - $ref: '#/components/parameters/scan'
+get:
+  tags:
+- Catalog API
+  summary: Gets a list of FileScanTasks
+  operationId: getScanTasks
+  description:
+Gets a list of FileScanTasks
+  parameters:
+- in: query
+  name: shard
+  description:
+Reads a single or multiple shards in parallel, to obtain a list of 
file scan tasks.

Review Comment:
   The description seems to be wrong; it's always a single shard in a single 
request.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add support for CreateScan and GetScanTasks in RESTCatalog [iceberg]

2023-12-08 Thread via GitHub


jackye1995 commented on code in PR #9252:
URL: https://github.com/apache/iceberg/pull/9252#discussion_r1420934425


##
open-api/rest-catalog-open-api.yaml:
##
@@ -2615,6 +2888,23 @@ components:
   additionalProperties:
 type: string
 
+CreateScanRequest:
+  type: object
+  required:
+- select
+- options
+  properties:
+select:
+  type: array
+  items:
+type: string
+filter:
+  type: string

Review Comment:
   I think this should be of type Expression



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add support for CreateScan and GetScanTasks in RESTCatalog [iceberg]

2023-12-08 Thread via GitHub


jackye1995 commented on code in PR #9252:
URL: https://github.com/apache/iceberg/pull/9252#discussion_r1420934962


##
open-api/rest-catalog-open-api.yaml:
##
@@ -2615,6 +2888,23 @@ components:
   additionalProperties:
 type: string
 
+CreateScanRequest:
+  type: object
+  required:
+- select
+- options
+  properties:

Review Comment:
   This seems to be missing a few important ones, like snapshot ID, timestamp, 
etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Fix null partitions in PartitionSet [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi merged PR #9248:
URL: https://github.com/apache/iceberg/pull/9248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Fix null partitions in PartitionSet [iceberg]

2023-12-08 Thread via GitHub


aokolnychyi commented on PR #9248:
URL: https://github.com/apache/iceberg/pull/9248#issuecomment-1847830279

   Thanks, @RussellSpitzer!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: switch to use SortKey for data statistics [iceberg]

2023-12-08 Thread via GitHub


stevenzwu merged PR #9212:
URL: https://github.com/apache/iceberg/pull/9212


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: switch to use SortKey for data statistics [iceberg]

2023-12-08 Thread via GitHub


stevenzwu commented on PR #9212:
URL: https://github.com/apache/iceberg/pull/9212#issuecomment-1847835239

   thanks @pvary for the review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] `GlueTableOperations` retries on Access Denied exceptions from S3, and does not support configuration of exception retry logic [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #9124:
URL: https://github.com/apache/iceberg/issues/9124#issuecomment-1847839526

   @singhpk234 
   @lognoel 
   @dacort 
   @electrum 
   @martint 
   Hello.
   
   I am using Hive Catalog to create Iceberg tables with Spark as the execution 
engine:
   
   import pyspark
   from pyspark.sql import SparkSession
   import os
   #DEFINE SENSITIVE VARIABLES
   HIVE_URI = os.environ.get("HIVE_URI","thrift://hive-metastore:9083")
   WAREHOUSE = os.environ.get("WAREHOUSE", "s3a://warehouse/")
   AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY", "OQm04sIzCakGYugOqBOV")
   AWS_SECRET_KEY = os.environ.get("AWS_SECRET_KEY", 
"PPlNPddfFWnUdxWdKq5BKoNfkjuRz8fjCQLi4b4I")
   AWS_S3_ENDPOINT = os.environ.get("AWS_S3_ENDPOINT", 
"http://minioserver:9000";)
   
   print(AWS_S3_ENDPOINT)
   print(HIVE_URI)
   print(WAREHOUSE)
   conf = (
   pyspark.SparkConf()
  .setAppName('app_name')
   #packages
  .set('spark.jars.packages', 
'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
   #SQL Extensions
  .set('spark.sql.extensions', 
'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
   #Configuring Catalog
  .set('spark.sql.catalog.catalog_hive', 
'org.apache.iceberg.spark.SparkCatalog')
  .set('spark.sql.catalog.catalog_hive.type', 'hive')
  .set('spark.sql.catalog.catalog_hive.uri', HIVE_URI)
  .set('spark.sql.catalog.catalog_hive.warehouse.dir', WAREHOUSE)
  .set('spark.sql.catalog.catalog_hive.endpoint', AWS_S3_ENDPOINT)
  .set('spark.sql.catalog.catalog_hive.io-impl', 
'org.apache.iceberg.aws.s3.S3FileIO')
  .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
  .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
  .set("spark.hadoop.fs.s3a.path.style.access", "true")
   )
   
   #Start Spark Session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   
   #Create a Table
   spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) USING 
iceberg;").show()
   
   #Insert Some Data
   spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('Alex Merced'), 
('Dipankar Mazumdar'), ('Jason Hughes')").show()
   
   #Query the Data
   spark.sql("SELECT * FROM catalog_hive.default.my_table;").show()
   
   When I try to run createTable command it gives me an exception:
   SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   SLF4J: Defaulting to no-operation (NOP) logger implementation
   SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
   
   ---
   Py4JJavaError Traceback (most recent call last)
   Cell In[4], line 38
35 print("Spark Running")
37 #Create a Table
   ---> 38 spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) 
USING iceberg;").show()
40 #Insert Some Data
41 spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('Alex 
Merced'), ('Dipankar Mazumdar'), ('Jason Hughes')").show()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/session.py:1034, in 
SparkSession.sql(self, sqlQuery, **kwargs)
  1032 sqlQuery = formatter.format(sqlQuery, **kwargs)
  1033 try:
   -> 1034 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  1035 finally:
  1036 if len(kwargs) > 0:
   
   File ~/.local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in 
JavaMember.__call__(self, *args)
  1315 command = proto.CALL_COMMAND_NAME +\
  1316 self.command_header +\
  1317 args_command +\
  1318 proto.END_COMMAND_PART
  1320 answer = self.gateway_client.send_command(command)
   -> 1321 return_value = get_return_value(
  1322 answer, self.gateway_client, self.target_id, self.name)
  1324 for temp_arg in temp_args:
  1325 temp_arg._detach()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/utils.py:190, in 
capture_sql_exception..deco(*a, **kw)
   188 def deco(*a: Any, **kw: Any) -> Any:
   189 try:
   --> 190 return f(*a, **kw)
   191 except Py4JJavaError as e:
   192 converted = convert_exception(e.java_exception)
   
   File ~/.local/lib/python3.10/site-packages/py4j/protocol.py:326, in 
get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
   --> 326 raise Py4JJavaError(
   327 "An error occurred while calling {0}{1}{2}.\n".
   328 format(target_id, ".", name), value)
   329 else:
   330 raise Py4JError(
   331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332 format(target_id, ".", name

Re: [I] `GlueTableOperations` retries on Access Denied exceptions from S3, and does not support configuration of exception retry logic [iceberg]

2023-12-08 Thread via GitHub


lognoel commented on issue #9124:
URL: https://github.com/apache/iceberg/issues/9124#issuecomment-1847845419

   You're leaking your AWS creds in the comment; I suggest you redact them 
immediately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] `GlueTableOperations` retries on Access Denied exceptions from S3, and does not support configuration of exception retry logic [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #9124:
URL: https://github.com/apache/iceberg/issues/9124#issuecomment-1847855484

   @lognoel 
   I deleted them, thank you very much.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Unable to write to iceberg table using spark [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #8419:
URL: https://github.com/apache/iceberg/issues/8419#issuecomment-1847856969

   @palanik1 
   @di2mot 
   @maulanaady 
   @RussellSpitzer 
   @dacort 
   Hello.
   
   I am using Hive Catalog to create Iceberg tables with Spark as the execution 
engine:
   
   import pyspark
   from pyspark.sql import SparkSession
   import os
   #DEFINE SENSITIVE VARIABLES
   HIVE_URI = os.environ.get("HIVE_URI","thrift://hive-metastore:9083")
   WAREHOUSE = os.environ.get("WAREHOUSE", "s3a://warehouse/")
   AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY", "")
   AWS_SECRET_KEY = os.environ.get("AWS_SECRET_KEY", 
"")
   AWS_S3_ENDPOINT = os.environ.get("AWS_S3_ENDPOINT", 
"http://minioserver:9000/";)
   
   print(AWS_S3_ENDPOINT)
   print(HIVE_URI)
   print(WAREHOUSE)
   conf = (
   pyspark.SparkConf()
   .setAppName('app_name')
   #packages
   .set('spark.jars.packages', 
'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
   #SQL Extensions
   .set('spark.sql.extensions', 
'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
   #Configuring Catalog
   .set('spark.sql.catalog.catalog_hive', 
'org.apache.iceberg.spark.SparkCatalog')
   .set('spark.sql.catalog.catalog_hive.type', 'hive')
   .set('spark.sql.catalog.catalog_hive.uri', HIVE_URI)
   .set('spark.sql.catalog.catalog_hive.warehouse.dir', WAREHOUSE)
   .set('spark.sql.catalog.catalog_hive.endpoint', AWS_S3_ENDPOINT)
   .set('spark.sql.catalog.catalog_hive.io-impl', 
'org.apache.iceberg.aws.s3.S3FileIO')
   .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
   .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
   .set("spark.hadoop.fs.s3a.path.style.access", "true")
   )
   
   #Start Spark Session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   
   #Create a Table
   spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) USING 
iceberg;").show()
   
   #Insert Some Data
   spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('ns'), ('nd'), 
('Ja')").show()
   
   #Query the Data
   spark.sql("SELECT * FROM catalog_hive.default.my_table;").show()
   
   When I try to run createTable command it gives me an exception:
   SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   SLF4J: Defaulting to no-operation (NOP) logger implementation
   SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
   
   Py4JJavaError Traceback (most recent call last)
   Cell In[4], line 38
   35 print("Spark Running")
   37 #Create a Table
   ---> 38 spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) 
USING iceberg;").show()
   40 #Insert Some Data
   41 spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('Alex 
Merced'), ('Dipankar Mazumdar'), ('Jason Hughes')").show()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/session.py:1034, in 
SparkSession.sql(self, sqlQuery, **kwargs)
   1032 sqlQuery = formatter.format(sqlQuery, **kwargs)
   1033 try:
   -> 1034 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1035 finally:
   1036 if len(kwargs) > 0:
   
   File ~/.local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in 
JavaMember.call(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +
   1316 self.command_header +
   1317 args_command +
   1318 proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
   -> 1321 return_value = get_return_value(
   1322 answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325 temp_arg._detach()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/utils.py:190, in 
capture_sql_exception..deco(*a, **kw)
   188 def deco(*a: Any, **kw: Any) -> Any:
   189 try:
   --> 190 return f(*a, **kw)
   191 except Py4JJavaError as e:
   192 converted = convert_exception(e.java_exception)
   
   File ~/.local/lib/python3.10/site-packages/py4j/protocol.py:326, in 
get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
   --> 326 raise Py4JJavaError(
   327 "An error occurred while calling {0}{1}{2}.\n".
   328 format(target_id, ".", name), value)
   329 else:
   330 raise Py4JError(
   331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332 format(target_id, ".", name, value))
   
   Py4JJavaError: An error occurred while calling o49.sql.
   : software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, 
Status Code: 400, Request ID: 2MBCRA6QRAF6SMBQ, Extended Request ID: 
s41ibIYx6fFDoMXiRK+8TRNkUT/GsiwqEzR5X2Drq9cY213HQkX19/PxSacQwo+SPX8eAqTNy7k=)
   at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseH

Re: [I] How to realize Write Iceberg Tables via Hive? (Ideas share) [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-1847860494

   @aimenglin 
   @bitsondatadev 
   @marton-bod 
   @dacort 
   @electrum 
   Hello.
   
   I am using Hive Catalog to create Iceberg tables with Spark as the execution 
engine:
   
   import pyspark
   from pyspark.sql import SparkSession
   import os
   #DEFINE SENSITIVE VARIABLES
   HIVE_URI = os.environ.get("HIVE_URI","thrift://hive-metastore:9083")
   WAREHOUSE = os.environ.get("WAREHOUSE", "s3a://warehouse/")
   AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY", "")
   AWS_SECRET_KEY = os.environ.get("AWS_SECRET_KEY", 
"")
   AWS_S3_ENDPOINT = os.environ.get("AWS_S3_ENDPOINT", 
"http://minioserver:9000/";)
   
   print(AWS_S3_ENDPOINT)
   print(HIVE_URI)
   print(WAREHOUSE)
   conf = (
   pyspark.SparkConf()
   .setAppName('app_name')
   #packages
   .set('spark.jars.packages', 
'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
   #SQL Extensions
   .set('spark.sql.extensions', 
'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
   #Configuring Catalog
   .set('spark.sql.catalog.catalog_hive', 
'org.apache.iceberg.spark.SparkCatalog')
   .set('spark.sql.catalog.catalog_hive.type', 'hive')
   .set('spark.sql.catalog.catalog_hive.uri', HIVE_URI)
   .set('spark.sql.catalog.catalog_hive.warehouse.dir', WAREHOUSE)
   .set('spark.sql.catalog.catalog_hive.endpoint', AWS_S3_ENDPOINT)
   .set('spark.sql.catalog.catalog_hive.io-impl', 
'org.apache.iceberg.aws.s3.S3FileIO')
   .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
   .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
   .set("spark.hadoop.fs.s3a.path.style.access", "true")
   )
   
   #Start Spark Session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   
   #Create a Table
   spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) USING 
iceberg;").show()
   
   #Insert Some Data
   spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('ns'), ('nd'), 
('Ja')").show()
   
   #Query the Data
   spark.sql("SELECT * FROM catalog_hive.default.my_table;").show()
   
   When I try to run createTable command it gives me an exception:
   SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   SLF4J: Defaulting to no-operation (NOP) logger implementation
   SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
   
   Py4JJavaError Traceback (most recent call last)
   Cell In[4], line 38
   35 print("Spark Running")
   37 #Create a Table
   ---> 38 spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) 
USING iceberg;").show()
   40 #Insert Some Data
   41 spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('Alex 
Merced'), ('Dipankar Mazumdar'), ('Jason Hughes')").show()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/session.py:1034, in 
SparkSession.sql(self, sqlQuery, **kwargs)
   1032 sqlQuery = formatter.format(sqlQuery, **kwargs)
   1033 try:
   -> 1034 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1035 finally:
   1036 if len(kwargs) > 0:
   
   File ~/.local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in 
JavaMember.call(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +
   1316 self.command_header +
   1317 args_command +
   1318 proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
   -> 1321 return_value = get_return_value(
   1322 answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325 temp_arg._detach()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/utils.py:190, in 
capture_sql_exception..deco(*a, **kw)
   188 def deco(*a: Any, **kw: Any) -> Any:
   189 try:
   --> 190 return f(*a, **kw)
   191 except Py4JJavaError as e:
   192 converted = convert_exception(e.java_exception)
   
   File ~/.local/lib/python3.10/site-packages/py4j/protocol.py:326, in 
get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
   --> 326 raise Py4JJavaError(
   327 "An error occurred while calling {0}{1}{2}.\n".
   328 format(target_id, ".", name), value)
   329 else:
   330 raise Py4JError(
   331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332 format(target_id, ".", name, value))
   
   Py4JJavaError: An error occurred while calling o49.sql.
   : software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, 
Status Code: 400, Request ID: 2MBCRA6QRAF6SMBQ, Extended Request ID: 
s41ibIYx6fFDoMXiRK+8TRNkUT/GsiwqEzR5X2Drq9cY213HQkX19/PxSacQwo+SPX8eAqTNy7k=)
   at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedRespons

Re: [I] Running iceberg with spark 3 in local mode [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #2176:
URL: https://github.com/apache/iceberg/issues/2176#issuecomment-1847862051

   @adnanhb 
   @jackye1995 
   Hello.
   
   I am using Hive Catalog to create Iceberg tables with Spark as the execution 
engine:
   
   import pyspark
   from pyspark.sql import SparkSession
   import os
   #DEFINE SENSITIVE VARIABLES
   HIVE_URI = os.environ.get("HIVE_URI","thrift://hive-metastore:9083")
   WAREHOUSE = os.environ.get("WAREHOUSE", "s3a://warehouse/")
   AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY", "")
   AWS_SECRET_KEY = os.environ.get("AWS_SECRET_KEY", 
"")
   AWS_S3_ENDPOINT = os.environ.get("AWS_S3_ENDPOINT", 
"http://minioserver:9000/";)
   
   print(AWS_S3_ENDPOINT)
   print(HIVE_URI)
   print(WAREHOUSE)
   conf = (
   pyspark.SparkConf()
   .setAppName('app_name')
   #packages
   .set('spark.jars.packages', 
'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.0.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
   #SQL Extensions
   .set('spark.sql.extensions', 
'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
   #Configuring Catalog
   .set('spark.sql.catalog.catalog_hive', 
'org.apache.iceberg.spark.SparkCatalog')
   .set('spark.sql.catalog.catalog_hive.type', 'hive')
   .set('spark.sql.catalog.catalog_hive.uri', HIVE_URI)
   .set('spark.sql.catalog.catalog_hive.warehouse.dir', WAREHOUSE)
   .set('spark.sql.catalog.catalog_hive.endpoint', AWS_S3_ENDPOINT)
   .set('spark.sql.catalog.catalog_hive.io-impl', 
'org.apache.iceberg.aws.s3.S3FileIO')
   .set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
   .set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
   .set("spark.hadoop.fs.s3a.path.style.access", "true")
   )
   
   #Start Spark Session
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   print("Spark Running")
   
   #Create a Table
   spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) USING 
iceberg;").show()
   
   #Insert Some Data
   spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('ns'), ('nd'), 
('Ja')").show()
   
   #Query the Data
   spark.sql("SELECT * FROM catalog_hive.default.my_table;").show()
   
   When I try to run createTable command it gives me an exception:
   SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   SLF4J: Defaulting to no-operation (NOP) logger implementation
   SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
   
   Py4JJavaError Traceback (most recent call last)
   Cell In[4], line 38
   35 print("Spark Running")
   37 #Create a Table
   ---> 38 spark.sql("CREATE TABLE catalog_hive.default.tmy_table (name STRING) 
USING iceberg;").show()
   40 #Insert Some Data
   41 spark.sql("INSERT INTO catalog_hive.default.my_table VALUES ('Alex 
Merced'), ('Dipankar Mazumdar'), ('Jason Hughes')").show()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/session.py:1034, in 
SparkSession.sql(self, sqlQuery, **kwargs)
   1032 sqlQuery = formatter.format(sqlQuery, **kwargs)
   1033 try:
   -> 1034 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1035 finally:
   1036 if len(kwargs) > 0:
   
   File ~/.local/lib/python3.10/site-packages/py4j/java_gateway.py:1321, in 
JavaMember.call(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +
   1316 self.command_header +
   1317 args_command +
   1318 proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
   -> 1321 return_value = get_return_value(
   1322 answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325 temp_arg._detach()
   
   File ~/.local/lib/python3.10/site-packages/pyspark/sql/utils.py:190, in 
capture_sql_exception..deco(*a, **kw)
   188 def deco(*a: Any, **kw: Any) -> Any:
   189 try:
   --> 190 return f(*a, **kw)
   191 except Py4JJavaError as e:
   192 converted = convert_exception(e.java_exception)
   
   File ~/.local/lib/python3.10/site-packages/py4j/protocol.py:326, in 
get_return_value(answer, gateway_client, target_id, name)
   324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   325 if answer[1] == REFERENCE_TYPE:
   --> 326 raise Py4JJavaError(
   327 "An error occurred while calling {0}{1}{2}.\n".
   328 format(target_id, ".", name), value)
   329 else:
   330 raise Py4JError(
   331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
   332 format(target_id, ".", name, value))
   
   Py4JJavaError: An error occurred while calling o49.sql.
   : software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, 
Status Code: 400, Request ID: 2MBCRA6QRAF6SMBQ, Extended Request ID: 
s41ibIYx6fFDoMXiRK+8TRNkUT/GsiwqEzR5X2Drq9cY213HQkX19/PxSacQwo+SPX8eAqTNy7k=)
   at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
   at 
software.amazon.awssd

Re: [I] How to realize Write Iceberg Tables via Hive? (Ideas share) [iceberg]

2023-12-08 Thread via GitHub


aimenglin commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-1847892200

   Hi Rym,
   
   I understand you're inquiring about writing Iceberg tables using Hive.
   Based on my experience and experiments, it appears that directly writing to
   Iceberg tables through Hive is not supported. However, a viable workaround
   is to create and populate your Iceberg tables using Apache Spark. Once
   these tables are created and data is inserted via Spark, you can seamlessly
   read them using Hive.
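
   For instance, a minimal sketch of that workflow (assuming a Spark session
   already configured with the `catalog_hive` catalog discussed earlier in this
   thread, and that the S3 endpoint issue above is resolved; the table and column
   names are only illustrative):

```python
# Sketch of the Spark-writes / Hive-reads workaround described above.
from pyspark.sql import SparkSession

# Assumes the session is created with the catalog_hive configuration
# shown earlier in this thread.
spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS catalog_hive.default.events (
        id BIGINT,
        name STRING
    ) USING iceberg
""")

spark.sql("INSERT INTO catalog_hive.default.events VALUES (1, 'a'), (2, 'b')")
spark.sql("SELECT * FROM catalog_hive.default.events").show()

# On the Hive side the same table can then be queried, e.g. from beeline:
#   SELECT * FROM default.events;
# provided the Hive session has the iceberg-hive-runtime jar available.
```
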
   
   Best,
   Menglin
   

Re: [I] How to realize Write Iceberg Tables via Hive? (Ideas share) [iceberg]

2023-12-08 Thread via GitHub


ExplorData24 commented on issue #2685:
URL: https://github.com/apache/iceberg/issues/2685#issuecomment-1847923590

   Hi @aimenglin.
   First of all, thank you very much for your availability. 
   
   I actually followed the **Custom Iceberg catalogs** part of this link: 
https://iceberg.incubator.apache.org/docs/latest/hive/#custom-iceberg-catalogs 
to create a table in Iceberg format via a Hive-type catalog.
   Regarding "**However, a viable workaround is to create and populate your 
Iceberg tables using Apache Spark**": for this solution, are you talking about 
an integrated catalog (writing to the local file system)?
   
   Sincerely.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Build: Bump mkdocs-material from 9.4.14 to 9.5.0 [iceberg-python]

2023-12-08 Thread via GitHub


dependabot[bot] commented on PR #196:
URL: https://github.com/apache/iceberg-python/pull/196#issuecomment-1847941081

   Superseded by #197.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump mkdocs-material from 9.4.14 to 9.5.1 [iceberg-python]

2023-12-08 Thread via GitHub


dependabot[bot] opened a new pull request, #197:
URL: https://github.com/apache/iceberg-python/pull/197

   Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 
9.4.14 to 9.5.1.
   
   Release notes
   Sourced from mkdocs-material's releases 
(https://github.com/squidfunk/mkdocs-material/releases).
   
   mkdocs-material-9.5.1
   
   - Updated Greek translations
   - Fixed #6464: Privacy plugin cannot be enabled
   - Fixed #6461: Sorting blog posts ignores time component in date
   
   mkdocs-material-9.5.0
   Merged Insiders features of 'Goat's Horn' funding goal
   
   - Added privacy plugin: automatic downloading of external assets
   - Added support for card grids and grid layouts
   - Added support for improved tooltips
   - Added support for content tabs anchor links (deep linking)
   - Added support for automatic dark/light mode
   - Added support for document contributors
   
   Changelog
   Sourced from mkdocs-material's changelog 
(https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG).
   
   mkdocs-material-9.5.1+insiders-4.47.0 (2023-12-08)
   
   - Added support for staying on page when switching languages
   - Added configurable logging capabilities to projects plugin
   - Removed temporary warning on blog plugin authors file format change
   - Fixed projects plugin logging messages twice on Linux systems
   - Fixed projects plugin trying to hoist theme assets of divergent themes
   - Fixed compatibility of optimize plugin and projects plugin
   - Fixed compatibility of social plugin and projects plugin
   - Fixed #6448: Code line selection broken for code blocks with custom ids
   - Fixed #6437: Projects plugin crashing for certain site URL configurations
   - Fixed #6414: Projects plugin doesn't prefix messages coming from projects
   
   mkdocs-material-9.5.1 (2023-12-08)
   
   - Updated Greek translations
   - Fixed #6464: Privacy plugin cannot be enabled
   - Fixed #6461: Sorting blog posts ignores time component in date
   
   mkdocs-material-9.5.0 (2023-12-07)
   Merged Insiders features of 'Goat's Horn' funding goal
   
   - Added privacy plugin: automatic downloading of external assets
   - Added support for card grids and grid layouts
   - Added support for improved tooltips
   - Added support for content tabs anchor links (deep linking)
   - Added support for automatic dark/light mode
   - Added support for document contributors
   
   mkdocs-material-9.4.14+insiders-4.46.0 (2023-11-26)
   
   - Added support for author profiles in blog plugin
   - Fixed custom index pages yielding two navigation items (4.45.0 regression)
   
   mkdocs-material-9.4.14 (2023-11-26)
   
   - Added support for linking authors in blog posts
   
   mkdocs-material-9.4.13 (2023-11-26)
   
   - Fixed #6365: Blog plugin pagination links to previous pages broken
   - Fixed #5758: Updated Mermaid.js to version 10.6.1 (latest)
   
   mkdocs-material-9.4.12+insiders-4.45.0 (2023-11-24)
   
   - Added support for sorting blog categories by post count or custom function
   - Improved tags plugin to generate Unicode-aware slugs by default
   - Fixed non-deterministic order of multiple authors in blog plugin
   
   ... (truncated)
   
   Commits
   
   - 63683eb Prepare 9.5.1 release
   - ac41e53 Fixed privacy plugin not being available
   - 0d72b5f Formatting
   - e1723f0 Merge branch 'master' of github.com:squidfunk/mkdocs-material
   - 5b70a0c Fixe time being dropped from blog post dates
   - 6c18591 Documentation (#6459)
   - 63de1a0 Merge branch 'master' of github.com:squidfunk/mkdocs-material
   - 8a7

Re: [PR] Build: Bump mkdocs-material from 9.4.14 to 9.5.0 [iceberg-python]

2023-12-08 Thread via GitHub


dependabot[bot] closed pull request #196: Build: Bump mkdocs-material from 
9.4.14 to 9.5.0
URL: https://github.com/apache/iceberg-python/pull/196


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-08 Thread via GitHub


tmnd1991 commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847976194

   Finally I got a reproducer inside the codebase; you can find it at 
`TestSPJWithBucketing`.
   With the condition on the partitions, Spark 3.4 (same as my app) will 
actually prune the unaffected partitions, while 3.5 will not.
   
   Anyway, the more I work on this, the more I think the issue should be solved 
directly on the Scan, not by adding conditions manually. All the info should be 
available to Spark beforehand, am I right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Build: rebasing a fork triggers redundant CI runs against master commits [iceberg]

2023-12-08 Thread via GitHub


github-actions[bot] commented on issue #7819:
URL: https://github.com/apache/iceberg/issues/7819#issuecomment-1847993683

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Docs: Give Metadata Tables their Own Page [iceberg]

2023-12-08 Thread via GitHub


github-actions[bot] commented on issue #7793:
URL: https://github.com/apache/iceberg/issues/7793#issuecomment-1847993704

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Docs: Include references GCP libraries [iceberg]

2023-12-08 Thread via GitHub


github-actions[bot] commented on issue #7787:
URL: https://github.com/apache/iceberg/issues/7787#issuecomment-1847993726

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] [Spark] Few package dependency issues of Iceberg and AWS [iceberg]

2023-12-08 Thread via GitHub


github-actions[bot] commented on issue #7737:
URL: https://github.com/apache/iceberg/issues/7737#issuecomment-1847993762

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat(docs): example of multiple catalogs defined in .pyiceberg.yaml [iceberg-python]

2023-12-08 Thread via GitHub


HonahX commented on code in PR #194:
URL: https://github.com/apache/iceberg-python/pull/194#discussion_r1421197076


##
mkdocs/docs/api.md:
##
@@ -33,6 +33,20 @@ catalog:
 credential: t-1234:secret
 ```
 
+Note that multiple catalogs can be defined in the same `.pyiceberg.yaml`:
+```yaml
+catalog:
+  hive:
+  uri: thrift://127.0.0.1:9083
+  s3.endpoint: http://127.0.0.1:9000
+  s3.access-key-id: admin
+  s3.secret-access-key: password
+  rest:
+uri: https://rest-server:8181/
+warehouse: my-warehouse
+```
+and loaded in python by calling `load_catalog`. See below for an example.

Review Comment:
   I think the example primarily illustrates how to load a catalog that is not 
defined in `.pyiceberg.yaml`. For catalogs that are defined in the yaml file, 
we might say
   ```
   and loaded in python by calling `load_catalog(name="hive")` and 
`load_catalog(name="rest")`
   ```
   Does this sound right to you?
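
   For instance, loading the two catalogs from that example `.pyiceberg.yaml` 
could look like the sketch below (the catalog names match the YAML above; the 
override shown for the REST catalog is optional):

```python
from pyiceberg.catalog import load_catalog

# Loads the "hive" block from the .pyiceberg.yaml configuration
hive_catalog = load_catalog("hive")

# Loads the "rest" block; keyword properties can still override the file
rest_catalog = load_catalog("rest", uri="https://rest-server:8181/")

print(hive_catalog.list_namespaces())
print(rest_catalog.list_namespaces())
```
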



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Add UnboundSortOrder [iceberg-rust]

2023-12-08 Thread via GitHub


fqaiser94 opened a new pull request, #115:
URL: https://github.com/apache/iceberg-rust/pull/115

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Parquet file overwritten by spark streaming job in subsequent execution with same spark streaming checkpoint location [iceberg]

2023-12-08 Thread via GitHub


amogh-jahagirdar commented on issue #9172:
URL: https://github.com/apache/iceberg/issues/9172#issuecomment-1848220358

   Thanks for the details, one key thing stands out to me:
   
   ```
   I also tested with latest version, iceberg-spark-runtime-3.4_2.12-1.4.2.jar 
as well, I could see that the second number, part of the file name, is 
continuously increasing 
1-3200-11773075-523f-4667-936b-88702fe9860c-1.parquet, however after 
around 200 execution of stream, the file name got reset 
1-3166-11773075-523f-4667-936b-88702fe9860c-1.parquet and files were 
started getting overwritten.
   ```
   
   This does align with the suspicion in the other issue that task IDs can be 
reused across epochs ("after around 200 executions of stream" I'm reading as 
200 microbatch intervals).
   
   Which I think makes sense (and anyway that's probably intentional in the 
DSV2 API to surface the writer). I'll put up a draft for adding the epochID to 
the output path.
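
   A toy sketch of the collision described above (the real names come from 
Iceberg's output file factory; the format below is simplified to 
partition-task-operation-count):

```python
# Toy illustration: with a constant query id, a later epoch that happens to reuse
# a task id produces the same file name, so the later write clobbers the earlier one.
def file_name(partition_id: int, task_id: int, operation_id: str, count: int = 1) -> str:
    return f"{partition_id:05d}-{task_id:05d}-{operation_id}-{count:05d}.parquet"

query_id = "11773075-523f-4667-936b-88702fe9860c"  # fixed for the whole streaming query

# two different microbatches (epochs) reusing the same (partition, task) ids collide
assert file_name(1, 3166, query_id) == file_name(1, 3166, query_id)

# folding the epoch id into the operation id (the idea in PR #9255) keeps them distinct
assert file_name(1, 3166, f"{query_id}-1") != file_name(1, 3166, f"{query_id}-2")
```
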


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark Streaming: Fix clobbering of files across streaming epochs [iceberg]

2023-12-08 Thread via GitHub


amogh-jahagirdar commented on code in PR #9255:
URL: https://github.com/apache/iceberg/pull/9255#discussion_r1421271945


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##
@@ -673,11 +673,11 @@ public DataWriter createWriter(int 
partitionId, long taskId, long e
   Table table = tableBroadcast.value();
   PartitionSpec spec = table.specs().get(outputSpecId);
   FileIO io = table.io();
-
+  String operationId = queryId + "-" + epochId;

Review Comment:
   I also need to think more about whether this ends up breaking idempotency in 
other cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org


