[I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


swat1234 opened a new issue, #8713:
URL: https://github.com/apache/iceberg/issues/8713

   Iceberg tables are not compressing Parquet files in S3. When the table 
properties below are used to enable compression, the file size increases 
compared to the uncompressed file. Can someone please assist with this?
   
   1. File with UNCOMPRESSED codec (682 bytes):
   
   0-0-0129ba78-17f6-466f-b57b-695c678d64d5-1.parquet
   "properties" : { "codec" : "UNCOMPRESSED", ... }
   
   ---
   2. File with GZIP codec (733 bytes):
   
   0-0-e6f22c0e-2e16-43aa-8a5f-efabee995876-1.parquet
   "properties" : { "codec" : "GZIP", ... }
   
   ---
   3. File with SNAPPY codec (686 bytes):
   
   0-0-36fd4aad-8c38-40f5-8241-78ffe4f0a032-1.parquet
   "properties" : { "codec" : "SNAPPY", ... }
   
   ---
   Table Properties:
   
   "parquet.compression": "SNAPPY"
   "read.parquet.vectorization.batch-size": "5000"
   "read.split.target-size": "134217728"
   "read.parquet.vectorization.enabled": "true"
   "write.parquet.page-size-bytes": "1048576"
   "write.parquet.row-group-size-bytes": "134217728"
   "write_compression": "SNAPPY"
   "write.parquet.compression-codec": "snappy"
   "write.metadata.metrics.max-inferred-column-defaults": "100"
   "write.parquet.compression-level": "4"
   "write.target-file-size-bytes": "536870912"
   "write.delete.target-file-size-bytes": "67108864"
   "write.parquet.page-row-limit": "2"
   "write.format.default": "parquet"
   "write.metadata.compression-codec": "gzip"
   "write.compression": "SNAPPY"
   
   
   Thanks in advance!!





Re: [PR] Core: Support view metadata compression [iceberg]

2023-10-04 Thread via GitHub


nastra commented on code in PR #8552:
URL: https://github.com/apache/iceberg/pull/8552#discussion_r1345331616


##
core/src/test/java/org/apache/iceberg/view/TestViewMetadataParser.java:
##
@@ -308,4 +322,57 @@ public void replaceViewMetadataWithMultipleSQLsForDialect() throws Exception {
 
 assertThat(replaced.currentVersion()).isEqualTo(viewVersion);
   }
+
+  @ParameterizedTest
+  @ValueSource(strings = {"none", "gzip"})
+  public void metadataCompression(String codecName) throws IOException {
+    Codec codec = Codec.fromName(codecName);
+    String location = Paths.get(tmp.toString(), "v1" + getFileExtension(codec)).toString();

Review Comment:
   I was aligning this test with how compression was tested for table metadata 
in 
https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/core/src/test/java/org/apache/iceberg/TableMetadataParserTest.java#L63
 but I've changed it to have explicit file names






Re: [PR] API, Core: Allow setting a View's location [iceberg]

2023-10-04 Thread via GitHub


nastra commented on code in PR #8648:
URL: https://github.com/apache/iceberg/pull/8648#discussion_r1345339487


##
api/src/main/java/org/apache/iceberg/view/UpdateViewLocation.java:
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import org.apache.iceberg.PendingUpdate;
+
+/** API for setting a view's base location. */
+public interface UpdateViewLocation extends PendingUpdate {

Review Comment:
   yes, we could use `UpdateLocation` here instead of introducing a separate API
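
   For illustration, a minimal sketch of what reusing the table-side interface 
could look like (placing it on `View` is an assumption of the sketch, not 
necessarily the merged API):
   
   ```java
   import org.apache.iceberg.UpdateLocation;
   
   public interface View {
     // ... existing view methods ...
   
     /** Create a new update to set this view's location. */
     UpdateLocation updateLocation();
   }
   ```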






Re: [PR] API, Core: Allow setting a View's location [iceberg]

2023-10-04 Thread via GitHub


nastra commented on code in PR #8648:
URL: https://github.com/apache/iceberg/pull/8648#discussion_r1345349415


##
core/src/main/java/org/apache/iceberg/view/SetViewLocation.java:
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import static org.apache.iceberg.TableProperties.COMMIT_MAX_RETRY_WAIT_MS;
+import static org.apache.iceberg.TableProperties.COMMIT_MAX_RETRY_WAIT_MS_DEFAULT;
+import static org.apache.iceberg.TableProperties.COMMIT_MIN_RETRY_WAIT_MS;
+import static org.apache.iceberg.TableProperties.COMMIT_MIN_RETRY_WAIT_MS_DEFAULT;
+import static org.apache.iceberg.TableProperties.COMMIT_NUM_RETRIES;
+import static org.apache.iceberg.TableProperties.COMMIT_NUM_RETRIES_DEFAULT;
+import static org.apache.iceberg.TableProperties.COMMIT_TOTAL_RETRY_TIME_MS;
+import static org.apache.iceberg.TableProperties.COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.util.PropertyUtil;
+import org.apache.iceberg.util.Tasks;
+
+class SetViewLocation implements UpdateViewLocation {
+  private final ViewOperations ops;
+  private ViewMetadata base;
+  private String newLocation = null;
+
+  SetViewLocation(ViewOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+  }
+
+  @Override
+  public String apply() {
+    return internalApply().location();

Review Comment:
   makes sense, I've aligned the logic with how it's done for tables in 
`SetLocation`
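
   For context, a sketch of the table-side `SetLocation` commit pattern being 
referenced, assuming the `COMMIT_*` constants imported at the top of this file 
(details may differ from the final code):
   
   ```java
   // Sketch only: retry the metadata commit with exponential backoff, as SetLocation does for tables.
   Tasks.foreach(ops)
       .retry(PropertyUtil.propertyAsInt(
           base.properties(), COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
       .exponentialBackoff(
           PropertyUtil.propertyAsInt(
               base.properties(), COMMIT_MIN_RETRY_WAIT_MS, COMMIT_MIN_RETRY_WAIT_MS_DEFAULT),
           PropertyUtil.propertyAsInt(
               base.properties(), COMMIT_MAX_RETRY_WAIT_MS, COMMIT_MAX_RETRY_WAIT_MS_DEFAULT),
           PropertyUtil.propertyAsInt(
               base.properties(), COMMIT_TOTAL_RETRY_TIME_MS, COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT),
           2.0 /* exponential scale factor */)
       .onlyRetryOn(CommitFailedException.class)
       .run(taskOps -> taskOps.commit(base, internalApply()));
   ```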






Re: [PR] API, Core: Allow setting a View's location [iceberg]

2023-10-04 Thread via GitHub


nastra commented on PR #8648:
URL: https://github.com/apache/iceberg/pull/8648#issuecomment-1746330496

   thanks for the reviews @rdblue and @amogh-jahagirdar, I've adjusted the code 
accordingly





Re: [I] Deprecate usages of AssertHelpers in codebase [iceberg]

2023-10-04 Thread via GitHub


gzagarwal commented on issue #7094:
URL: https://github.com/apache/iceberg/issues/7094#issuecomment-1746338892

   > okay let me pick, I am working on iceberg-aws
   
   Shall I assume the current test cases are working? On my local system they 
are not working, hence the question.
   Is there any prerequisite for working in the Iceberg workspace? Are there 
any settings I need to apply before it will build?





Re: [I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


nastra commented on issue #8713:
URL: https://github.com/apache/iceberg/issues/8713#issuecomment-1746347077

   I see that you configured `"write.metadata.compression-codec": "gzip"`, but 
that property controls compression of table metadata files, not of individual 
data files. Also, is there any particular reason to set `parquet.compression` / 
`write.compression` / ... and all the others? The setting that controls data 
file compression is `write.parquet.compression-codec`, which defaults to `gzip`.
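
   For example, a minimal way to set only that property through the Java API 
(the catalog wiring and table name below are placeholders):
   
   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.TableIdentifier;
   
   // Sketch: write.parquet.compression-codec is what controls Parquet data file compression.
   Table table = catalog.loadTable(TableIdentifier.of("db", "tbl"));
   table.updateProperties()
       .set("write.parquet.compression-codec", "snappy")
       .commit();
   ```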





Re: [I] Deprecate usages of AssertHelpers in codebase [iceberg]

2023-10-04 Thread via GitHub


nastra commented on issue #7094:
URL: https://github.com/apache/iceberg/issues/7094#issuecomment-1746356259

   @gzagarwal yes the tests should all be working. What issue are you seeing? 
https://github.com/apache/iceberg/blob/a3aff95f9e60962240b94242e24a778760bdd1d9/CONTRIBUTING.md
 and https://iceberg.apache.org/contribute/#building-the-project-locally should 
contain everything needed in order to configure the project locally 





Re: [I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


swat1234 commented on issue #8713:
URL: https://github.com/apache/iceberg/issues/8713#issuecomment-1746360361

   I am trying to reduce the storage space of the files by applying Snappy or 
Gzip compression. I can see the metadata is getting compressed with gzip, but 
not the data files. Could you guide me on how to do this?





Re: [I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


nastra commented on issue #8713:
URL: https://github.com/apache/iceberg/issues/8713#issuecomment-1746404632

   I would probably start by reducing the number of unrelated table properties 
being set.
   As I mentioned earlier, the one that matters in your case is 
`write.parquet.compression-codec`, which defaults to `gzip` but can also be 
set to `snappy` or `zstd`.
   The other setting you can experiment with is 
`write.parquet.compression-level`.
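
   To verify what actually landed in the files, one option is to read the 
codec back from the Parquet footer. A sketch using the parquet-hadoop API 
(the file path is a placeholder), run inside a method that throws IOException:
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.hadoop.util.HadoopInputFile;
   
   try (ParquetFileReader reader = ParquetFileReader.open(
       HadoopInputFile.fromPath(new Path("s3://bucket/db/tbl/data/file.parquet"), new Configuration()))) {
     // Each column chunk in each row group records the codec it was written with.
     reader.getFooter().getBlocks().forEach(block ->
         block.getColumns().forEach(col ->
             System.out.println(col.getPath() + " -> " + col.getCodec())));
   }
   ```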





Re: [I] Upgrade to gradle 8.3 [iceberg]

2023-10-04 Thread via GitHub


jbonofre commented on issue #8485:
URL: https://github.com/apache/iceberg/issues/8485#issuecomment-1746467781

   FYI, I tested `revapi` with Gradle 8.3 (on my PR). Here's the test I did:
   * I added a `void test();` method in `SessionCatalog`
   * I added the corresponding `public void test() {}` in `RESTSessionCatalog`
   
   So there is an actual change to the API.
   With Gradle 8.1, `revapi` fails with `API/ABI breaks detected.`.
   With Gradle 8.3, `revapi` doesn't fail; it doesn't detect the API change.
   
   I'm investigating.
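
   For reference, the probe change described above amounts to (a sketch, not 
the actual diff):
   
   ```java
   public interface SessionCatalog {
     // ... existing methods ...
     void test(); // newly added method: revapi should flag an API/ABI break
   }
   ```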





Re: [PR] API, Core: Allow setting a View's location [iceberg]

2023-10-04 Thread via GitHub


nastra merged PR #8648:
URL: https://github.com/apache/iceberg/pull/8648





Re: [I] Deprecate usages of AssertHelpers in codebase [iceberg]

2023-10-04 Thread via GitHub


gzagarwal commented on issue #7094:
URL: https://github.com/apache/iceberg/issues/7094#issuecomment-1746492707

   > @gzagarwal yes the tests should all be working. What issue are you seeing? 
https://github.com/apache/iceberg/blob/a3aff95f9e60962240b94242e24a778760bdd1d9/CONTRIBUTING.md
 and https://iceberg.apache.org/contribute/#building-the-project-locally should 
contain everything needed in order to configure the project locally
   
   It is some issue with my local system only; let me fix that first. If 
someone wants to take care of the changes in the meantime, they can pick this 
up. Once my workspace builds locally, I will come back to it.





Re: [I] Upgrade to gradle 8.3 [iceberg]

2023-10-04 Thread via GitHub


ajantha-bhat commented on issue #8485:
URL: https://github.com/apache/iceberg/issues/8485#issuecomment-1746502923

   > With Gradle 8.3, revapi doesn't fail, it doesn't detect the API change.
   
   Yes, that's what we observed with Gradle 8.2 as well.
   Maybe we need to raise an issue with the revapi team. If it is a Gradle 
issue, they will redirect us there.





Re: [PR] Flink: new sink base on the unified sink API - WIP [iceberg]

2023-10-04 Thread via GitHub


gyfora commented on code in PR #8653:
URL: https://github.com/apache/iceberg/pull/8653#discussion_r1345500193


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java:
##
@@ -18,72 +18,39 @@
  */
 package org.apache.iceberg.flink.sink;
 
-import static org.apache.iceberg.TableProperties.AVRO_COMPRESSION;
-import static org.apache.iceberg.TableProperties.AVRO_COMPRESSION_LEVEL;
-import static org.apache.iceberg.TableProperties.ORC_COMPRESSION;
-import static org.apache.iceberg.TableProperties.ORC_COMPRESSION_STRATEGY;
-import static org.apache.iceberg.TableProperties.PARQUET_COMPRESSION;
-import static org.apache.iceberg.TableProperties.PARQUET_COMPRESSION_LEVEL;
-import static org.apache.iceberg.TableProperties.WRITE_DISTRIBUTION_MODE;
-
-import java.io.IOException;
-import java.io.UncheckedIOException;
-import java.time.Duration;
 import java.util.List;
 import java.util.Map;
-import java.util.Set;
-import java.util.function.Function;
 import org.apache.flink.api.common.functions.MapFunction;
 import org.apache.flink.api.common.typeinfo.TypeInformation;
 import org.apache.flink.api.common.typeinfo.Types;
-import org.apache.flink.configuration.Configuration;
-import org.apache.flink.configuration.ReadableConfig;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import org.apache.flink.streaming.api.datastream.DataStreamSink;
 import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
 import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
 import org.apache.flink.table.api.TableSchema;
 import org.apache.flink.table.data.RowData;
-import org.apache.flink.table.data.util.DataFormatConverters;
-import org.apache.flink.table.types.DataType;
 import org.apache.flink.table.types.logical.RowType;
 import org.apache.flink.types.Row;
-import org.apache.iceberg.DataFile;
-import org.apache.iceberg.DistributionMode;
-import org.apache.iceberg.FileFormat;
-import org.apache.iceberg.PartitionField;
-import org.apache.iceberg.PartitionSpec;
-import org.apache.iceberg.Schema;
-import org.apache.iceberg.SerializableTable;
 import org.apache.iceberg.Table;
-import org.apache.iceberg.flink.FlinkSchemaUtil;
 import org.apache.iceberg.flink.FlinkWriteConf;
-import org.apache.iceberg.flink.FlinkWriteOptions;
-import org.apache.iceberg.flink.TableLoader;
-import org.apache.iceberg.flink.util.FlinkCompatibilityUtil;
+import org.apache.iceberg.flink.sink.committer.IcebergFilesCommitter;
+import org.apache.iceberg.flink.sink.writer.IcebergStreamWriter;
 import org.apache.iceberg.io.WriteResult;
 import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.relocated.com.google.common.collect.Maps;
-import org.apache.iceberg.relocated.com.google.common.collect.Sets;
-import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.util.SerializableSupplier;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-public class FlinkSink {
-  private static final Logger LOG = LoggerFactory.getLogger(FlinkSink.class);
 
+public class FlinkSink extends SinkBase {

Review Comment:
   Isn't this going to cause backward compatibility issues with the current 
FlinkSink instances? Maybe I am wrong but it means that users can only upgrade 
the iceberg version if they re-build their job graph (re-serialize operators 
etc.)



##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/committer/CommonCommitter.java:
##
@@ -0,0 +1,426 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.sink.committer;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.NavigableMap;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.TimeUnit;
+import javax.annotation.Nullable;
+import org.apache.flink.api.connector.sink2.Committer;
+import org.apache.flink.core.io.SimpleVersionedSerialization;
+import org

Re: [PR] Docs: Document all metadata tables. [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on code in PR #8709:
URL: https://github.com/apache/iceberg/pull/8709#discussion_r1345504986


##
docs/spark-queries.md:
##
@@ -357,6 +381,31 @@ SELECT * FROM prod.db.table.all_data_files;
 | 0|s3://.../dt=20210103/0-0-26222098-032f-472b-8ea5-651a55b21210-1.parquet|PARQUET|{20210103}|14|2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}|{1 -> 0, 2 -> 0}|{}|{1 -> 1, 2 -> 20210103}|{1 -> 3, 2 -> 20210103}|null|[4]|null|0|
 | 0|s3://.../dt=20210104/0-0-a3bb1927-88eb-4f1c-bc6e-19076b0d952e-1.parquet|PARQUET|{20210104}|14|2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}|{1 -> 0, 2 -> 0}|{}|{1 -> 1, 2 -> 20210104}|{1 -> 3, 2 -> 20210104}|null|[4]|null|0|
 
+#### All Delete Files
+
+To show all the table's delete files and each file's metadata:

Review Comment:
   followed the same convention of previous tables like `files`






Re: [PR] Docs: Document all metadata tables. [iceberg]

2023-10-04 Thread via GitHub


ajantha-bhat commented on code in PR #8709:
URL: https://github.com/apache/iceberg/pull/8709#discussion_r1345506789


##
docs/spark-queries.md:
##
@@ -357,6 +381,31 @@ SELECT * FROM prod.db.table.all_data_files;
 | 0|s3://.../dt=20210103/0-0-26222098-032f-472b-8ea5-651a55b21210-1.parquet|PARQUET|{20210103}|14|2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}|{1 -> 0, 2 -> 0}|{}|{1 -> 1, 2 -> 20210103}|{1 -> 3, 2 -> 20210103}|null|[4]|null|0|
 | 0|s3://.../dt=20210104/0-0-a3bb1927-88eb-4f1c-bc6e-19076b0d952e-1.parquet|PARQUET|{20210104}|14|2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}|{1 -> 0, 2 -> 0}|{}|{1 -> 1, 2 -> 20210104}|{1 -> 3, 2 -> 20210104}|null|[4]|null|0|
 
+#### All Delete Files
+
+To show all the table's delete files and each file's metadata:

Review Comment:
   ok. The existing convention is still confusing for me to read. But we can 
optimize in a follow up. Ok for me to keep it similar now.






[PR] feat: In memory catalog [iceberg-rust]

2023-10-04 Thread via GitHub


JanKaul opened a new pull request, #74:
URL: https://github.com/apache/iceberg-rust/pull/74

   This is a draft PR implementing some functionality for an in-memory 
catalog. The in-memory catalog is meant to simplify tests.
   
   Additionally, this PR serves as a way to test the requirements that the 
`Catalog` trait imposes on catalogs that make use of the `metadata_location` 
of an Iceberg table.





Re: [I] Could there be duplicate values in the result returned by the findOrphanFiles method? [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on issue #8670:
URL: https://github.com/apache/iceberg/issues/8670#issuecomment-1746511694

   @RussellSpitzer, I would like to look into this and fix it accordingly.





Re: [I] Upgrade to gradle 8.3 [iceberg]

2023-10-04 Thread via GitHub


jbonofre commented on issue #8485:
URL: https://github.com/apache/iceberg/issues/8485#issuecomment-1746517112

   I think the problem is on the Gradle side, or in the interaction between 
Gradle and the revapi Gradle plugin.
   
   I'm bisecting Gradle versions to identify the change causing the issue.







Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-10-04 Thread via GitHub


nastra commented on code in PR #7913:
URL: https://github.com/apache/iceberg/pull/7913#discussion_r1345527136


##
core/src/main/java/org/apache/iceberg/catalog/BaseSessionCatalog.java:
##
@@ -30,8 +30,10 @@
 import org.apache.iceberg.exceptions.NamespaceNotEmptyException;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewBuilder;
 
-public abstract class BaseSessionCatalog implements SessionCatalog {
+public abstract class BaseSessionCatalog implements SessionCatalog, ViewSessionCatalog {

Review Comment:
   in this case, maybe we should have a `BaseViewSessionCatalog` that extends 
`BaseSessionCatalog`? This would be similar to `BaseMetastoreCatalog` & 
`BaseMetastoreViewCatalog`.
   Also I'm not sure why RevAPI didn't complain
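
   A hedged sketch of that suggestion (names follow the proposal above, not 
necessarily the merged code):
   
   ```java
   // View-specific behavior moves into a dedicated base class, mirroring the
   // BaseMetastoreCatalog / BaseMetastoreViewCatalog split.
   public abstract class BaseViewSessionCatalog extends BaseSessionCatalog
       implements ViewSessionCatalog {
     // view-related implementations live here, keeping BaseSessionCatalog
     // free of view concerns
   }
   ```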






[I] Adopt `Catalog` API to include references to the `TableMetadata` and the `metadata_location` in the `TableCommit` payload for the `update_table` method [iceberg-rust]

2023-10-04 Thread via GitHub


JanKaul opened a new issue, #75:
URL: https://github.com/apache/iceberg-rust/issues/75

   Iceberg catalogs that make use of a `*.metadata.json` file to store the 
table metadata require the `metadata_location` and the `TableMetadata` of a 
Table to perform an `update_table` operation ([see 
here](https://github.com/JanKaul/iceberg-rust/blob/memory-catalog/crates/iceberg/src/catalog/memory.rs#L176)).
 It would therefore be helpful to include references to the `metadata_location` 
and the `TableMetadata` in the `TableCommit` payload of the `update_table` 
operation.
   
   Something like:
   
   ```rust
   pub struct TableCommit<'t> {
       /// The table ident.
       pub ident: TableIdent,
       /// Metadata file location of the table
       pub metadata_location: &'t str,
       /// Table metadata
       pub table_metadata: &'t TableMetadata,
       /// The requirements of the table.
       ///
       /// Commit will fail if the requirements are not met.
       pub requirements: Vec<TableRequirement>,
       /// The updates of the table.
       pub updates: Vec<TableUpdate>,
   }
   ```





Re: [I] Optimize metadata tables? [iceberg]

2023-10-04 Thread via GitHub


ajantha-bhat commented on issue #8714:
URL: https://github.com/apache/iceberg/issues/8714#issuecomment-1746561829

   cc: @szehon-ho 





Re: [I] Docs: document the compareWithFileList parameter [iceberg]

2023-10-04 Thread via GitHub


Tavisca-vinayak-bhadage commented on issue #8155:
URL: https://github.com/apache/iceberg/issues/8155#issuecomment-1746581142

   This compareWithFileList would be a good solution for AWS S3-based Iceberg 
tables as well, since we are hitting the exception below with the default 
remove-orphan-files implementation:
   
   `org.apache.iceberg.exceptions.RuntimeIOException: java.io.IOException: 
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
 Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error 
Code: SlowDown;)`
   
   Is there any way to use the compareWithFileList parameter from PySpark or 
Spark SQL? We are using AWS Glue as the Spark engine, which does not support 
the Java Dataset API.
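
   For reference, a hedged sketch of the Java action being discussed (the 
`fileListDF` DataFrame with a `file_path` string column is an assumption of 
the sketch); it does require the JVM Dataset API:
   
   ```java
   import org.apache.iceberg.spark.actions.SparkActions;
   
   SparkActions.get(spark)
       .deleteOrphanFiles(table)
       .compareToFileList(fileListDF) // Dataset<Row> of files that actually exist
       .execute();
   ```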
   
   





Re: [I] Could there be duplicate values in the result returned by the findOrphanFiles method? [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on issue #8670:
URL: https://github.com/apache/iceberg/issues/8670#issuecomment-1746623860

   @hwfff, could you please share the stack trace if you have it handy?





Re: [I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


swat1234 commented on issue #8713:
URL: https://github.com/apache/iceberg/issues/8713#issuecomment-1746699625

   We tried with only the write.parquet.compression-codec parameter, set to 
snappy and then gzip, but it is not working. Instead of shrinking, the file 
size increases.





Re: [I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


RussellSpitzer commented on issue #8713:
URL: https://github.com/apache/iceberg/issues/8713#issuecomment-1746708682

   If you are only testing with sub-kilobyte files, the results will be bad. 
You have some fixed per-file costs there, and most of the file (the footer) 
will not be compressed. Try with larger files.
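
   As a rough illustration (the byte split is an assumption, not a 
measurement): in a ~682-byte file, the Parquet footer (schema, row-group and 
column metadata, magic bytes) can account for most of the bytes and is stored 
uncompressed, leaving well under 200 bytes of actual page data. GZIP adds 
roughly 18 bytes of header/trailer per compressed stream, so on a payload that 
small the codec's fixed overhead can exceed its savings and the file grows. At 
realistic file sizes (tens or hundreds of MB), the footer is negligible and 
the codec's savings dominate.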





Re: [PR] Flink: new sink base on the unified sink API - WIP [iceberg]

2023-10-04 Thread via GitHub


gyfora commented on code in PR #8653:
URL: https://github.com/apache/iceberg/pull/8653#discussion_r1345697890


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/SinkBase.java:
##
@@ -0,0 +1,326 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.sink;
+
+import static org.apache.iceberg.TableProperties.AVRO_COMPRESSION;
+import static org.apache.iceberg.TableProperties.AVRO_COMPRESSION_LEVEL;
+import static org.apache.iceberg.TableProperties.ORC_COMPRESSION;
+import static org.apache.iceberg.TableProperties.ORC_COMPRESSION_STRATEGY;
+import static org.apache.iceberg.TableProperties.PARQUET_COMPRESSION;
+import static org.apache.iceberg.TableProperties.PARQUET_COMPRESSION_LEVEL;
+import static org.apache.iceberg.TableProperties.WRITE_DISTRIBUTION_MODE;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.io.UncheckedIOException;
+import java.time.Duration;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.iceberg.DistributionMode;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.SerializableTable;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.flink.FlinkSchemaUtil;
+import org.apache.iceberg.flink.FlinkWriteConf;
+import org.apache.iceberg.flink.TableLoader;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.TypeUtil;
+import org.apache.iceberg.util.SerializableSupplier;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class SinkBase implements Serializable {
+  private static final Logger LOG = LoggerFactory.getLogger(SinkBase.class);
+
+  private transient SinkBuilder builder;
+  private List<Integer> equalityFieldIds;
+  private RowType flinkRowType;
+  // Can not be serialized, so we have to be careful and serialize specific fields when needed
+  private transient FlinkWriteConf flinkWriteConf;
+  private Map<String, String> writeProperties;
+  private SerializableSupplier<Table> tableSupplier;
+
+  SinkBase(SinkBuilder builder) {
+    this.builder = builder;
+  }
+
+  SinkBuilder builder() {
+    return builder;
+  }
+
+  FlinkWriteConf flinkWriteConf() {
+    return flinkWriteConf;
+  }
+
+  Map<String, String> writeProperties() {
+    return writeProperties;
+  }
+
+  SerializableSupplier<Table> tableSupplier() {
+    return tableSupplier;
+  }
+
+  RowType flinkRowType() {
+    return flinkRowType;
+  }
+
+  List<Integer> equalityFieldIds() {

Review Comment:
   Why not simply protected / package private fields and access them directly 
to reduce boilerplate?






Re: [PR] Flink: new sink base on the unified sink API - WIP [iceberg]

2023-10-04 Thread via GitHub


gyfora commented on code in PR #8653:
URL: https://github.com/apache/iceberg/pull/8653#discussion_r1345702939


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java:
##
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.sink;
+
+import java.util.Map;
+import java.util.UUID;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.sink2.Committer;
+import org.apache.flink.core.io.SimpleVersionedSerializer;
+import org.apache.flink.streaming.api.connector.sink2.CommittableMessage;
+import org.apache.flink.streaming.api.connector.sink2.CommittableMessageTypeInfo;
+import org.apache.flink.streaming.api.connector.sink2.WithPostCommitTopology;
+import org.apache.flink.streaming.api.connector.sink2.WithPreCommitTopology;
+import org.apache.flink.streaming.api.connector.sink2.WithPreWriteTopology;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.DataStreamSink;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.types.Row;
+import org.apache.iceberg.DistributionMode;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.flink.TableLoader;
+import org.apache.iceberg.flink.sink.committer.CommonCommitter;
+import org.apache.iceberg.flink.sink.committer.SinkV2Aggregator;
+import org.apache.iceberg.flink.sink.committer.SinkV2Committable;
+import org.apache.iceberg.flink.sink.committer.SinkV2CommittableSerializer;
+import org.apache.iceberg.flink.sink.committer.SinkV2Committer;
+import org.apache.iceberg.flink.sink.writer.IcebergStreamWriterMetrics;
+import org.apache.iceberg.flink.sink.writer.RowDataTaskWriterFactory;
+import org.apache.iceberg.flink.sink.writer.SinkV2Writer;
+
+/**
+ * Flink v2 sink offers different hooks to insert custom topologies into the sink. We will
+ * use the following:
+ *
+ * 
+ *   {@link WithPreWriteTopology} which redistributes the data to the 
writers based on the
+ *   {@link DistributionMode}
+ *   {@link org.apache.flink.api.connector.sink2.SinkWriter} which writes 
data files, and
+ *   generates a {@link SinkV2Committable} to store the {@link
+ *   org.apache.iceberg.io.WriteResult}
+ *   {@link WithPreCommitTopology} which we use to place the {@link SinkV2Aggregator} which
+ *   merges the individual {@link 
org.apache.flink.api.connector.sink2.SinkWriter}'s {@link
+ *   org.apache.iceberg.io.WriteResult}s to a single {@link
+ *   org.apache.iceberg.flink.sink.committer.DeltaManifests}
+ *   {@link Committer} which stores the incoming {@link
+ *   org.apache.iceberg.flink.sink.committer.DeltaManifests}s in state for 
recovery, and commits
+ *   them to the Iceberg table using the {@link CommonCommitter}
+ *   {@link WithPostCommitTopology} we could use for incremental 
compaction later. This is not
+ *   implemented yet.
+ * 
+ *
+ * The job graph looks like below:
+ *
+ * {@code
+ *Flink sink
+ *   
+---+
+ *   | 
  |
+ * +---+ | +--+   +-+  
+---+ |
+ * | Map 1 | ==> | | writer 1 |   | committer 1 | 
---> | post commit 1 | |
+ * +---+ | +--+   +-+  
+---+ |
+ *   | \ /
\  |
+ *   |  \   /  
\ |
+ *   |   \ /   
 \|
+ * +---+ | +--+   \ +---+ /   +-+  
  \ +---+ |
+ * | Map 2 | ==> | | writer 2 | --->| commit aggregator | | 

Re: [PR] Flink: new sink base on the unified sink API - WIP [iceberg]

2023-10-04 Thread via GitHub


gyfora commented on code in PR #8653:
URL: https://github.com/apache/iceberg/pull/8653#discussion_r1345705474


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java:
##
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.sink;
+
+import java.util.Map;
+import java.util.UUID;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.sink2.Committer;
+import org.apache.flink.core.io.SimpleVersionedSerializer;
+import org.apache.flink.streaming.api.connector.sink2.CommittableMessage;
+import org.apache.flink.streaming.api.connector.sink2.CommittableMessageTypeInfo;
+import org.apache.flink.streaming.api.connector.sink2.WithPostCommitTopology;
+import org.apache.flink.streaming.api.connector.sink2.WithPreCommitTopology;
+import org.apache.flink.streaming.api.connector.sink2.WithPreWriteTopology;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.DataStreamSink;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.types.Row;
+import org.apache.iceberg.DistributionMode;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.flink.TableLoader;
+import org.apache.iceberg.flink.sink.committer.CommonCommitter;
+import org.apache.iceberg.flink.sink.committer.SinkV2Aggregator;
+import org.apache.iceberg.flink.sink.committer.SinkV2Committable;
+import org.apache.iceberg.flink.sink.committer.SinkV2CommittableSerializer;
+import org.apache.iceberg.flink.sink.committer.SinkV2Committer;
+import org.apache.iceberg.flink.sink.writer.IcebergStreamWriterMetrics;
+import org.apache.iceberg.flink.sink.writer.RowDataTaskWriterFactory;
+import org.apache.iceberg.flink.sink.writer.SinkV2Writer;
+
+/**
+ * Flink v2 sink offers different hooks to insert custom topologies into the sink. We will
+ * use the following:
+ *
+ * 
+ *   {@link WithPreWriteTopology} which redistributes the data to the 
writers based on the
+ *   {@link DistributionMode}
+ *   {@link org.apache.flink.api.connector.sink2.SinkWriter} which writes 
data files, and
+ *   generates a {@link SinkV2Committable} to store the {@link
+ *   org.apache.iceberg.io.WriteResult}
+ *   {@link WithPreCommitTopology} which we use to place the {@link SinkV2Aggregator} which
+ *   merges the individual {@link 
org.apache.flink.api.connector.sink2.SinkWriter}'s {@link
+ *   org.apache.iceberg.io.WriteResult}s to a single {@link
+ *   org.apache.iceberg.flink.sink.committer.DeltaManifests}
+ *   {@link Committer} which stores the incoming {@link
+ *   org.apache.iceberg.flink.sink.committer.DeltaManifests}s in state for 
recovery, and commits
+ *   them to the Iceberg table using the {@link CommonCommitter}
+ *   {@link WithPostCommitTopology} we could use for incremental 
compaction later. This is not
+ *   implemented yet.
+ * 
+ *
+ * The job graph looks like below:
+ *
+ * {@code
+ *Flink sink
+ *   
+---+
+ *   | 
  |
+ * +---+ | +--+   +-+  
+---+ |
+ * | Map 1 | ==> | | writer 1 |   | committer 1 | 
---> | post commit 1 | |
+ * +---+ | +--+   +-+  
+---+ |
+ *   | \ /
\  |
+ *   |  \   /  
\ |
+ *   |   \ /   
 \|
+ * +---+ | +--+   \ +---+ /   +-+  
  \ +---+ |
+ * | Map 2 | ==> | | writer 2 | --->| commit aggregator | | 

Re: [PR] Flink: new sink base on the unified sink API - WIP [iceberg]

2023-10-04 Thread via GitHub


gyfora commented on code in PR #8653:
URL: https://github.com/apache/iceberg/pull/8653#discussion_r1345706466


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java:
##
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.sink;
+
+import java.util.Map;
+import java.util.UUID;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.connector.sink2.Committer;
+import org.apache.flink.core.io.SimpleVersionedSerializer;
+import org.apache.flink.streaming.api.connector.sink2.CommittableMessage;
+import org.apache.flink.streaming.api.connector.sink2.CommittableMessageTypeInfo;
+import org.apache.flink.streaming.api.connector.sink2.WithPostCommitTopology;
+import org.apache.flink.streaming.api.connector.sink2.WithPreCommitTopology;
+import org.apache.flink.streaming.api.connector.sink2.WithPreWriteTopology;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.datastream.DataStreamSink;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.types.Row;
+import org.apache.iceberg.DistributionMode;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.flink.TableLoader;
+import org.apache.iceberg.flink.sink.committer.CommonCommitter;
+import org.apache.iceberg.flink.sink.committer.SinkV2Aggregator;
+import org.apache.iceberg.flink.sink.committer.SinkV2Committable;
+import org.apache.iceberg.flink.sink.committer.SinkV2CommittableSerializer;
+import org.apache.iceberg.flink.sink.committer.SinkV2Committer;
+import org.apache.iceberg.flink.sink.writer.IcebergStreamWriterMetrics;
+import org.apache.iceberg.flink.sink.writer.RowDataTaskWriterFactory;
+import org.apache.iceberg.flink.sink.writer.SinkV2Writer;
+
+/**
+ * The Flink v2 sink offers different hooks to insert custom topologies into the sink. We will use
+ * the following:
+ *
+ * <ul>
+ *   <li>{@link WithPreWriteTopology} which redistributes the data to the writers based on the
+ *       {@link DistributionMode}
+ *   <li>{@link org.apache.flink.api.connector.sink2.SinkWriter} which writes data files, and
+ *       generates a {@link SinkV2Committable} to store the {@link org.apache.iceberg.io.WriteResult}
+ *   <li>{@link WithPreCommitTopology} which we use to place the {@link SinkV2Aggregator} which
+ *       merges the individual {@link org.apache.flink.api.connector.sink2.SinkWriter}'s {@link
+ *       org.apache.iceberg.io.WriteResult}s to a single {@link
+ *       org.apache.iceberg.flink.sink.committer.DeltaManifests}
+ *   <li>{@link Committer} which stores the incoming {@link
+ *       org.apache.iceberg.flink.sink.committer.DeltaManifests}s in state for recovery, and
+ *       commits them to the Iceberg table using the {@link CommonCommitter}
+ *   <li>{@link WithPostCommitTopology} which we could use for incremental compaction later. This
+ *       is not implemented yet.
+ * </ul>
+ *
+ * <p>The job graph looks like below:
+ *
+ * <pre>{@code
+ *                            Flink sink
+ *               +-----------------------------------------------------------------------------------+
+ *               |                                                                                   |
+ * +-------+     | +----------+                               +-------------+      +---------------+ |
+ * | Map 1 | ==> | | writer 1 |                               | committer 1 | ---> | post commit 1 | |
+ * +-------+     | +----------+                               +-------------+      +---------------+ |
+ *               |             \                             /                \                      |
+ *               |              \                           /                  \                     |
+ *               |               \                         /                    \                    |
+ * +-------+     | +----------+   \ +-------------------+ /   +-------------+    \ +---------------+ |
+ * | Map 2 | ==> | | writer 2 | --->| commit aggregator |

Re: [I] Writing to S3 fails if the user is authenticated with `aws sso login` [iceberg-python]

2023-10-04 Thread via GitHub


jayceslesar commented on issue #39:
URL: https://github.com/apache/iceberg-python/issues/39#issuecomment-1746873989

   This is confirmed to be an upstream bug in pyarrow 13.0.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] S3 compression Issue with Iceberg [iceberg]

2023-10-04 Thread via GitHub


amogh-jahagirdar commented on issue #8713:
URL: https://github.com/apache/iceberg/issues/8713#issuecomment-1746920616

   +1 to @RussellSpitzer's point. These files seem way too small for compression to play a meaningful role. Compression is most noticeable on large amounts of "similar" data.
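
   (For illustration, a minimal, Iceberg-independent Java sketch of the overhead argument. Gzip adds a fixed header/trailer plus block metadata, so a payload of a few hundred bytes can grow, while many similar rows shrink dramatically:)

   ```
   import java.io.ByteArrayOutputStream;
   import java.util.zip.GZIPOutputStream;

   public class TinyPayloadCompression {

     // Compress a byte[] with gzip and return the compressed size.
     static int gzipSize(byte[] input) throws Exception {
       ByteArrayOutputStream bos = new ByteArrayOutputStream();
       try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
         gz.write(input);
       }
       return bos.size();
     }

     public static void main(String[] args) throws Exception {
       byte[] tiny = "a,1\nb,2\nc,3\n".getBytes();          // a handful of rows
       byte[] similar = "a,1\n".repeat(100_000).getBytes(); // many similar rows

       // Tiny payload: the fixed gzip overhead can exceed the savings.
       System.out.printf("tiny:    %d -> %d bytes%n", tiny.length, gzipSize(tiny));
       // Lots of similar data: compression shines.
       System.out.printf("similar: %d -> %d bytes%n", similar.length, gzipSize(similar));
     }
   }
   ```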


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Construct a writer tree [iceberg-python]

2023-10-04 Thread via GitHub


Fokko opened a new pull request, #40:
URL: https://github.com/apache/iceberg-python/pull/40

   For V1 and V2 there are some differences that are hard to enforce without this:
   
   - `1: snapshot_id` is required for V1, optional for V2
   - `105: block_size_in_bytes` needs to be written for V1, but omitted for V2 (this leverages the `write-default`).
   - `3: sequence_number` and `4: file_sequence_number` can be omitted for V1.
   
   Everything that we read, we map to V2. However, when writing we also want to be compliant with the V1 spec, and this is where the writer tree comes in, since we construct a tree for V1 or V2.
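
   (For illustration, a hypothetical Java sketch of the writer-tree idea; all names here are made up and pyiceberg's actual implementation differs. The point is that the per-version field writers are chosen once, when the tree is built, instead of branching on the format version for every manifest entry:)

   ```
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.function.BiConsumer;

   // Hypothetical sketch only; not pyiceberg's API.
   class ManifestWriterTree {
     private final List<BiConsumer<Map<String, Object>, Map<String, Object>>> fieldWriters;

     private ManifestWriterTree(List<BiConsumer<Map<String, Object>, Map<String, Object>>> fieldWriters) {
       this.fieldWriters = fieldWriters;
     }

     // Apply every field writer of the chosen tree to one manifest entry.
     Map<String, Object> write(Map<String, Object> entry) {
       Map<String, Object> out = new LinkedHashMap<>();
       fieldWriters.forEach(w -> w.accept(entry, out));
       return out;
     }

     // The tree is built once per format version, so no per-record branching.
     static ManifestWriterTree forVersion(int formatVersion) {
       if (formatVersion == 1) {
         return new ManifestWriterTree(List.of(
             (e, o) -> o.put("snapshot_id", e.get("snapshot_id")),  // required in V1
             (e, o) -> o.put("block_size_in_bytes", 67_108_864L))); // V1-only, via a write-default
       }
       // V2: snapshot_id optional, sequence numbers written, no block size.
       return new ManifestWriterTree(List.of(
           (e, o) -> o.put("snapshot_id", e.get("snapshot_id")),
           (e, o) -> o.put("sequence_number", e.get("sequence_number")),
           (e, o) -> o.put("file_sequence_number", e.get("file_sequence_number"))));
     }
   }
   ```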


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Construct a writer tree [iceberg-python]

2023-10-04 Thread via GitHub


Fokko commented on code in PR #40:
URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1345991299


##
pyiceberg/manifest.py:
##
@@ -262,15 +346,13 @@ class DataFile(Record):
 "split_offsets",
 "equality_ids",
 "sort_order_id",
-"spec_id",
 )
 content: DataFileContent
 file_path: str
 file_format: FileFormat
 partition: Record
 record_count: int
 file_size_in_bytes: int
-block_size_in_bytes: Optional[int]

Review Comment:
   I've removed this since we shouldn't use it. It is still part of the schema 
itself.



##
pyiceberg/manifest.py:
##
@@ -281,7 +363,6 @@ class DataFile(Record):
 split_offsets: Optional[List[int]]
 equality_ids: Optional[List[int]]
 sort_order_id: Optional[int]
-spec_id: Optional[int]

Review Comment:
   Removed this for now. Created an issue: 
https://github.com/apache/iceberg/issues/8712



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Your branch name [iceberg]

2023-10-04 Thread via GitHub


shreyanshR7 opened a new pull request, #8715:
URL: https://github.com/apache/iceberg/pull/8715

   #7154


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-10-04 Thread via GitHub


nastra commented on code in PR #7913:
URL: https://github.com/apache/iceberg/pull/7913#discussion_r1346067694


##
core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java:
##
@@ -568,4 +649,9 @@ private static TableIdentifier identFromPathVars(Map<String, String> pathVars) {
 return TableIdentifier.of(
 namespaceFromPathVars(pathVars), RESTUtil.decodeString(pathVars.get("table")));
   }
+
+  private static TableIdentifier viewIdentFromPathVars(Map<String, String> pathVars) {

Review Comment:
   yep, good idea



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-10-04 Thread via GitHub


nastra commented on code in PR #7913:
URL: https://github.com/apache/iceberg/pull/7913#discussion_r1346082143


##
open-api/rest-catalog-open-api.yaml:
##
@@ -1014,6 +1014,357 @@ paths:
 }
   }
 
+  /v1/{prefix}/namespaces/{namespace}/views:
+parameters:
+  - $ref: '#/components/parameters/prefix'
+  - $ref: '#/components/parameters/namespace'
+
+get:
+  tags:
+- Catalog API
+  summary: List all view identifiers underneath a given namespace
+  description: Return all view identifiers under this namespace
+  operationId: listViews
+  responses:
+200:
+  $ref: '#/components/responses/ListTablesResponse'
+400:
+  $ref: '#/components/responses/BadRequestErrorResponse'
+401:
+  $ref: '#/components/responses/UnauthorizedResponse'
+403:
+  $ref: '#/components/responses/ForbiddenResponse'
+404:
+  description: Not Found - The namespace specified does not exist
+  content:
+application/json:
+  schema:
+$ref: '#/components/schemas/ErrorModel'
+  examples:
+NamespaceNotFound:
+  $ref: '#/components/examples/NoSuchNamespaceError'
+419:
+  $ref: '#/components/responses/AuthenticationTimeoutResponse'
+503:
+  $ref: '#/components/responses/ServiceUnavailableResponse'
+5XX:
+  $ref: '#/components/responses/ServerErrorResponse'
+
+post:
+  tags:
+- Catalog API
+  summary: Create a view in the given namespace
+  description:
+Create a view in the given namespace.
+  operationId: createView
+  requestBody:
+content:
+  application/json:
+schema:
+  $ref: '#/components/schemas/CreateViewRequest'
+  responses:
+200:
+  $ref: '#/components/responses/LoadViewResponse'
+400:
+  $ref: '#/components/responses/BadRequestErrorResponse'
+401:
+  $ref: '#/components/responses/UnauthorizedResponse'
+403:
+  $ref: '#/components/responses/ForbiddenResponse'
+404:
+  description: Not Found - The namespace specified does not exist
+  content:
+application/json:
+  schema:
+$ref: '#/components/schemas/ErrorModel'
+  examples:
+NamespaceNotFound:
+  $ref: '#/components/examples/NoSuchNamespaceError'
+409:
+  description: Conflict - The view already exists
+  content:
+application/json:
+  schema:
+$ref: '#/components/schemas/ErrorModel'
+  examples:
+NamespaceAlreadyExists:
+  $ref: '#/components/examples/ViewAlreadyExistsError'
+419:
+  $ref: '#/components/responses/AuthenticationTimeoutResponse'
+503:
+  $ref: '#/components/responses/ServiceUnavailableResponse'
+5XX:
+  $ref: '#/components/responses/ServerErrorResponse'
+
+  /v1/{prefix}/namespaces/{namespace}/views/{view}:
+parameters:
+  - $ref: '#/components/parameters/prefix'
+  - $ref: '#/components/parameters/namespace'
+  - $ref: '#/components/parameters/view'
+
+get:
+  tags:
+- Catalog API
+  summary: Load a view from the catalog
+  operationId: loadView
+  description:
+Load a view from the catalog.
+
+
+The response contains both configuration and view metadata. The configuration, if non-empty, is used
+as additional configuration for the view that overrides catalog configuration. For example, this
+configuration may change the FileIO implementation to be used for the view.
+
+
+The response also contains the view's full metadata, matching the view metadata JSON file.
+
+
+The catalog configuration may contain credentials that should be used for subsequent requests for the
+view. The configuration key "token" is used to pass an access token to be used as a bearer token
+for view requests. Otherwise, a token may be passed using an RFC 8693 token type as a configuration
+key. For example, "urn:ietf:params:oauth:token-type:jwt=<JWT-token>".
+  responses:
+200:
+  $ref: '#/components/responses/LoadViewResponse'
+400:
+  $ref: '#/components/responses/BadRequestErrorResponse'
+401:
+  $ref: '#/components/responses/UnauthorizedResponse'
+403:
+  $ref: '#/components/responses/ForbiddenResponse'
+404:
+  description:
+Not Found - NoSuchViewException, view to load does not exist
+  content:
+application/json:
+  schema:
+$ref: '#/components/schemas/ErrorModel'
+  examples:
+ViewToLoadDoesNotExist:
+

[PR] OpenAPI: Add AssignUUID update to metadata updates [iceberg]

2023-10-04 Thread via GitHub


nastra opened a new pull request, #8716:
URL: https://github.com/apache/iceberg/pull/8716

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] WIP: Write support [iceberg-python]

2023-10-04 Thread via GitHub


Fokko opened a new pull request, #41:
URL: https://github.com/apache/iceberg-python/pull/41

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Replace Thread.sleep() usage in test code with Awaitility [iceberg]

2023-10-04 Thread via GitHub


shreyanshR7 commented on issue #7154:
URL: https://github.com/apache/iceberg/issues/7154#issuecomment-1747275843

   @nastra I tried to implement the above method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on code in PR #8715:
URL: https://github.com/apache/iceberg/pull/8715#discussion_r1346198636


##
api/src/test/java/org/apache/iceberg/metrics/TestDefaultTimer.java:
##
@@ -104,7 +106,7 @@ public void measureRunnable() {
 Runnable runnable =
 () -> {
   try {
-Thread.sleep(100);
+Awaitility.await().atLeast(Duration.ofMillis(100)).until(() -> true);

Review Comment:
   Won't it become flaky if the callable condition is evaluated sooner?
   `if (evaluationDuration.compareTo(minWaitTime) < 0) {
       String message = String.format("Condition was evaluated in %s which is earlier than expected minimum timeout %s",
           formatAsString(evaluationDuration), formatAsString(minWaitTime));
       conditionEvaluationHandler.handleTimeout(message, true);
       throw new ConditionTimeoutException(message);
   }`
   
   Why not use something like `https://github.com/apache/iceberg/blob/master/api/src/test/java/org/apache/iceberg/TestHelpers.java#L57`?
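
   (A self-contained sketch of the failure mode described above, assuming Awaitility is on the classpath: `atLeast` treats a condition that becomes true before the minimum wait as an error, so pairing it with `() -> true` fails immediately:)

   ```
   import java.time.Duration;
   import org.awaitility.Awaitility;
   import org.awaitility.core.ConditionTimeoutException;

   public class AtLeastIsNotASleep {
     public static void main(String[] args) {
       try {
         // The condition is already true at the first evaluation, i.e. well
         // before the 100 ms minimum, so Awaitility throws.
         Awaitility.await().atLeast(Duration.ofMillis(100)).until(() -> true);
       } catch (ConditionTimeoutException e) {
         System.out.println("fails fast: " + e.getMessage());
       }
     }
   }
   ```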
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-04 Thread via GitHub


shreyanshR7 commented on PR #8715:
URL: https://github.com/apache/iceberg/pull/8715#issuecomment-1747371647

   Oh I see, that code uses a while loop that checks the current time until the condition is met. But since the task asks to replace the Thread.sleep method with Awaitility, should I implement your suggestion even though it doesn't use Awaitility? @nk1506


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Parquet: Support filter operations on int96 timestamps [iceberg]

2023-10-04 Thread via GitHub


thesquelched closed pull request #2563: Parquet: Support filter operations on 
int96 timestamps
URL: https://github.com/apache/iceberg/pull/2563


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Hive: Push filtering for Iceberg table type to Hive MetaStore when listing tables [iceberg]

2023-10-04 Thread via GitHub


mderoy commented on PR #2722:
URL: https://github.com/apache/iceberg/pull/2722#issuecomment-1747635549

   @hankfanchiu this is awesome. The performance improvement from doing this is dramatic... any chance of reviving this? Or did the external issues put a damper on it? Maybe we can do this via catalog properties, like the 'list-all-tables' property, as a workaround if we're hung up on the external issues :thinking: If you don't have plans to move this forward, I can give it a go :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Phase 1 - New Docs Deployment [iceberg]

2023-10-04 Thread via GitHub


aokolnychyi commented on code in PR #8659:
URL: https://github.com/apache/iceberg/pull/8659#discussion_r1346544156


##
docs-new/site/releases.md:
##
@@ -0,0 +1,777 @@
+---
+title: "Releases"
+---
+
+
+## Downloads
+
+The latest version of Iceberg is [{{ icebergVersion }}](https://github.com/apache/iceberg/releases/tag/apache-iceberg-{{ icebergVersion }}).
+
+* [{{ icebergVersion }} source tar.gz](https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-{{ icebergVersion }}/apache-iceberg-{{ icebergVersion }}.tar.gz) -- [signature](https://downloads.apache.org/iceberg/apache-iceberg-{{ icebergVersion }}/apache-iceberg-{{ icebergVersion }}.tar.gz.asc) -- [sha512](https://downloads.apache.org/iceberg/apache-iceberg-{{ icebergVersion }}/apache-iceberg-{{ icebergVersion }}.tar.gz.sha512)
+* [{{ icebergVersion }} Spark 3.4\_2.12 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.4_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.4_2.12-{{ icebergVersion }}.jar) -- [3.4\_2.13](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.4_2.13/{{ icebergVersion }}/iceberg-spark-runtime-3.4_2.13-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Spark 3.3\_2.12 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.3_2.12-{{ icebergVersion }}.jar) -- [3.3\_2.13](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.3_2.13/{{ icebergVersion }}/iceberg-spark-runtime-3.3_2.13-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Spark 3.2\_2.12 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.2_2.12-{{ icebergVersion }}.jar) -- [3.2\_2.13](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.2_2.13/{{ icebergVersion }}/iceberg-spark-runtime-3.2_2.13-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Spark 3.1 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.1_2.12/{{ icebergVersion }}/iceberg-spark-runtime-3.1_2.12-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Flink 1.17 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.17/{{ icebergVersion }}/iceberg-flink-runtime-1.17-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Flink 1.16 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.16/{{ icebergVersion }}/iceberg-flink-runtime-1.16-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Flink 1.15 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.15/{{ icebergVersion }}/iceberg-flink-runtime-1.15-{{ icebergVersion }}.jar)
+* [{{ icebergVersion }} Hive runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/{{ icebergVersion }}/iceberg-hive-runtime-{{ icebergVersion }}.jar)
+
+To use Iceberg in Spark or Flink, download the runtime JAR for your engine version and add it to the jars folder of your installation.
+
+To use Iceberg in Hive 2 or Hive 3, download the Hive runtime JAR and add it to Hive using `ADD JAR`.
+
+### Gradle
+
+To add a dependency on Iceberg in Gradle, add the following to `build.gradle`:
+
+```
+dependencies {
+  implementation 'org.apache.iceberg:iceberg-core:{{ icebergVersion }}'
+}
+```
+
+You may also want to include `iceberg-parquet` for Parquet file support.
+
+### Maven
+
+To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
+
+```
+<dependencies>
+  ...
+  <dependency>
+    <groupId>org.apache.iceberg</groupId>
+    <artifactId>iceberg-core</artifactId>
+    <version>{{ icebergVersion }}</version>
+  </dependency>
+  ...
+</dependencies>
+```
+
+## 1.3.1 release
+
+Apache Iceberg 1.3.1 was released on July 25, 2023.
+The 1.3.1 release addresses various issues identified in the 1.3.0 release.
+
+* Core
+  - Table Metadata parser now accepts null for fields: current-snapshot-id, properties, and snapshots ([\#8064](https://github.com/apache/iceberg/pull/8064))
+* Hive
+  - Fix HiveCatalog deleting metadata on failures in checking lock status ([\#7931](https://github.com/apache/iceberg/pull/7931))
+* Spark
+  - Fix RewritePositionDeleteFiles failure for certain partition types ([\#8059](https://github.com/apache/iceberg/pull/8059))
+  - Fix RewriteDataFiles concurrency edge-case on commit timeouts ([\#7933](https://github.com/apache/iceberg/pull/7933))
+  - Fix partition-level DELETE operations for WAP branches ([\#7900](https://github.com/apache/iceberg/pull/7900))
+* Flink
+  - FlinkCatalog creation no longer creates the default database ([\#8039](https://github.com/apache/iceberg/pull/8039))
+
+## Past releases
+
+### 1.3.0 release
+
+Apache Iceberg 1.3.0 was released on May 30th, 2023.
+The 1.3.0 release adds a vari

Re: [PR] Phase 1 - New Docs Deployment [iceberg]

2023-10-04 Thread via GitHub


aokolnychyi commented on PR #8659:
URL: https://github.com/apache/iceberg/pull/8659#issuecomment-1747740766

   My primary concern about moving the docs into the main repo was versioning and pollution. It seems like `git worktree` should solve that. I deployed this locally, and it seems pretty straightforward.
   
   Do we need to adjust our `.gitignore`? I did see some extra files when deploying locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Phase 1 - New Docs Deployment [iceberg]

2023-10-04 Thread via GitHub


bitsondatadev commented on PR #8659:
URL: https://github.com/apache/iceberg/pull/8659#issuecomment-1747804852

   > My primary concern about moving the docs into the main repo was versioning and pollution. It seems like `git worktree` should solve that. I deployed this locally, and it seems pretty straightforward.
   > 
   > Do we need to adjust our `.gitignore`? I did see some extra files when deploying locally.
   
   There are still a few changes I need to make that I missed when adapting this from my sandbox repo.
   
   My goal is to not have any artifacts like this remaining once this is merged. I want this to be a good baseline for generating the older versions of the docs in the consistent mkdocs format.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Wap branch does not support reading from the partitions table [iceberg]

2023-10-04 Thread via GitHub


github-actions[bot] commented on issue #7297:
URL: https://github.com/apache/iceberg/issues/7297#issuecomment-1747821140

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Upsert support for keyless Apache Flink tables [iceberg]

2023-10-04 Thread via GitHub


Ge opened a new issue, #8719:
URL: https://github.com/apache/iceberg/issues/8719

   ### Feature Request / Improvement
   
   Consider the following continuous insertion into a keyless table:
   
   ```
   SET 'execution.checkpointing.interval' = '10 s';
   SET 'sql-client.execution.result-mode' = 'tableau';
   SET 'pipeline.max-parallelism' = '5';
   
   
   CREATE CATALOG nessie_catalog WITH (
   'type'='iceberg', 
   'catalog-impl'='org.apache.iceberg.nessie.NessieCatalog', 
   'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
   'uri'='http://catalog:19120/api/v1', 'authentication.type'='none', 
   'client.assume-role.region'='us-east-1',
   'warehouse' = 's3://warehouse', 
   's3.endpoint'='http://127.0.0.1:9000'
   );
   
   USE CATALOG nessie_catalog;
   
   CREATE DATABASE IF NOT EXISTS db;
   USE db;
   
   CREATE TEMPORARY TABLE word_table (
   word STRING
   ) WITH (
   'connector' = 'datagen',
   'fields.word.length' = '1'
   );
   
   CREATE TABLE word_count (
   word STRING,
   cnt BIGINT
   ) PARTITIONED BY (`ptn`) WITH (
   'format-version'='2',
  'write.upsert.enabled'='true'
   );
   
   INSERT INTO word_count SELECT word, COUNT(*) FROM word_table GROUP BY word;
   ```
   
   The output of the `SELECT` query there looks like:
   
   ```
   Flink SQL> SELECT word, COUNT(*) as cnt FROM word_table GROUP BY word LIMIT 5;
   +----+------+-----+
   | op | word | cnt |
   +----+------+-----+
   | +I |    e |   1 |
   | -D |    e |   1 |
   | +I |    e |   2 |
   | +I |    7 |   1 |
   | +I |    5 |   1 |
   | -D |    e |   2 |
   ...
   ```
   
   
   Now, consider the following query against the `word_count` table:
   
   ```
   Flink SQL> SELECT * FROM word_count LIMIT 10;
   ```
   
   Expected result: latest counts for 10 words.
   
   Actual result:
   ```
   [ERROR] Could not execute SQL statement. Reason:
   java.lang.IllegalStateException: Equality field columns shouldn't be empty when configuring to use UPSERT data stream.
   ```
   
   ### Query engine
   
   Flink
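
   (For context, a sketch of how equality fields are supplied with the DataStream API; `wordCounts` and `loader` are placeholders for the job's real stream and table. In SQL, the same information comes from a `PRIMARY KEY ... NOT ENFORCED` clause, which a keyless table does not have:)

   ```
   import java.util.Arrays;
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.table.data.RowData;
   import org.apache.iceberg.flink.TableLoader;
   import org.apache.iceberg.flink.sink.FlinkSink;

   class UpsertNeedsEqualityFields {
     // wordCounts and loader stand in for the job's actual stream and table.
     static void appendUpsert(DataStream<RowData> wordCounts, TableLoader loader) {
       FlinkSink.forRowData(wordCounts)
           .tableLoader(loader)
           .upsert(true)
           // Without equality fields (or a primary key in SQL), the sink
           // throws "Equality field columns shouldn't be empty".
           .equalityFieldColumns(Arrays.asList("word"))
           .append();
     }
   }
   ```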


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Null support in Apache Flink [iceberg]

2023-10-04 Thread via GitHub


Ge opened a new issue, #8720:
URL: https://github.com/apache/iceberg/issues/8720

   ### Query engine
   
   Flink 1.17.1
   
   ### Question
   
   According to https://iceberg.apache.org/docs/latest/flink/#flink-to-iceberg, 
Iceberg does not handle Flink's `null`. Can you please describe the cases where 
the `null` support is missing or limited?
   
   I tried a very basic example - inserted a row without specifying values for 
all columns and it worked:
   
   ```
   CREATE TABLE word_count (
   word STRING,
   cnt BIGINT,
   PRIMARY KEY(`word`) NOT ENFORCED
   ) PARTITIONED BY (word) WITH (
   'format-version'='2', 
   'write.upsert.enabled'='true'
   );
   
   INSERT INTO word_count (word) VALUES ('zz');
   
   SELECT * FROM word_count WHERE word = 'zz';
   +----+------+-----+
   | op | word | cnt |
   +----+------+-----+
   | +I |   zz |     |
   +----+------+-----+
   Received a total of 1 row
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on code in PR #8715:
URL: https://github.com/apache/iceberg/pull/8715#discussion_r1346743526


##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
##
@@ -325,7 +328,7 @@ public void testSpecificSnapshotTimestamp() throws Exception {
 long snapshot0Timestamp = tableResource.table().currentSnapshot().timestampMillis();
 
 // sleep for 2 ms to make sure snapshot1 has a higher timestamp value
-Thread.sleep(2);
+Awaitility.await().atLeast(Duration.ofMillis(2)).until(() -> true);

Review Comment:
   If the thread just needs to sleep for some duration, ideally it should be something like
   `Awaitility.await().pollDelay(2, TimeUnit.MILLISECONDS).until(() -> System.currentTimeMillis() - snapshot0Timestamp > 2);`



##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
##
@@ -403,7 +406,7 @@ public static List<Row> waitForResult(CloseableIterator<Row> iter, int limit) {
 
   public static void waitUntilJobIsRunning(ClusterClient<?> client) throws Exception {

Review Comment:
   Remove this method; instead it should be something like
   `Awaitility.await().pollDelay(10, TimeUnit.MILLISECONDS).until(() -> !getRunningJobs(MINI_CLUSTER_RESOURCE.getClusterClient()).isEmpty());`
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Docs: Document all metadata tables. [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on PR #8709:
URL: https://github.com/apache/iceberg/pull/8709#issuecomment-1747996344

   @szehon-ho, please review and share your feedback. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Docs: Document publish_changes procedure [iceberg]

2023-10-04 Thread via GitHub


nk1506 commented on PR #8706:
URL: https://github.com/apache/iceberg/pull/8706#issuecomment-1747996641

   @nastra, please take a look. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-04 Thread via GitHub


shreyanshR7 commented on PR #8715:
URL: https://github.com/apache/iceberg/pull/8715#issuecomment-1748006574

   Thanks for the suggestion @nk1506, I'll update that.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spec: add types timstamp_ns and timestamptz_ns [iceberg]

2023-10-04 Thread via GitHub


jacobmarble commented on PR #8683:
URL: https://github.com/apache/iceberg/pull/8683#issuecomment-1748044374

   @Fokko - friendly reminder to review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-04 Thread via GitHub


shreyanshR7 commented on PR #8715:
URL: https://github.com/apache/iceberg/pull/8715#issuecomment-1748046446

   I've made the changes @nk1506.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Hive: Push filtering for Iceberg table type to Hive MetaStore when listing tables [iceberg]

2023-10-04 Thread via GitHub


pvary commented on PR #2722:
URL: https://github.com/apache/iceberg/pull/2722#issuecomment-1748085546

   @mderoy: I am still here to review 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Support deletion in Apache Flink [iceberg]

2023-10-04 Thread via GitHub


pvary commented on issue #8718:
URL: https://github.com/apache/iceberg/issues/8718#issuecomment-1748095935

   Is this for a V2 table?
   I have seen deleting rows work on a V2 table from Java code with the stream API, but I have yet to try out SQL.
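
   (A minimal sketch of that stream API path, assuming a format-version=2 `word_count(word STRING, cnt BIGINT)` table; the warehouse path is a placeholder. Rows with RowKind.DELETE are written as deletes when equality field columns are set:)

   ```
   import java.util.Arrays;
   import java.util.Collections;
   import org.apache.flink.api.common.typeinfo.TypeInformation;
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.data.GenericRowData;
   import org.apache.flink.table.data.RowData;
   import org.apache.flink.table.data.StringData;
   import org.apache.flink.types.RowKind;
   import org.apache.iceberg.flink.TableLoader;
   import org.apache.iceberg.flink.sink.FlinkSink;

   public class StreamApiDelete {
     public static void main(String[] args) throws Exception {
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
       env.enableCheckpointing(10_000); // Iceberg commits happen on checkpoints

       // A single -D row for word 'zz' (schema: word STRING, cnt BIGINT).
       RowData delete = GenericRowData.ofKind(RowKind.DELETE, StringData.fromString("zz"), 1L);
       DataStream<RowData> rows =
           env.fromCollection(Collections.singletonList(delete), TypeInformation.of(RowData.class));

       // Placeholder path; any TableLoader pointing at the table works.
       TableLoader loader = TableLoader.fromHadoopTable("file:///tmp/warehouse/db/word_count");

       FlinkSink.forRowData(rows)
           .tableLoader(loader)
           .equalityFieldColumns(Arrays.asList("word")) // lets the delete match existing rows
           .append();

       env.execute("stream-api delete");
     }
   }
   ```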


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Support deletion in Apache Flink [iceberg]

2023-10-04 Thread via GitHub


Ge commented on issue #8718:
URL: https://github.com/apache/iceberg/issues/8718#issuecomment-1748157735

   Yes, this is a V2 table. I added the DDL to the description now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Docs: Document publish_changes procedure [iceberg]

2023-10-04 Thread via GitHub


nastra merged PR #8706:
URL: https://github.com/apache/iceberg/pull/8706


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Spark SQL Extensions: Document all stored procedures [iceberg]

2023-10-04 Thread via GitHub


nastra closed issue #1601: Spark SQL Extensions: Document all stored procedures
URL: https://github.com/apache/iceberg/issues/1601


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Dell : Migrate Files using TestRule to Junit5. [iceberg]

2023-10-04 Thread via GitHub


ashutosh-roy commented on code in PR #8707:
URL: https://github.com/apache/iceberg/pull/8707#discussion_r1346869623


##
dell/src/test/java/org/apache/iceberg/dell/mock/ecs/EcsS3MockRule.java:
##
@@ -178,4 +163,16 @@ public String bucket() {
   public String randomObjectName() {
 return "test-" + ID.getAndIncrement() + "-" + UUID.randomUUID();
   }
+
+  @Override
+  public void afterEach(ExtensionContext extensionContext) {
+System.out.println("Bucket removed");
+cleanUp();
+  }
+
+  @Override
+  public void beforeEach(ExtensionContext extensionContext) {
+System.out.println("Bucket removed");

Review Comment:
   @nk1506 if there is no further feedback, are we good to merge this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-04 Thread via GitHub


nastra commented on PR #8715:
URL: https://github.com/apache/iceberg/pull/8715#issuecomment-1748190639

   The goal of https://github.com/apache/iceberg/issues/7154 is to convert `Thread.sleep` usages to Awaitility where it makes sense. We don't want to blindly replace all `Thread.sleep` usage. We need to understand why the sleep is used in the first place.
   
   Typically, `Thread.sleep` is used to wait for an async condition to happen. Rather than waiting the max sleep time, we want to wait less than that, by checking a success condition that indicates we can proceed.
   
   Here is a good example where the test was made more stable by the use of Awaitility: https://github.com/apache/iceberg/blob/cff2ff9d2f533c8e733324a66f3101eafbc2e932/core/src/test/java/org/apache/iceberg/actions/TestCommitService.java#L99-L102
   
   We wait at most 5 seconds until the given assertion is met. 
   
   For this PR that means a good starting point might be https://github.com/apache/iceberg/blob/f9f85e7f2647020ef7e9571f76027c8e55075690/flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java#L153, where there's a 1s sleep and then a check. That's the kind of thing we want to replace with Awaitility.
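
   (To make the pattern concrete, a small sketch; `jobIsRunning()` is a hypothetical stand-in for whatever async condition the test polls:)

   ```
   import static org.assertj.core.api.Assertions.assertThat;

   import java.time.Duration;
   import org.awaitility.Awaitility;

   class SleepVsAwaitility {

     // Before: always pays the full second, and still fails if the job needs longer.
     void waitWithSleep() throws InterruptedException {
       Thread.sleep(1000);
       assertThat(jobIsRunning()).isTrue();
     }

     // After: returns as soon as the assertion passes, gives up only after 5s.
     void waitWithAwaitility() {
       Awaitility.await()
           .atMost(Duration.ofSeconds(5))
           .untilAsserted(() -> assertThat(jobIsRunning()).isTrue());
     }

     // Hypothetical stand-in for the async condition under test.
     private boolean jobIsRunning() {
       return true;
     }
   }
   ```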


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Spark fails to write into an iceberg table after updating its schema [iceberg]

2023-10-04 Thread via GitHub


paulpaul1076 opened a new issue, #8721:
URL: https://github.com/apache/iceberg/issues/8721

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Spark fails to write a dataframe with the new schema after updating the schema of a table:
   
   ```
   import org.apache.iceberg.{CatalogUtil, Schema}
   import org.apache.iceberg.catalog.{Catalog, TableIdentifier}
   import org.apache.iceberg.types.Types
   import org.apache.spark.sql.SparkSession
   
   import java.util.Properties
   
   //https://iceberg.apache.org/docs/latest/nessie/
   object IcebergJobNessie extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("iceberg test")
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.ref", "main")
    .config("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1")
    .config("spark.sql.catalog.nessie.s3.endpoint", "***")
    .config("spark.sql.catalog.nessie.s3.access.key", "***")
    .config("spark.sql.catalog.nessie.s3.secret.key", "***")
    .config("spark.sql.defaultCatalog", "nessie")
    .config("spark.hadoop.fs.s3a.endpoint", "***")
    .config("spark.hadoop.fs.s3a.access.key", "***")
    .config("spark.hadoop.fs.s3a.secret.key", "***")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.sql.catalog.nessie.s3a.path-style-access ", "true")
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.warehouse", "s3://hdp-temp/iceberg_catalog")
    .getOrCreate()
   
 import spark.implicits._
   
  val options = new java.util.HashMap[String, String]()
  options.put("warehouse", "s3://hdp-temp/iceberg_catalog")
  options.put("ref", "main")
  options.put("uri", "http://localhost:19120/api/v1")
  val nessieCatalog: Catalog = CatalogUtil.loadCatalog("org.apache.iceberg.nessie.NessieCatalog", "nessie", options, spark.sparkContext.hadoopConfiguration)
   
 // -PART 1---
 val name = TableIdentifier.of("db_nessie", "schema_evolution15")
 val schema = new Schema(
   Types.NestedField.required(1, "age", Types.IntegerType.get()),
   Types.NestedField.optional(2, "sibling_info",
 Types.ListType.ofOptional(3, Types.StructType.of(
   Types.NestedField.required(4, "age", Types.IntegerType.get()),
   Types.NestedField.optional(5, "name", Types.StringType.get())
 ))
   )
 )
   
 nessieCatalog.createTable(name, schema)
   
 val df = List(
   (1, List(
 SiblingInfo(1, "John"),
 SiblingInfo(2, "Sean"),
 SiblingInfo(3, "Peter"))
   ),
   (12, List(
 SiblingInfo(13, "Ivan"),
 SiblingInfo(11, "Sean")
   )
   )).toDF("age", "sibling_info")
   
 df.writeTo("db_nessie.schema_evolution15").append()
   
 spark.sql("select * from db_nessie.schema_evolution15").show(false)
   
 val table = nessieCatalog.loadTable(name)
 val newIcebergSchema = new Schema(
   Types.NestedField.required(1, "age", Types.IntegerType.get()),
   Types.NestedField.optional(2, "sibling_info",
 Types.ListType.ofOptional(3, Types.StructType.of(
   Types.NestedField.required(4, "age", Types.IntegerType.get()),
   Types.NestedField.optional(5, "name", Types.StringType.get()),
   Types.NestedField.optional(6, "lastName", Types.StringType.get())
 ))
   )
 )
 table.updateSchema()
   .unionByNameWith(newIcebergSchema)
   .commit()
   
 // -PART 2---
 val df2 = List(
   (1, List(
 SiblingInfo2(1, "John", "Johnson"),
 SiblingInfo2(2, "Sean", "Johnson"),
 SiblingInfo2(3, "Peter", "Johnson"))
   ),
   (12, List(
 SiblingInfo2(13, "Ivan", "Johnson"),
 SiblingInfo2(11, "Test", "Johnson")
   )
   )).toDF("age", "sibling_info")
   
 df2.writeTo("db_nessie.schema_evolution15").append()
   
 spark.sql("select * from db_nessie.schema_evolution15").show(false)
   }
   ```
   
   
   The exception is:
   ```
   Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table 'spark_catalog1.db.schema_evolution15':
   - Cannot write nullable values to non-null column 'sibling_info.x.age'.
	at org.apache.spark.