Re: [PR] rewrite v2 tables by skip deletes planning and join deletes data tables [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on PR #8807:
URL: https://github.com/apache/iceberg/pull/8807#issuecomment-1759043430

   Wouldn't running rewrite_position_delete before running rewrite_data_files 
help in this scenario?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Adopt to Nessie 0.71.1 release [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on PR #8798:
URL: https://github.com/apache/iceberg/pull/8798#issuecomment-1759048641

   cc: @dimas-b, @snazy 
   
   Dependabot is raising PRs for Nessie bumps now. This is a follow-up to the 
latest bump.





Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on code in PR #8797:
URL: https://github.com/apache/iceberg/pull/8797#discussion_r1356348310


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##
@@ -109,6 +110,8 @@ public InternalRow[] call(InternalRow args) {
 action = checkAndApplyStrategy(action, strategy, sortOrderString, 
table.schema());
   }
 
+  action = checkAndApplyBranch(table, action);

Review Comment:
   Yes. Support extracting branch info from the table identifier.






Re: [PR] Kafka Connect: Initial project setup and event data structures [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on code in PR #8701:
URL: https://github.com/apache/iceberg/pull/8701#discussion_r1356434493


##
kafka-connect/kafka-connect-events/src/main/java/org/apache/iceberg/connect/events/CommitCompletePayload.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.connect.events;
+
+import java.util.UUID;
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+
+public class CommitCompletePayload implements Payload {
+
+  private UUID commitId;
+  private Long vtts;
+  private final Schema avroSchema;
+
+  private static final Schema AVRO_SCHEMA =
+      SchemaBuilder.builder()
+          .record(CommitCompletePayload.class.getName())
+          .fields()
+          .name("commitId")
+          .prop(FIELD_ID_PROP, DUMMY_FIELD_ID)
+          .type(UUID_SCHEMA)
+          .noDefault()
+          .name("vtts")
+          .prop(FIELD_ID_PROP, DUMMY_FIELD_ID)
+          .type()
+          .nullable()
+          .longType()
+          .noDefault()
+          .endRecord();
+
+  // Used by Avro reflection to instantiate this class when reading events
+  public CommitCompletePayload(Schema avroSchema) {
+    this.avroSchema = avroSchema;
+  }
+
+  public CommitCompletePayload(UUID commitId, Long vtts) {
+    this.commitId = commitId;
+    this.vtts = vtts;
+    this.avroSchema = AVRO_SCHEMA;
+  }
+
+  public UUID commitId() {
+    return commitId;
+  }
+
+  public Long vtts() {

Review Comment:
   I think we can also add events.md doc with this PR now.
   https://github.com/tabular-io/iceberg-kafka-connect/blob/main/docs/events.md






Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-12 Thread via GitHub


amogh-jahagirdar commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1759179813

   My mistake: yes, you can use format version 2 with copy-on-write. The 
remaining question is why you are seeing delete files at all if CoW is set. 
That seems to be the fundamental issue here. I'll try to reproduce it.





Re: [PR] Add ASF DOAP rdf file [iceberg]

2023-10-12 Thread via GitHub


nastra commented on PR #8586:
URL: https://github.com/apache/iceberg/pull/8586#issuecomment-1759247885

   thanks @jbonofre 





Re: [PR] Add ASF DOAP rdf file [iceberg]

2023-10-12 Thread via GitHub


nastra merged PR #8586:
URL: https://github.com/apache/iceberg/pull/8586





Re: [PR] Add ASF DOAP rdf file [iceberg]

2023-10-12 Thread via GitHub


jbonofre commented on PR #8586:
URL: https://github.com/apache/iceberg/pull/8586#issuecomment-1759269677

   Awesome! Thanks, I'm dealing with the ASF record now ;)





Re: [PR] Kafka Connect: Initial project setup and event data structures [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on code in PR #8701:
URL: https://github.com/apache/iceberg/pull/8701#discussion_r1356710076


##
kafka-connect/kafka-connect-events/src/main/java/org/apache/iceberg/connect/events/CommitCompletePayload.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.connect.events;
+
+import java.util.UUID;
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+
+public class CommitCompletePayload implements Payload {
+
+  private UUID commitId;
+  private Long vtts;
+  private final Schema avroSchema;
+
+  private static final Schema AVRO_SCHEMA =
+      SchemaBuilder.builder()
+          .record(CommitCompletePayload.class.getName())
+          .fields()
+          .name("commitId")
+          .prop(FIELD_ID_PROP, DUMMY_FIELD_ID)
+          .type(UUID_SCHEMA)
+          .noDefault()
+          .name("vtts")
+          .prop(FIELD_ID_PROP, DUMMY_FIELD_ID)
+          .type()
+          .nullable()
+          .longType()
+          .noDefault()
+          .endRecord();
+
+  // Used by Avro reflection to instantiate this class when reading events
+  public CommitCompletePayload(Schema avroSchema) {
+    this.avroSchema = avroSchema;
+  }
+
+  public CommitCompletePayload(UUID commitId, Long vtts) {
+    this.commitId = commitId;
+    this.vtts = vtts;
+    this.avroSchema = AVRO_SCHEMA;
+  }
+
+  public UUID commitId() {
+    return commitId;
+  }
+
+  public Long vtts() {

Review Comment:
   We can add this info as javadoc
   
   > VTTS (valid-through timestamp) property indicating through what timestamp 
records have been fully processed, i.e. all records processed from then on will 
have a timestamp greater than the VTTS. This is calculated by taking the 
maximum timestamp of records processed from each topic partition, and taking 
the minimum of these. If any partitions were not processed as part of the 
commit then the VTTS is not set
   
   
https://github.com/tabular-io/iceberg-kafka-connect/blob/main/docs/design.md#snapshot-properties
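
The calculation described in the quote can be sketched as follows. This is only an illustration of the max-per-partition, min-across-partitions rule, not the connector's code: the class `VttsSketch`, the method `computeVtts`, and the representation of a commit as a map from partition number to processed record timestamps are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class VttsSketch {
  // Hypothetical helper: the map holds, for each topic partition, the
  // timestamps of the records processed in this commit.
  // Returns null when the VTTS should not be set.
  static Long computeVtts(Map<Integer, List<Long>> timestampsByPartition) {
    Long vtts = null;
    for (List<Long> timestamps : timestampsByPartition.values()) {
      if (timestamps.isEmpty()) {
        // A partition that was not processed in this commit: VTTS is not set.
        return null;
      }
      // Maximum timestamp seen for this partition...
      long partitionMax = Collections.max(timestamps);
      // ...and the minimum of those maxima across all partitions.
      vtts = (vtts == null) ? partitionMax : Math.min(vtts, partitionMax);
    }
    return vtts;
  }
}
```

Returning `null` for "not set" mirrors the nullable `vtts` field in the Avro schema above; whether the real implementation signals the unset case this way is not something this sketch claims.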






[I] Make iceberg an idempotent sink for Spark like delta lake [iceberg]

2023-10-12 Thread via GitHub


paulpaul1076 opened a new issue, #8809:
URL: https://github.com/apache/iceberg/issues/8809

   ### Feature Request / Improvement
   
   Delta Lake has an interesting feature, which you can read about here: 
https://docs.delta.io/latest/delta-streaming.html#idempotent-table-writes-in-foreachbatch
   And here:
   
![image](https://github.com/apache/iceberg/assets/4533296/f3817344-0337-4b6c-873f-4bb51f0da78a)
   
![image](https://github.com/apache/iceberg/assets/4533296/1c688140-7694-46a1-a31f-e2bf57658d97)
   
   From what I understand, Iceberg does not support this, but I think it is a 
really important feature. Can we add it to Iceberg?
   
   I don't think multi-table transactions will solve this problem. As I 
understand it, foreachBatch commits its offsets only after the entire lambda 
function passed to it has executed. Imagine you have this code with 
multi-table transactions:
   
   ```scala
   dfStr.writeStream.foreachBatch((df: DataFrame, id: Long) => {
     // create transaction1
     // create transaction2
     // multi_table_commit(transaction1, transaction2)
     // send something to kafka
   }).start().awaitTermination()
   ```
 
   From what I understand, if the "send something to kafka" step fails, the 
entire micro-batch is re-executed and the multi-table transaction writes the 
same data a second time, causing data duplication. At my job, for example, we 
use this kind of logic, and we frequently kill our streaming jobs to deploy 
new code and then restart them.

   So, as I understand it, Iceberg is not an idempotent sink, and you can't 
expect end-to-end exactly-once semantics with Iceberg?
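
The behaviour being requested can be sketched independently of any engine. As I read the Delta Lake page linked above, each write is tagged with an application id and a monotonically increasing version (`txnAppId` and `txnVersion`), and a write whose version has already been committed is skipped. The class below is a purely in-memory illustration of that idea; `IdempotentSinkSketch`, `writeBatch`, and `rowCount` are hypothetical names and none of this is an Iceberg or Delta API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IdempotentSinkSketch {
  // Stand-in for table data (hypothetical; a real sink would write files).
  private final List<String> rows = new ArrayList<>();
  // Last committed batch id per application, stored together with the data.
  private final Map<String, Long> lastCommittedBatch = new HashMap<>();

  // Returns true if the batch was written, false if it was skipped as a replay.
  synchronized boolean writeBatch(String appId, long batchId, List<String> batchRows) {
    Long last = lastCommittedBatch.get(appId);
    if (last != null && batchId <= last) {
      // This batch (or a later one) was already committed: do nothing.
      return false;
    }
    // Data and the batch marker are updated in one step, so a retry of the
    // same micro-batch after a later failure cannot duplicate rows.
    rows.addAll(batchRows);
    lastCommittedBatch.put(appId, batchId);
    return true;
  }

  synchronized int rowCount() {
    return rows.size();
  }
}
```

In the foreachBatch scenario above, a re-executed micro-batch would call `writeBatch` again with the same `id`, and the second call would be a no-op. The key design point is that the marker and the data are committed atomically; the sketch models that with a single synchronized method.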
   
   ### Query engine
   
   Spark





[PR] Add Blogs Related to Hive & Iceberg. [iceberg-docs]

2023-10-12 Thread via GitHub


ayushtkn opened a new pull request, #282:
URL: https://github.com/apache/iceberg-docs/pull/282

   (no comment)





Re: [PR] Add Blogs Related to Hive & Iceberg. [iceberg-docs]

2023-10-12 Thread via GitHub


ayushtkn commented on PR #282:
URL: https://github.com/apache/iceberg-docs/pull/282#issuecomment-1759537789

   Built locally to validate; no links are broken. Screenshot attached:
   https://github.com/apache/iceberg-docs/assets/25608848/205e05af-1eb8-41f3-b0e4-7cfb66c80c58
   





Re: [PR] Nessie: Adopt to Nessie 0.71.1 release [iceberg]

2023-10-12 Thread via GitHub


dimas-b commented on code in PR #8798:
URL: https://github.com/apache/iceberg/pull/8798#discussion_r1356768108


##
nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java:
##
@@ -78,30 +77,11 @@ public void testNonExistentCustomClient() {
   temp.toUri().toString(),
   CatalogProperties.URI,
   uri,
-  NessieConfigConstants.CONF_NESSIE_CLIENT_BUILDER_IMPL,
-  "non.existent.ClientBuilderImpl"));
-})
-.isInstanceOf(RuntimeException.class)
-.hasMessageContaining("Cannot load Nessie client builder 
implementation class");
-  }
-
-  @Test
-  public void testCustomClientByImpl() {

Review Comment:
   These tests are different: this one uses 
`NessieConfigConstants.CONF_NESSIE_CLIENT_BUILDER_IMPL`, the other one uses 
`NessieConfigConstants.CONF_NESSIE_CLIENT_NAME`, right?






Re: [PR] Nessie: Adopt to Nessie 0.71.1 release [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on code in PR #8798:
URL: https://github.com/apache/iceberg/pull/8798#discussion_r1356770832


##
nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java:
##
@@ -78,30 +77,11 @@ public void testNonExistentCustomClient() {
   temp.toUri().toString(),
   CatalogProperties.URI,
   uri,
-  NessieConfigConstants.CONF_NESSIE_CLIENT_BUILDER_IMPL,
-  "non.existent.ClientBuilderImpl"));
-})
-.isInstanceOf(RuntimeException.class)
-.hasMessageContaining("Cannot load Nessie client builder 
implementation class");
-  }
-
-  @Test
-  public void testCustomClientByImpl() {

Review Comment:
   Since CONF_NESSIE_CLIENT_BUILDER_IMPL is deprecated, the test case would 
need to be changed to use CONF_NESSIE_CLIENT_NAME rather than removed. But 
there is already a test case that does that, so I guess that is why he removed 
the CONF_NESSIE_CLIENT_BUILDER_IMPL test case.






Re: [PR] Add Blogs Related to Hive & Iceberg. [iceberg-docs]

2023-10-12 Thread via GitHub


pvary merged PR #282:
URL: https://github.com/apache/iceberg-docs/pull/282





[I] Replace `.size() > 0` with `!.isEmpty()` [iceberg]

2023-10-12 Thread via GitHub


Fokko opened a new issue, #8810:
URL: https://github.com/apache/iceberg/issues/8810

   ### Feature Request / Improvement
   
   Suggestion by IDEA:
   
   
![image](https://github.com/apache/iceberg/assets/1134248/2d0a997e-9693-4283-8fc4-b9471a6fab6c)
   
   I think this is worthwhile because `isEmpty` can be faster than 
`size() > 0`. The two are also implemented differently in `PartitionSet.java`:
   
   ```java
   @Override
   public int size() {
     return partitionSetById.values().stream().mapToInt(Set::size).sum();
   }

   @Override
   public boolean isEmpty() {
     return partitionSetById.values().stream().allMatch(Set::isEmpty);
   }
   ```
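
To make the difference concrete, here is a small self-contained sketch that mirrors the two stream pipelines above and counts how many inner sets each one touches. It is not `PartitionSet` itself: the class `IsEmptyVsSizeSketch`, its `add` method, and the `visited` counter are made up for the illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

final class IsEmptyVsSizeSketch {
  private final Map<Integer, Set<String>> setsById = new LinkedHashMap<>();
  // Counts how many inner sets a call actually inspected (illustration only).
  final AtomicInteger visited = new AtomicInteger();

  void add(int id, String value) {
    setsById.computeIfAbsent(id, k -> new HashSet<>()).add(value);
  }

  // Same shape as size() above: the sum has to visit every inner set.
  int size() {
    return setsById.values().stream()
        .peek(s -> visited.incrementAndGet())
        .mapToInt(Set::size)
        .sum();
  }

  // Same shape as isEmpty() above: allMatch short-circuits at the first
  // non-empty set, so the remaining sets are never inspected.
  boolean isEmpty() {
    return setsById.values().stream()
        .peek(s -> visited.incrementAndGet())
        .allMatch(Set::isEmpty);
  }
}
```

With several non-empty inner sets, `isEmpty()` stops at the first one while `size()` walks all of them, which is why `!isEmpty()` is the cheaper check when only emptiness matters.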
   
   ### Query engine
   
   Other





Re: [PR] Update roadmap.md [iceberg-docs]

2023-10-12 Thread via GitHub


ajantha-bhat commented on code in PR #272:
URL: https://github.com/apache/iceberg-docs/pull/272#discussion_r1356790636


##
landing-page/content/common/roadmap.md:
##
@@ -22,28 +22,36 @@ disableSidebar: true
 
 # Roadmap Overview
 
-This roadmap outlines projects that the Iceberg community is working on, their 
priority, and a rough size estimate.
-This is based on the latest [community priority 
discussion](https://lists.apache.org/thread.html/r84e80216c259c81f824c6971504c321cd8c785774c489d52d4fc123f%40%3Cdev.iceberg.apache.org%3E).
+This roadmap outlines projects that the Iceberg community is working on.
 Each high-level item links to a Github project board that tracks the current 
status.
 Related design docs will be linked on the planning boards.
 
-# Priority 1
-
-* API: [Iceberg 1.0.0](https://github.com/apache/iceberg/projects/3) [medium]
-* Python: [Pythonic refactor](https://github.com/apache/iceberg/projects/7) 
[medium]
-* Spec: [Z-ordering / Space-filling 
curves](https://github.com/apache/iceberg/projects/16) [medium]
-* Spec: [Snapshot tagging and 
branching](https://github.com/apache/iceberg/projects/4) [small]
-* Views: [Spec](https://github.com/apache/iceberg/projects/6) [medium]
-* Puffin: [Implement statistics information in table 
snapshot](https://github.com/apache/iceberg/pull/4741) [medium]
-* Flink: [FLIP-27 based Iceberg 
source](https://github.com/apache/iceberg/projects/23) [large]
-
-# Priority 2
-
-* ORC: [Support delete files stored as 
ORC](https://github.com/apache/iceberg/projects/13) [small]
-* Spark: [DSv2 streaming 
improvements](https://github.com/apache/iceberg/projects/2) [small]
-* Flink: [Inline file 
compaction](https://github.com/apache/iceberg/projects/14) [small]
-* Flink: [Support UPSERT](https://github.com/apache/iceberg/projects/15) 
[small]
-* Spec: [Secondary indexes](https://github.com/apache/iceberg/projects/17) 
[large]
-* Spec v3: [Encryption](https://github.com/apache/iceberg/projects/5) [large]
-* Spec v3: [Relative paths](https://github.com/apache/iceberg/projects/18) 
[large]
-* Spec v3: [Default field 
values](https://github.com/apache/iceberg/projects/19) [medium]
+# General
+
+* [Multi-table transaction 
support](https://github.com/apache/iceberg/projects/30)
+* [Views Support](https://github.com/apache/iceberg/projects/29)
+* [Change Data Capture (CDC) 
Support](https://github.com/apache/iceberg/projects/26)
+* [Snapshot tagging and 
branching](https://github.com/apache/iceberg/projects/4)
+* [Inline file compaction](https://github.com/apache/iceberg/projects/14)
+* [Delete File compaction](https://github.com/apache/iceberg/projects/10)
+* [Z-ordering / Space-filling 
curves](https://github.com/apache/iceberg/projects/16)
+* [Support UPSERT](https://github.com/apache/iceberg/projects/15)
+

Review Comment:
   Can you please add partition stats? 
   https://github.com/apache/iceberg/projects/31






Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-12 Thread via GitHub


nk1506 commented on PR #8804:
URL: https://github.com/apache/iceberg/pull/8804#issuecomment-1759573960

   @nastra, please take a look.





Re: [PR] Nessie: Adapt to Nessie 0.71.1 release [iceberg]

2023-10-12 Thread via GitHub


nk1506 commented on PR #8798:
URL: https://github.com/apache/iceberg/pull/8798#issuecomment-1759595831

   > nit: `Adopt` -> `Adapt` in title?
   > 
   > I believe the removed test case is worth keeping.
   
   Since `CONF_NESSIE_CLIENT_BUILDER_IMPL` has been deprecated, replacing it 
with `CONF_NESSIE_CLIENT_NAME` would make 
[testCustomClientByImpl](https://github.com/apache/iceberg/blob/b5ea0d5a7f55e5b8d9eec8e764bbcc35f8301db3/nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java#L89) 
and 
[testCustomClientByName](https://github.com/apache/iceberg/blob/b5ea0d5a7f55e5b8d9eec8e764bbcc35f8301db3/nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java#L108) 
duplicates of each other.
   





Re: [PR] Core: DeleteMarker to mark row as deleted [iceberg]

2023-10-12 Thread via GitHub


Humbedooh closed pull request #2434: Core: DeleteMarker to mark row as deleted
URL: https://github.com/apache/iceberg/pull/2434





Re: [PR] Hive: Allow to create external table to access the iceberg table managed in hive catalog [iceberg]

2023-10-12 Thread via GitHub


Humbedooh closed pull request #3539: Hive: Allow to create external table to 
access the iceberg table managed in hive catalog
URL: https://github.com/apache/iceberg/pull/3539





Re: [PR] Core: Add RocksDBStructLikeMap [iceberg]

2023-10-12 Thread via GitHub


Humbedooh closed pull request #2680: Core: Add RocksDBStructLikeMap
URL: https://github.com/apache/iceberg/pull/2680





Re: [PR] Hive: Bug when runing SQL with multiple table join. [iceberg]

2023-10-12 Thread via GitHub


Humbedooh closed pull request #3392: Hive: Bug when runing SQL with multiple 
table join.
URL: https://github.com/apache/iceberg/pull/3392





Re: [PR] Update the comment content of 'commit.status-check.total-timeout-ms' [iceberg]

2023-10-12 Thread via GitHub


Humbedooh closed pull request #2894: Update the comment content of 
'commit.status-check.total-timeout-ms'
URL: https://github.com/apache/iceberg/pull/2894





Re: [PR] Aliyun: Add iceberg-aliyun document [iceberg]

2023-10-12 Thread via GitHub


Humbedooh closed pull request #3686: Aliyun: Add iceberg-aliyun document
URL: https://github.com/apache/iceberg/pull/3686





[PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]

2023-10-12 Thread via GitHub


dependabot[bot] opened a new pull request, #8811:
URL: https://github.com/apache/iceberg/pull/8811

   Bumps 
[org.springframework:spring-web](https://github.com/spring-projects/spring-framework)
 from 5.3.9 to 6.0.13.
   
   Release notes
   Sourced from https://github.com/spring-projects/spring-framework/releases";>org.springframework:spring-web's
 releases.
   
   v6.0.13
   :star: New Features
   
   Improve diagnostics for negative repeated text count in SpEL https://redirect.github.com/spring-projects/spring-framework/issues/31342";>#31342
   Improve diagnostics when repeated text size calculation results in 
overflow in SpEL https://redirect.github.com/spring-projects/spring-framework/issues/31341";>#31341
   UnknownContentTypeException is not 
Serializable https://redirect.github.com/spring-projects/spring-framework/issues/31283";>#31283
   Reintroduce FastClass in CGLIB class names for 
@Configuration classes https://redirect.github.com/spring-projects/spring-framework/issues/31272";>#31272
   
   :lady_beetle: Bug Fixes
   
   HibernateJpaDialect and HibernateExceptionTranslator throw SQLExceptionTranslator-provided exception instead of returning it [#31409](https://redirect.github.com/spring-projects/spring-framework/issues/31409)
   AnnotationScanner scanning leads to StackOverflowError with recursive annotation [#31400](https://redirect.github.com/spring-projects/spring-framework/issues/31400)
   NamedParameterJdbcTemplate throws unexpected exception for null query [#31391](https://redirect.github.com/spring-projects/spring-framework/issues/31391)
   HTTP server exchange observations have incorrect UNKNOWN status tag if the client disconnected [#31388](https://redirect.github.com/spring-projects/spring-framework/issues/31388)
   Breaking change from 6.0.11 to 6.0.12 if you expect query parameters in @RequestBody [#31327](https://redirect.github.com/spring-projects/spring-framework/issues/31327)
   SpEL's CompoundExpression.toStringAST() omits ? for null-safe navigation [#31326](https://redirect.github.com/spring-projects/spring-framework/issues/31326)
   ConcurrentLruCache no longer supports capacity = 0 [#31317](https://redirect.github.com/spring-projects/spring-framework/issues/31317)
   Using R2dbc transactional and non transactional on a database connection pool will fail for Oracle. [#31268](https://redirect.github.com/spring-projects/spring-framework/issues/31268)
   AOT-generated code no longer set bean class for beans created from a @Bean method [#31242](https://redirect.github.com/spring-projects/spring-framework/issues/31242)
   CGLIB proxy classes are no longer cached properly [#31238](https://redirect.github.com/spring-projects/spring-framework/issues/31238)
   Illegal reflective access in ContextOverridingClassLoader.isEligibleForOverriding [#31232](https://redirect.github.com/spring-projects/spring-framework/issues/31232)
   Fix RuntimeHintsPredicates matching rules for public/declared elements [#31224](https://redirect.github.com/spring-projects/spring-framework/issues/31224)
   MultipartParser should respect read position [#31110](https://redirect.github.com/spring-projects/spring-framework/issues/31110)
   WebClient reports 'Host is not specified' for URI with hostname and port, but without scheme [#31033](https://redirect.github.com/spring-projects/spring-framework/issues/31033)
   R2DBC Connection is closed during transaction when using TransactionAwareConnectionFactoryProxy [#28133](https://redirect.github.com/spring-projects/spring-framework/issues/28133)
   SpEL cannot evaluate or compile expression with null-safe void method invocation [#27421](https://redirect.github.com/spring-projects/spring-framework/issues/27421)
   LazyResolutionMessage does not implement proper toString [#21265](https://redirect.github.com/spring-projects/spring-framework/issues/21265)
   
   :notebook_with_decorative_cover: Documentation
   
   Document Kotlin declaration site variance subtleties [#31370](https://redirect.github.com/spring-projects/spring-framework/issues/31370)
   Add missing conversionService field in doc example [#31330](https://redirect.github.com/spring-projects/spring-framework/pull/31330)
   Clarify documentation on Spring Web MVC pattern comparison [#31294](https://redirect.github.com/spring-projects/spring-framework/issues/31294)
   Improved documentation for MethodParameter#getAnnotatedElement [#30397](https://redirect.github.com/spring-projects/spring-framework/issues/30397)
   Javadoc for BeanPropertyRowMapper.getColumnValue(ResultSet, int, Class) is inconsistent with code [#29285](https://redirect.github.com/spring-projects/spring-framework/issues/29285)
   Referencing a @Bean method in a @Configuration class' @PostConstruct method leads to circular reference [#27876](https://redirect.github.com/spring-projects/spring-framework/issues/27876)
   Incorrect reference information about CGLIB supported method visibility 
https://redirect.github.com/spring-projects/spring-fra

Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.12 [iceberg]

2023-10-12 Thread via GitHub


dependabot[bot] commented on PR #8734:
URL: https://github.com/apache/iceberg/pull/8734#issuecomment-1759624056

   Superseded by #8811.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.12 [iceberg]

2023-10-12 Thread via GitHub


dependabot[bot] closed pull request #8734: Build: Bump 
org.springframework:spring-web from 5.3.9 to 6.0.12
URL: https://github.com/apache/iceberg/pull/8734





[PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.24.0 [iceberg]

2023-10-12 Thread via GitHub


dependabot[bot] opened a new pull request, #8812:
URL: https://github.com/apache/iceberg/pull/8812

   Bumps 
[com.palantir.baseline:gradle-baseline-java](https://github.com/palantir/gradle-baseline)
 from 4.42.0 to 5.24.0.
   
   Release notes
   Sourced from [com.palantir.baseline:gradle-baseline-java's releases](https://github.com/palantir/gradle-baseline/releases).
   
   5.24.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Fix | baseline-exact-dependencies is now far more lazy around Configuration creation in order to support Gradle 8. | [palantir/gradle-baseline#2639](https://redirect.github.com/palantir/gradle-baseline/pull/2639) |
   
   5.23.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Fix | Use a Proxy for JavaInstallationMetadata so we can work across Gradle 7 and 8. | [palantir/gradle-baseline#2605](https://redirect.github.com/palantir/gradle-baseline/pull/2605) |
   
   5.22.0
   Automated release, no documented user facing changes
   
   5.21.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Improvement | Upgrade error-prone to 2.21.1 (from 2.19.1) | [palantir/gradle-baseline#2628](https://redirect.github.com/palantir/gradle-baseline/pull/2628) |
   
   5.20.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Improvement | Improve SafeLoggingPropagation on Immutables, taking into account fields from superinterfaces | [palantir/gradle-baseline#2629](https://redirect.github.com/palantir/gradle-baseline/pull/2629) |
   
   5.19.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Improvement | Prefer InputStream.transferTo(OutputStream). Add error-prone check to automate migration to prefer InputStream.transferTo(OutputStream) instead of utility methods such as Guava's com.google.common.io.ByteStreams.copy(InputStream, OutputStream). Allow for optimization when underlying input stream (such as ByteArrayInputStream, ChannelInputStream) overrides transferTo(OutputStream) to avoid extra array allocations and copy larger chunks at a time (e.g. allowing 16KiB chunks via ApacheHttpClientBlockingChannel.ModulatingOutputStream from [#1790](https://redirect.github.com/palantir/gradle-baseline/issues/1790)). When running on JDK 21+, this also enables 16KiB byte chunk copies via InputStream.transferTo(OutputStream) per [JDK-8299336](https://bugs.openjdk.org/browse/JDK-8299336), where as on JDK Closes [palantir/gradle-baseline#2615](https://redirect.github.com/palantir/gradle-baseline/issues/2615) | [palantir/gradle-baseline#2615](https://redirect.github.com/palantir/gradle-baseline/pull/2615), [palantir/gradle-baseline#2616](https://redirect.github.com/palantir/gradle-baseline/pull/2616) |
   
   5.18.0
   Automated release, no documented user facing changes
   
   5.17.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Feature | Add error-prone check to prefer ZoneId constants | [palantir/gradle-baseline#2596](https://redirect.github.com/palantir/gradle-baseline/pull/2596) |
   
   5.16.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Fix | Fix nullaway checkerframework dependency | [palantir/gradle-baseline#2602](https://redirect.github.com/palantir/gradle-baseline/pull/2602) |
   
   5.15.0
   Automated release, no documented user facing changes
   
   5.14.0
   
   | Type | Description | Link |
   | --- | --- | --- |
   | Fix | Fix unintentional suppression of StrictUnusedVariable | [palantir/gradle-baseline#2599](https://redirect.github.com/palantir/gradle-baseline/pull/2599) |
   
   5.13.0
   
   
   ... (truncated)
   
   
   Commits
   
   [2381f40](https://github.com/palantir/gradle-baseline/commit/2381f401fb0050eae77a10f27397ad5835cdd3ca) Autorelease 5.24.0
   [f6e4309](https://github.com/palantir/gradle-baseline/commit/f6e43095b61de662685810defa070661dad8ee08) Lazy exact dependencies ([#2639](https://redirect.github.com/palantir/gradle-baseline/issues/2639))
   [642e1e3](https://github.com/palantir/gradle-baseline/commit/642e1e3d675ca49517113b70f73144144914ba0e) Autorelease 5.23.0
   [437d392](https://github.com/palantir/gradle-baseline/commit/437d3920294ba82c8296acf844ddfa866df1662c) Use a Proxy for JavaInstallationMetadata so we can work across Gradle 7 a...
   [777e9f7](https://github.com/palantir/gradle-baseline/commit/777e9f7e69a3b21daa5d2b8a70267dbb38bc8f4c) Excavator:  Upgrade buildscript dependencies ([#2638](https://redirect.github.com/palantir/gradle-baseline/issues/2638))
   [1eefbf5](https://github.com/palantir/gradle-baseline/commit/1eefbf55c3a2ced9af0785014a2eb9a2396a4b9a) Excavator:  Upgrades Baseline to the latest version ([#2637](https://redirect.github.com/palantir/gradle-baseline/issues/2637))
   [dca88ac](https://github.com/palantir/gradle-baseline/commit/dca88ac7faf672d9ab5d941061cd155fcf6fdf0c) Allow incubating method use inside other incubating methods ([#2636](https://redirect.github.com/palantir/gradle-baseline/issues/2636))

Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.22.0 [iceberg]

2023-10-12 Thread via GitHub


dependabot[bot] commented on PR #8778:
URL: https://github.com/apache/iceberg/pull/8778#issuecomment-1759625459

   Superseded by #8812.





Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.22.0 [iceberg]

2023-10-12 Thread via GitHub


dependabot[bot] closed pull request #8778: Build: Bump 
com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.22.0
URL: https://github.com/apache/iceberg/pull/8778





Re: [PR] Rename master branch to main [iceberg]

2023-10-12 Thread via GitHub


jbonofre commented on PR #8722:
URL: https://github.com/apache/iceberg/pull/8722#issuecomment-1759631883

   The `master` branch has been renamed to `main`.
   
   @Fokko @nastra if you can merge this PR when you have time, it would be 
great. Thanks!





Re: [PR] Rename master branch to main [iceberg]

2023-10-12 Thread via GitHub


Fokko merged PR #8722:
URL: https://github.com/apache/iceberg/pull/8722





Re: [PR] Rename master branch to main [iceberg]

2023-10-12 Thread via GitHub


Fokko commented on PR #8722:
URL: https://github.com/apache/iceberg/pull/8722#issuecomment-1759635303

   Thanks @jbonofre for taking the lead on this!





Re: [I] Make iceberg an idempotent sink for Spark like delta lake [iceberg]

2023-10-12 Thread via GitHub


paulpaul1076 commented on issue #8809:
URL: https://github.com/apache/iceberg/issues/8809#issuecomment-1759723529

   @RussellSpitzer provided this code which achieves the same:
   
   ```
   foreachBatch (batch_df, batch_id) => {
     val lastBatch = Spark3Util.loadIcebergTable(spark, "db.timezoned").currentSnapshot().summary()(STREAMID)

     if (batch_id > lastBatch) {
       batch_df.writeTo(...).option("snapshot-property." + STREAMID, batch_id).append
     }
   }
   ```
   
   But I wonder if Delta Lake is faster here, because I assume that this 
metadata lookup 
`Spark3Util.loadIcebergTable(spark,"db.timezoned").currentSnapshot().summary()(STREAMID)`
 goes to S3?
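
The pattern in the snippet above can be sketched in isolation. This is a minimal illustration under stated assumptions, not Iceberg or Spark API: a plain map stands in for the snapshot summary (whose values are strings, so the id is parsed before comparing), and a replayed batch whose id is not newer than the recorded one is skipped. The class name `IdempotentSinkSketch` and the `stream-batch-id` key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table whose latest snapshot summary
// records the id of the last committed streaming batch. Not Iceberg API.
class IdempotentSinkSketch {
  // Assumed property name, chosen only for this illustration.
  static final String STREAM_ID_KEY = "stream-batch-id";

  private final Map<String, String> latestSummary = new HashMap<>();
  private int appendCount = 0;

  // Appends the batch only if its id is newer than the id in the summary.
  boolean writeBatch(long batchId) {
    long lastBatch = Long.parseLong(latestSummary.getOrDefault(STREAM_ID_KEY, "-1"));
    if (batchId <= lastBatch) {
      return false; // replayed batch: already committed, so skip it
    }
    appendCount++; // stands in for the actual append of the batch data
    latestSummary.put(STREAM_ID_KEY, Long.toString(batchId)); // recorded with the append
    return true;
  }

  int appendCount() {
    return appendCount;
  }
}
```

In this sketch a batch that is delivered twice is appended once: the second call to `writeBatch` with the same id returns `false` and leaves the append count unchanged.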





Re: [I] Replace `.size() > 0` with `!.isempty()` [iceberg]

2023-10-12 Thread via GitHub


Fokko commented on issue #8810:
URL: https://github.com/apache/iceberg/issues/8810#issuecomment-1759954274

   There are quite a few:
   
   ```
   ./core/src/main/java/org/apache/iceberg/BaseDistributedDataScan.java: boolean mayHaveEqualityDeletes = deleteManifests.size() > 0 && mayHaveEqualityDeletes(snapshot);
   ./core/src/main/java/org/apache/iceberg/util/PartitionUtil.java: if (partitionType.fields().size() > 0) {
   ./core/src/main/java/org/apache/iceberg/TableMetadata.java: || (discardChanges && changes.size() > 0)
   ./core/src/main/java/org/apache/iceberg/io/DeleteWriteResult.java: return referencedDataFiles != null && referencedDataFiles.size() > 0;
   ./core/src/main/java/org/apache/iceberg/SnapshotSummary.java: setIf(changedPartitions.size() > 0, builder, PARTITION_SUMMARY_PROP, "true");
   ./core/src/main/java/org/apache/iceberg/FastAppend.java: if (newManifests == null && newFiles.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/actions/RewritePositionDeletesGroup.java: Preconditions.checkArgument(tasks.size() > 0, "Tasks must not be empty");
   ./core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java: while (running.get() || completedRewrites.size() > 0 || inProgressCommits.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java: if (!running.get() && completedRewrites.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/actions/BaseCommitService.java: boolean writingComplete = !running.get() && completedRewrites.size() > 0;
   ./core/src/main/java/org/apache/iceberg/view/ViewMetadata.java: Preconditions.checkArgument(versions.size() > 0, "Invalid view: no versions were added");
   ./core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: if (partitionType.fields().size() > 0) {
   ./core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java: if (deletedDataFiles.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: return deletePaths.size() > 0
   ./core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: || dropPartitions.size() > 0;
   ./core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: if (dropPartitions.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: } else if (deletePaths.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/ReachableFileUtil.java: if (metadataLogEntries.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: return newDataFiles.size() > 0;
   ./core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: return newDeleteFilesBySpec.size() > 0;
   ./core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: if (newDataFiles.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: if (hasNewDeleteFiles && cachedNewDeleteManifests.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java: if (replacedDataFiles.size() > 0) {
   ./core/src/main/java/org/apache/iceberg/ContentFileParser.java: return partitionData != null && partitionData.size() > 0;
   ./core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: if (snapshot.deleteManifests(table().io()).size() > 0) {
   ./aliyun/src/test/java/org/apache/iceberg/aliyun/oss/mock/AliyunOSSMockLocalStore.java: return buckets.size() > 0 ? buckets.get(0) : null;
   ./mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: if (dataFiles.size() > 0) {
   ./mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java: if (dataFiles.size() > 0) {
   ./flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java: if (equalityFieldColumns != null && equalityFieldColumns.size() > 0) {
   ./flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java: gateways[subtaskIndex].size() > 0,
   ./flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java: if (equalityFieldColumns != null && equalityFieldColumns.size() > 0) {
   ./flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java: if (equalityFieldColumns != null && equalityFieldColumns.size() > 0) {
   ./delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: if (filesToAdd.size() > 0 && filesToRemove.size() > 0) {
   ./delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: } else if (filesToAdd.size() > 0) {
   ./delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: } else if (filesToRemove.size() > 0) {
   ./hive3/src/main/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java: return hasBase || deltas.size() > 0;
   ./api/src/main/java/org/apache/iceberg/expressions/

Re: [PR] [ISSUE #8810] replaced .size() > 0 with isEmpty() [iceberg]

2023-10-12 Thread via GitHub


Fokko commented on PR #8813:
URL: https://github.com/apache/iceberg/pull/8813#issuecomment-1759955441

   Thanks for opening this PR @PickBas. There are a couple more in the 
codebase. What do you think of doing a PR per module? So we keep it manageable. 
In this case everything in `core/*`?





Re: [PR] [ISSUE #8810] replaced .size() > 0 with isEmpty() [iceberg]

2023-10-12 Thread via GitHub


PickBas commented on PR #8813:
URL: https://github.com/apache/iceberg/pull/8813#issuecomment-1759958867

   @Fokko Sure, will be done. PR per module works for me.





Re: [I] Replace `.size() > 0` with `!.isempty()` [iceberg]

2023-10-12 Thread via GitHub


PickBas commented on issue #8810:
URL: https://github.com/apache/iceberg/issues/8810#issuecomment-1759962395

   @Fokko Will be done. Could you assign the issue to me, if you don't mind?





[I] struct value design [iceberg-rust]

2023-10-12 Thread via GitHub


ZENOTME opened a new issue, #77:
URL: https://github.com/apache/iceberg-rust/issues/77

   Using a lookup incurs a memory cost if we have multiple structs with the 
same type.
   
   One way to solve this is to use `Arc` in Struct. I tried this design in 
https://github.com/icelake-io/icelake/pull/136. 
   
   Anyway, I think it's OK to solve this problem in another PR. If this design 
looks workable, I'm glad to port it.
   
   _Originally posted by @ZENOTME in 
https://github.com/apache/iceberg-rust/pull/20#discussion_r1282598302_
   





Re: [I] struct value design [iceberg-rust]

2023-10-12 Thread via GitHub


ZENOTME commented on issue #77:
URL: https://github.com/apache/iceberg-rust/issues/77#issuecomment-1759997715

   I found that our struct value doesn't include type info. Do we want to 
include type info in it?
   
   1. If we include type info in the struct, the struct value may look like:
   
   ```
   struct Struct {
  type: Arc
  ...
   }
   ```
   
   - The benefit is that we can look up field info directly from the struct.
   - The drawback is the extra 8-byte pointer cost.
   
   2. Another solution is to pass the struct type as another parameter when we 
need it, e.g.
   ```
   fn write(struct_value: Struct,struct_type: StructType)
   ```
   
   - The benefit is that it saves memory.
   - The drawback is that I'm not sure whether the struct type is hard to 
obtain in some cases.
   
   For now, I'm working on serializing/deserializing values, and both processes 
seem solvable the second way (passing a struct type as a parameter).
   
   But I'm not sure whether there are scenarios where we must include type info 
in the struct value.





Re: [I] Replace `.size() > 0` with `!.isempty()` [iceberg]

2023-10-12 Thread via GitHub


PickBas commented on issue #8810:
URL: https://github.com/apache/iceberg/issues/8810#issuecomment-1760006978

   @Fokko I have replaced `size()` checks with `isEmpty()` everywhere in the 
core/* module except _ContentFileParser.java_. To move away from `.size() > 0` 
there, the `isEmpty()` method would have to be added to the `StructLike` 
interface, which has 29 implementations. Should I add the `isEmpty()` method 
to that interface or leave it as is?
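
One way to avoid touching every implementation, sketched under assumptions: a Java `default` method on the interface can derive `isEmpty()` from `size()`, so existing implementations inherit it unchanged. `SizedStruct`, `ListBackedStruct` and `DefaultIsEmptySketch` below are hypothetical stand-ins, not Iceberg's `StructLike` or any of its implementations.

```java
import java.util.List;

// Hypothetical interface standing in for a struct-like type that exposes size().
interface SizedStruct {
  int size();

  // Default method: implementations get isEmpty() without any code change.
  default boolean isEmpty() {
    return size() == 0;
  }
}

// An existing implementation that only knows about size().
class ListBackedStruct implements SizedStruct {
  private final List<Object> values;

  ListBackedStruct(List<Object> values) {
    this.values = values;
  }

  @Override
  public int size() {
    return values.size();
  }
}

class DefaultIsEmptySketch {
  // Same shape as the `x != null && x.size() > 0` checks listed in the issue.
  static boolean hasPartitionData(SizedStruct partitionData) {
    return partitionData != null && !partitionData.isEmpty();
  }
}
```

Whether adding a method to a widely implemented public interface is acceptable is a separate API decision; the sketch only shows that it need not force edits to each implementation.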





Re: [PR] Use ParallelIterable in Deletes::toPositionIndex (6387) [iceberg]

2023-10-12 Thread via GitHub


wypoon commented on PR #8805:
URL: https://github.com/apache/iceberg/pull/8805#issuecomment-1760074698

   @rdblue @aokolnychyi as @rbalamohan indicated that he's not working on 
https://github.com/apache/iceberg/pull/6432 anymore, I have taken it up here. I 
rebased it on master and resolved the conflicts, moved the configuration from 
`SystemProperties` to the new `SystemConfigs`, and changed the default for the 
pool size to be the same as for the existing worker pool. The tests are green.
   @aokolnychyi I didn't see 
https://github.com/apache/iceberg/pull/6432#issuecomment-1758777424 until after 
I had put this up. I didn't know that you're also working on this in 
https://github.com/apache/iceberg/pull/8755. I'll take a look at that.





Re: [PR] Update roadmap.md [iceberg-docs]

2023-10-12 Thread via GitHub


bitsondatadev commented on PR #272:
URL: https://github.com/apache/iceberg-docs/pull/272#issuecomment-1760168587

   Hey all, I'm looping back to this today.





Re: [PR] Update roadmap.md [iceberg-docs]

2023-10-12 Thread via GitHub


bitsondatadev commented on code in PR #272:
URL: https://github.com/apache/iceberg-docs/pull/272#discussion_r1357238803


##
landing-page/content/common/roadmap.md:
##
@@ -22,28 +22,36 @@ disableSidebar: true
 
 # Roadmap Overview
 
-This roadmap outlines projects that the Iceberg community is working on, their priority, and a rough size estimate.
-This is based on the latest [community priority discussion](https://lists.apache.org/thread.html/r84e80216c259c81f824c6971504c321cd8c785774c489d52d4fc123f%40%3Cdev.iceberg.apache.org%3E).
+This roadmap outlines projects that the Iceberg community is working on.
 Each high-level item links to a Github project board that tracks the current status.
 Related design docs will be linked on the planning boards.
 
-# Priority 1
-
-* API: [Iceberg 1.0.0](https://github.com/apache/iceberg/projects/3) [medium]
-* Python: [Pythonic refactor](https://github.com/apache/iceberg/projects/7) [medium]
-* Spec: [Z-ordering / Space-filling curves](https://github.com/apache/iceberg/projects/16) [medium]
-* Spec: [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4) [small]
-* Views: [Spec](https://github.com/apache/iceberg/projects/6) [medium]
-* Puffin: [Implement statistics information in table snapshot](https://github.com/apache/iceberg/pull/4741) [medium]
-* Flink: [FLIP-27 based Iceberg source](https://github.com/apache/iceberg/projects/23) [large]
-
-# Priority 2
-
-* ORC: [Support delete files stored as ORC](https://github.com/apache/iceberg/projects/13) [small]
-* Spark: [DSv2 streaming improvements](https://github.com/apache/iceberg/projects/2) [small]
-* Flink: [Inline file compaction](https://github.com/apache/iceberg/projects/14) [small]
-* Flink: [Support UPSERT](https://github.com/apache/iceberg/projects/15) [small]
-* Spec: [Secondary indexes](https://github.com/apache/iceberg/projects/17) [large]
-* Spec v3: [Encryption](https://github.com/apache/iceberg/projects/5) [large]
-* Spec v3: [Relative paths](https://github.com/apache/iceberg/projects/18) [large]
-* Spec v3: [Default field values](https://github.com/apache/iceberg/projects/19) [medium]
+# General
+
+* [Multi-table transaction support](https://github.com/apache/iceberg/projects/30)
+* [Views Support](https://github.com/apache/iceberg/projects/29)
+* [Change Data Capture (CDC) Support](https://github.com/apache/iceberg/projects/26)
+* [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4)
+* [Inline file compaction](https://github.com/apache/iceberg/projects/14)
+* [Delete File compaction](https://github.com/apache/iceberg/projects/10)
+* [Z-ordering / Space-filling curves](https://github.com/apache/iceberg/projects/16)
+* [Support UPSERT](https://github.com/apache/iceberg/projects/15)
+

Review Comment:
   Will do!






Re: [PR] Spec: Clarify spec_id field in Data File [iceberg]

2023-10-12 Thread via GitHub


Fokko commented on code in PR #8730:
URL: https://github.com/apache/iceberg/pull/8730#discussion_r1357291336


##
format/spec.md:
##
@@ -443,13 +443,13 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo
 | _optional_ | _optional_ | **`132  split_offsets`**  | `list<133: long>`| Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending |
 || _optional_ | **`135  equality_ids`**   | `list<136: int>` | Field ids used to determine row equality in equality delete files. Required when `content=2` and should be null otherwise. Fields with ids listed in this column must be present in the delete file |
 | _optional_ | _optional_ | **`140  sort_order_id`**  | `int`  | ID representing sort order for this file [3]. |
-
+| _optional_ | _optional_ | **`141  spec_id`**| `int`  | ID representing partition spec for this file [4]. |

Review Comment:
   How about keeping it blank? This means that the field should not be written.
   ```suggestion
   | | | **`141  spec_id`**| `int`| ID representing partition spec for this file [4]. |
   ```






Re: [PR] Flink: flink/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-12 Thread via GitHub


Fokko commented on code in PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#discussion_r1357327064


##
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsCoordinator.java:
##
@@ -340,7 +340,7 @@ private void unregisterSubtaskGateway(int subtaskIndex, int 
attemptNumber) {
 
 private OperatorCoordinator.SubtaskGateway getSubtaskGateway(int 
subtaskIndex) {
   Preconditions.checkState(
-  gateways[subtaskIndex].size() > 0,
+  !gateways[subtaskIndex].isEmpty(),

Review Comment:
   ```suggestion
 !gateways[subtaskIndex].isEmpty(),
   ```






Re: [PR] rewrite v2 tables by skipping deletes planning and join deletes data tables [iceberg]

2023-10-12 Thread via GitHub


singhpk234 commented on PR #8807:
URL: https://github.com/apache/iceberg/pull/8807#issuecomment-1760327041

   Interesting, this is an approach the Impala folks took too: 
   - 
https://docs.google.com/document/d/1WF_UOanQ61RUuQlM4LaiRWI0YXpPKZ2VEJ8gyJdDyoY/edit#
   
   wondering if we could benefit from reads in general as well ? Also do you 
have more crisp benchmarks demonstrating this would benefit always ?
   
   have you tried the caching of delete files on executor solution which 
@aokolnychyi is working on and integrating with it ? 





Re: [PR] Fix column rename doc example to reflect correct API [iceberg-python]

2023-10-12 Thread via GitHub


Fokko merged PR #59:
URL: https://github.com/apache/iceberg-python/pull/59





[PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-10-12 Thread via GitHub


Fokko opened a new pull request, #61:
URL: https://github.com/apache/iceberg-python/pull/61

   (no comment)





[PR] Make `next_sequence_number` private [iceberg-python]

2023-10-12 Thread via GitHub


Fokko opened a new pull request, #62:
URL: https://github.com/apache/iceberg-python/pull/62

   We should only use this in the table module.
   
   Follow up on 
https://github.com/apache/iceberg-python/pull/60#discussion_r1355656751





Re: [PR] Spark 3.5: Fix specific field values treated as unequal while comparing rows for carry-over removal [iceberg]

2023-10-12 Thread via GitHub


flyrain commented on PR #8799:
URL: https://github.com/apache/iceberg/pull/8799#issuecomment-1760415613

   I will take a look.





Re: [I] De-Duping Rows While Compacting [iceberg]

2023-10-12 Thread via GitHub


dramaticlly commented on issue #8702:
URL: https://github.com/apache/iceberg/issues/8702#issuecomment-1760455077

   Data compaction only changes the physical file layout, not the data visible 
to users. Say you originally have 1000 records with 10 duplicates: after 
deduplication there would be 990 records as well as a file layout change. I think 
deduplication (with the ability to identify a row by primary key or unique 
row identifier) probably needs its own action/procedure instead of relying on data 
compaction.
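
To make the distinction concrete, here is a minimal Python sketch with made-up data. `compact` and `deduplicate` are hypothetical helpers, not Iceberg APIs, and files are modeled as plain lists of `(key, payload)` rows.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (primary key, payload); hypothetical row shape


def compact(files: List[List[Row]], target_rows: int) -> List[List[Row]]:
    """Repack rows into files of up to target_rows rows; no row is added or dropped."""
    rows = [row for f in files for row in f]
    return [rows[i:i + target_rows] for i in range(0, len(rows), target_rows)]


def deduplicate(files: List[List[Row]]) -> List[Row]:
    """Keep the first row seen for each primary key; this changes the visible data."""
    seen: Dict[int, Row] = {}
    for f in files:
        for row in f:
            seen.setdefault(row[0], row)
    return list(seen.values())


# 1000 records, 10 of which repeat an existing key, spread over small files.
rows = [(i, f"v{i}") for i in range(990)] + [(i, f"v{i}") for i in range(10)]
small_files = [rows[i:i + 50] for i in range(0, len(rows), 50)]

compacted = compact(small_files, target_rows=500)
deduped = deduplicate(small_files)
```

Compaction turns 20 small files into 2 while the row count stays at 1000; deduplication is the only step that changes what a reader sees.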





Re: [I] Add outputFile() for FileAppender [iceberg]

2023-10-12 Thread via GitHub


github-actions[bot] commented on issue #7231:
URL: https://github.com/apache/iceberg/issues/7231#issuecomment-1760563302

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'





Re: [I] Add outputFile() for FileAppender [iceberg]

2023-10-12 Thread via GitHub


github-actions[bot] closed issue #7231: Add outputFile() for FileAppender
URL: https://github.com/apache/iceberg/issues/7231





Re: [PR] rewrite v2 tables by skipping deletes planning and join deletes data tables [iceberg]

2023-10-12 Thread via GitHub


zinking commented on PR #8807:
URL: https://github.com/apache/iceberg/pull/8807#issuecomment-1760653401

   
   > wondering if we could benefit from reads in general as well ? 
   
   Yep. As mentioned in the distributed planning work, when metadata becomes 
big, hand-crafted parallel code is no longer optimal. If reads were planned 
optimally, these delete files would be read concurrently instead of what we have 
now. 
   
   > Also do you have more crisp benchmarks demonstrating this would benefit 
always ?
   
   I don't think this always helps. It's easy to imagine that with 
only a couple of delete files, a join would certainly not outperform. But when 
metadata becomes larger, it would always benefit, since in theory the number of 
file reads decreases. 
   
   I don't have more numbers at the moment, and the benchmark above isn't fully 
optimized. 
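
The file-read argument can be illustrated with a toy Python model. Everything here is an assumption for illustration: delete files are in-memory lists of `(data_file_path, row_position)` pairs, and the two functions only count how many times a delete file is opened. This is not how the PR or Spark is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model: each delete file is a list of (data_file_path, row_position).
DeleteFile = List[Tuple[str, int]]


def per_task_reads(data_files: List[str], delete_files: List[DeleteFile]) -> int:
    """Each data-file task scans every delete file that mentions it."""
    reads = 0
    for path in data_files:
        for df in delete_files:
            if any(p == path for p, _ in df):
                reads += 1
    return reads


def join_deletes(delete_files: List[DeleteFile]) -> Tuple[Dict[str, Set[int]], int]:
    """Read each delete file once and group deleted positions by data file path."""
    deleted: Dict[str, Set[int]] = defaultdict(set)
    reads = 0
    for df in delete_files:
        reads += 1
        for path, pos in df:
            deleted[path].add(pos)
    return deleted, reads


data_files = [f"data-{i}.parquet" for i in range(100)]
# Ten delete files, each touching every data file (a worst case for per-task reads).
delete_files = [[(p, k) for p in data_files] for k in range(10)]

deleted, join_reads = join_deletes(delete_files)
task_reads = per_task_reads(data_files, delete_files)
```

The model only counts file opens; a real join also pays for shuffling the delete records, which is why it may not win when there are only a few delete files.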
   
   > have you tried the caching of delete files on executor solution which 
@aokolnychyi is working on and integrating with it ?
   
   not yet
   
   





Re: [I] struct value design [iceberg-rust]

2023-10-12 Thread via GitHub


ZENOTME commented on issue #77:
URL: https://github.com/apache/iceberg-rust/issues/77#issuecomment-1760655157

   cc @JanKaul @Fokko @Xuanwo  @liurenjie1024





Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-12 Thread via GitHub


MarsKT commented on issue #8802:
URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1760659443

   > Thanks @pvary, I have a maybe naive question @MarsKT
   > 
   > > want the data read from iceberg to be in the same order every time.
   > 
   > Can I ask what's driving this need?
   
   The users of a data analytics software that utilizes a data lake as its 
storage layer desire consistent data ordering when viewing the data.





[I] [BUG] string row filter ignore 2nd (and onwards And) [iceberg-python]

2023-10-12 Thread via GitHub


puchengy opened a new issue, #64:
URL: https://github.com/apache/iceberg-python/issues/64

   ### Apache Iceberg version
   
   None
   
   ### Please describe the bug 🐞
   
   ```
   tasks = table.scan(row_filter="dt='2023-08-20' AND view_type=1 AND hr='00' 
").plan_files()
   ```
   Only the filters on `dt` and `view_type` are applied; the one on `hr` is ignored.
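
For reference, the expected behavior can be sketched in Python: a chain of `AND` terms should fold into nested binary nodes with no term dropped. The `EqualTo` and `And` classes below are hypothetical stand-ins written for this sketch, not the pyiceberg expression classes or its parser.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Union


@dataclass(frozen=True)
class EqualTo:
    column: str
    value: object


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[EqualTo, And]


def and_all(terms: List[Expr]) -> Expr:
    """Left-fold all terms so that none is dropped: ((a AND b) AND c)."""
    return reduce(And, terms)


def columns(expr: Expr) -> List[str]:
    """Collect the column names referenced by the expression, left to right."""
    if isinstance(expr, And):
        return columns(expr.left) + columns(expr.right)
    return [expr.column]


expr = and_all([EqualTo("dt", "2023-08-20"), EqualTo("view_type", 1), EqualTo("hr", "00")])
```

A parser that only builds a single binary `And` from the first two terms would produce the symptom described above.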





Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-12 Thread via GitHub


Zhanxiao-Ma commented on issue #8802:
URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1760700741

   > Currently there is no way to order the scan task. The planning side 
specifically makes sure that even the planning could be done by parallel 
threads (reading manifests files parallel)
   > 
   > Sometimes we need to do similar thing in Flink Source, and we ended up 
creating our own comparator for this which compares Iceberg splits (which are a 
wrapper above ScanTasks).
   > 
   > You can do something similar like this in java code with one serious 
caveat: For a big table you might not want/able to keep all of the tasks in 
memory, which is needed for sorting. What we do in flink is limit the number of 
snapshots to read once.
   > 
   > I hope this helps, Peter
   
   
   





Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-12 Thread via GitHub


Zhanxiao-Ma closed issue #8802: How to read data in the order in which files 
are commited?
URL: https://github.com/apache/iceberg/issues/8802





[I] How to read data in the order in which files are commited? [iceberg]

2023-10-12 Thread via GitHub


Zhanxiao-Ma opened a new issue, #8802:
URL: https://github.com/apache/iceberg/issues/8802

   ### Query engine
   
   Iceberg java api(Version 0.14.1)
   
   ### Question
   
   I want the data read from iceberg to be in the same order every time. But I 
can't find an attribute that would make FileScanTask ordered. Is there a way I 
can implement it?





Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-12 Thread via GitHub


Zhanxiao-Ma commented on issue #8802:
URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1760704067

   > Sometimes we need to do similar thing in Flink Source, and we ended up 
creating our own comparator for this which compares Iceberg splits (which are a 
wrapper above ScanTasks).
   
   I'm sorry, I didn't quite understand this point. Could you please explain it 
in more detail?
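
One possible reading of the quoted suggestion, sketched in Python rather than Java: collect the planned tasks, then sort them with a comparator of your own. The `Task` class and its fields are hypothetical stand-ins, not Iceberg or Flink classes, and using a sequence number as a proxy for commit order is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a planned scan task; not an Iceberg class."""
    sequence_number: int  # assumed to reflect commit order
    file_path: str


def ordered(tasks: Iterable[Task]) -> List[Task]:
    """Sort by commit order, breaking ties by path so the result is deterministic."""
    # Materializing all tasks is what makes this costly for large tables.
    return sorted(tasks, key=lambda t: (t.sequence_number, t.file_path))


planned = [Task(2, "b.parquet"), Task(1, "z.parquet"), Task(2, "a.parquet")]
result = [t.file_path for t in ordered(planned)]
```

The tie-breaker matters: without it, two files from the same commit could still come back in a different order on each run.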
   





[PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


liurenjie1024 opened a new pull request, #78:
URL: https://github.com/apache/iceberg-rust/pull/78

   In this PR we add initial support for the REST catalog, covering the simple REST APIs. 
   
   Complex APIs such as create table, update table, and commits will be added in 
following PRs so that each PR stays a reasonable size.





Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


liurenjie1024 commented on PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#issuecomment-1760707832

   cc @JanKaul @Xuanwo @Fokko @ZENOTME  PTAL





Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


Xuanwo commented on code in PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357725856


##
crates/iceberg/src/catalog/rest.rs:
##
@@ -0,0 +1,912 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module contains rest catalog implementation.
+
+use std::collections::HashMap;
+
+use async_trait::async_trait;
+use reqwest::header::{self, HeaderMap, HeaderName, HeaderValue};
+use reqwest::{Client, Request};
+use serde::de::DeserializeOwned;
+use urlencoding::encode;
+
+use crate::error::Result;
+use crate::table::Table;
+use crate::{
+Catalog, Error, ErrorKind, Namespace, NamespaceIdent, TableCommit, 
TableCreation, TableIdent,
+};
+
+use self::_serde::{
+CatalogConfig, ErrorModel, ErrorResponse, ListNamespaceResponse, 
ListTableResponse,
+NamespaceSerde, RenameTableRequest, NO_CONTENT, OK,
+};
+
+const ICEBERG_REST_SPEC_VERSION: &str = "0.14.1";
+const PATH_V1: &str = "v1";
+
+#[derive(Debug, Builder)]
+pub struct RestCatalogConfig {
+uri: String,
+#[builder(default)]
+warehouse: Option<String>,
+
+#[builder(default)]
+props: HashMap<String, String>,
+}
+
+impl RestCatalogConfig {
+fn config_endpoint(&self) -> String {
+[&self.uri, PATH_V1, "config"].join("/")
+}
+
+fn namespaces_endpoint(&self) -> String {
+[&self.uri, PATH_V1, "namespaces"].join("/")
+}
+
+fn namespace_endpoint(&self, ns: &NamespaceIdent) -> Result<String> {
+Ok([&self.uri, PATH_V1, "namespaces", &ns.encode_in_url()?].join("/"))
+}
+
+fn tables_endpoint(&self, ns: &NamespaceIdent) -> Result<String> {
+Ok([
+&self.uri,
+PATH_V1,
+"namespaces",
+&ns.encode_in_url()?,
+"tables",
+]
+.join("/"))
+}
+
+fn rename_table_endpoint(&self) -> Result<String> {
+Ok([&self.uri, PATH_V1, "tables", "rename"].join("/"))
+}
+
+fn table_endpoint(&self, table: &TableIdent) -> Result<String> {
+Ok([
+&self.uri,
+PATH_V1,
+"namespaces",
+&table.namespace.encode_in_url()?,
+"tables",
+encode(&table.name).as_ref(),
+]
+.join("/"))
+}
+
+fn try_create_rest_client(&self) -> Result<HttpClient> {
+//TODO: We will add oauth, ssl config, sigv4 later
+let mut headers = HeaderMap::new();
+headers.insert(
+header::CONTENT_TYPE,
+HeaderValue::from_static("application/json"),
+);
+headers.insert(
+HeaderName::from_static("x-client-version"),
+HeaderValue::from_static(ICEBERG_REST_SPEC_VERSION),
+);
+headers.insert(
+header::USER_AGENT,
+HeaderValue::from_str(&format!("iceberg-rs/{}", 
env!("CARGO_PKG_VERSION"))).unwrap(),
+);
+
+Ok(HttpClient(
+Client::builder().default_headers(headers).build()?,
+))
+}
+}
+
+impl NamespaceIdent {
+/// Returns url encoded format.
+pub fn encode_in_url(&self) -> Result<String> {
+if self.0.is_empty() {

Review Comment:
   It's better to ensure that `NamespaceIdent` is valid so that we don't have 
to check it when using it. This change can remove a lot of `Result`s in 
related APIs.
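
A rough illustration of the idea, in Python rather than Rust: validate once in the constructor so that later calls cannot fail. The class below is a hypothetical analogue written for this sketch, and joining the parts with the 0x1F unit separator before URL-encoding is an assumption about the encoding, not something shown in the diff.

```python
from typing import Sequence, Tuple
from urllib.parse import quote


class NamespaceIdent:
    """Hypothetical Python analogue: validate once, at construction."""

    def __init__(self, parts: Sequence[str]) -> None:
        if not parts:
            raise ValueError("namespace must have at least one part")
        self._parts: Tuple[str, ...] = tuple(parts)

    def encode_in_url(self) -> str:
        # Cannot fail: the constructor already rejected empty namespaces.
        # Joining with the 0x1F unit separator is an assumption here.
        return quote("\x1f".join(self._parts), safe="")


def tables_endpoint(uri: str, ns: NamespaceIdent) -> str:
    """Build the tables path without any error handling at the call site."""
    return "/".join([uri, "v1", "namespaces", ns.encode_in_url(), "tables"])


endpoint = tables_endpoint("http://localhost:8181", NamespaceIdent(["a", "b"]))
```

With the check moved into the constructor, the endpoint helpers return a plain string, which mirrors dropping `Result` from the Rust signatures.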



##
crates/iceberg/src/catalog/rest.rs:
##
@@ -0,0 +1,912 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module contains rest catalog implementation.
+
+use std::collection

Re: [I] struct value design [iceberg-rust]

2023-10-12 Thread via GitHub


liurenjie1024 commented on issue #77:
URL: https://github.com/apache/iceberg-rust/issues/77#issuecomment-1760723094

   > Another solution is pass struct type as another parameter when we need it, 
e.g.
   
   I prefer this approach. It feels odd to me to store types with values, and we 
can always attach the type when necessary.
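
A small Python sketch of that shape, for illustration only: the value holds positional data, and the struct type is supplied by the caller at the point where it is needed. `StructType`, `StructValue`, and `to_dict` are hypothetical names, not the iceberg-rust design.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class StructType:
    """Hypothetical schema: ordered field names only."""
    field_names: Tuple[str, ...]


@dataclass(frozen=True)
class StructValue:
    """Holds positional values and nothing about their type."""
    values: Tuple[Any, ...]


def to_dict(value: StructValue, ty: StructType) -> Dict[str, Any]:
    """Attach the type only at the point where field names are needed."""
    if len(value.values) != len(ty.field_names):
        raise ValueError("value does not match struct type")
    return dict(zip(ty.field_names, value.values))


row = StructValue((1, "a"))
named = to_dict(row, StructType(("id", "name")))
```

The trade-off is that a mismatch between value and type is only detected when the two are combined, not when the value is built.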





Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


liurenjie1024 commented on code in PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357732820


##
crates/iceberg/Cargo.toml:
##
@@ -41,20 +41,24 @@ either = "1"
 futures = "0.3"
 itertools = "0.11"
 lazy_static = "1"
+log = "^0.4"
 murmur3 = "0.5.2"
 once_cell = "1"
 opendal = "0.40"
 ordered-float = "4.0.0"
+reqwest = { version = "^0.11", features = ["json"] }

Review Comment:
   Maybe we should make it a feature? I'm not sure it deserves another 
crate. cc @JanKaul  @ZENOTME What do you think?






Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


Xuanwo commented on code in PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357738084


##
crates/iceberg/Cargo.toml:
##
@@ -41,20 +41,24 @@ either = "1"
 futures = "0.3"
 itertools = "0.11"
 lazy_static = "1"
+log = "^0.4"
 murmur3 = "0.5.2"
 once_cell = "1"
 opendal = "0.40"
 ordered-float = "4.0.0"
+reqwest = { version = "^0.11", features = ["json"] }

Review Comment:
   The problem with the features is that they are add-only, making it difficult 
for users to disable them.






Re: [I] rewrite_position_delete_files leads to error [iceberg]

2023-10-12 Thread via GitHub


atifiu commented on issue #8045:
URL: https://github.com/apache/iceberg/issues/8045#issuecomment-1760787847

   @szehon-ho Thanks for the fix. I am facing the same issue on Iceberg 1.3.0 
while trying to remove delete files using the `rewrite_position_delete_files` 
procedure. The reason I have to remove the delete files is that aggregate pushdown is 
failing with this message: `SparkScanBuilder: Skipping aggregate pushdown: 
detected row level deletes`. 
https://github.com/apache/iceberg/pull/6252#issuecomment-1757848584 
   I am also still not sure how the delete files were created when I have defined 
Merge on Read for DML operations.
   https://github.com/apache/iceberg/pull/6252#issuecomment-1758873680
   
   So my question is: how can we remove delete files if we are still 
on 1.3.0? Is it somehow possible to manually remove references to delete 
files without corrupting the metadata? Thanks for your help.
   
   
   ```
   23/10/13 00:16:56 ERROR RewritePositionDeleteFilesSparkAction: Failure 
during rewrite group FileGroupInfo{globalIndex=1, partitionIndex=1, 
partition=org.apache.iceberg.util.StructProjection@3162902b}
   org.apache.spark.sql.AnalysisException: cannot resolve 
'(partition.`page_view_dtm_day` = 18384)' due to data type mismatch: differing 
types in '(partition.`page_view_dtm_day` = 18384)' (date and int).;
   'Filter (partition#4925.page_view_dtm_day = 18384)
   +- RelationV2[content#4921, file_path#4922, file_format#4923, spec_id#4924, 
partition#4925, record_count#4926L, file_size_in_bytes#4927L, 
column_sizes#4928, value_counts#4929, null_value_counts#4930, 
nan_value_counts#4931, lower_bounds#4932, upper_bounds#4933, key_metadata#4934, 
split_offsets#4935, equality_ids#4936, sort_order_id#4937, 
readable_metrics#4938]
   ```
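
As a side note on the `(date and int)` mismatch in the log: assuming the integer `18384` is a days-since-epoch ordinal for the day partition (the log itself does not say so), it can be converted to a calendar date as in the Python sketch below. The helper names are made up for this example.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Interpret an integer as days since the Unix epoch."""
    return EPOCH + timedelta(days=days)


def date_to_days(d: date) -> int:
    """Inverse conversion: calendar date back to a day ordinal."""
    return (d - EPOCH).days


partition_day = days_to_date(18384)
```

Under that assumption the failing filter compares a date column against the ordinal for 2020-05-02.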





Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-12 Thread via GitHub


atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1760825445

   @huaxingao What could be the reasons for aggregate pushdown not 
working when filters are used? If you can give me an idea or hint, I will 
look into it further.





Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


liurenjie1024 commented on code in PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357791341


##
crates/iceberg/Cargo.toml:
##
@@ -41,20 +41,24 @@ either = "1"
 futures = "0.3"
 itertools = "0.11"
 lazy_static = "1"
+log = "^0.4"
 murmur3 = "0.5.2"
 once_cell = "1"
 opendal = "0.40"
 ordered-float = "4.0.0"
+reqwest = { version = "^0.11", features = ["json"] }

Review Comment:
   Sorry, I don't get your point. Could you give a concrete example? 
One concern with the separate-crate approach is that it makes loading catalogs 
dynamically, as in Python, difficult.






[I] Flaky test: TestRemoveOrphanFilesAction3 > orphanedFileRemovedWithParallelTasks [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat opened a new issue, #8824:
URL: https://github.com/apache/iceberg/issues/8824

   PR: https://github.com/apache/iceberg/pull/8822
   Build: 
https://github.com/apache/iceberg/actions/runs/6499875599/job/17655030618?pr=8822
   
   
   ```
   TestRemoveOrphanFilesAction3 > orphanedFileRemovedWithParallelTasks FAILED
   java.lang.AssertionError: Should delete 4 files expected:<4> but was:<3>
   at org.junit.Assert.fail(Assert.java:89)
   at org.junit.Assert.failNotEquals(Assert.java:835)
   at org.junit.Assert.assertEquals(Assert.java:647)
   at org.apache.iceberg.spark.actions.TestRemoveOrphanFilesAction.orphanedFileRemovedWithParallelTasks(TestRemoveOrphanFilesAction.java:307)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
   at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
   at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
   at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
   at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
   at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
   at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
   at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
   at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
   at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
   at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
   at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
   at org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
   at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.processAllTestClasses(JUnitPlatformTestClassProcessor.java:110)
   at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor$CollectAllTestClassesExecutor.access$000(JUnitPlatformTestClassProcessor.java:90)
   at org.gradle.api.internal.tasks.testing.junitplatform.JUnitPlatformTestClassProcessor.stop(JUnitPlatformTestClassProcessor.java:85)
   at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:62)
   at

Re: [I] Flaky test: TestRemoveOrphanFilesAction3 > orphanedFileRemovedWithParallelTasks [iceberg]

2023-10-12 Thread via GitHub


ajantha-bhat commented on issue #8824:
URL: https://github.com/apache/iceberg/issues/8824#issuecomment-1760921944

   This looks like a regression: https://github.com/apache/iceberg/pull/4859
   It seems we tried to fix it a long time ago, but the fix was incomplete.





[PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-12 Thread via GitHub


ZENOTME opened a new pull request, #79:
URL: https://github.com/apache/iceberg-rust/pull/79

   This PR prepares support for reading and writing Manifest.
   
   related issue: #36
   





Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-12 Thread via GitHub


ZENOTME commented on PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#issuecomment-1760940516

   This is not complete yet.
   
   So far it only covers the basic design, and I want to confirm that the design is sound first.
   
   If it looks good, I will finish it and add tests later.
   
   cc @JanKaul @Fokko @Xuanwo @liurenjie1024





Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-12 Thread via GitHub


ZENOTME commented on code in PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357818399


##
crates/iceberg/src/catalog/rest.rs:
##
@@ -0,0 +1,900 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This module contains rest catalog implementation.
+
+use std::collections::HashMap;
+
+use async_trait::async_trait;
+use reqwest::header::{self, HeaderMap, HeaderName, HeaderValue};
+use reqwest::{Client, Request};
+use serde::de::DeserializeOwned;
+use urlencoding::encode;
+
+use crate::error::Result;
+use crate::table::Table;
+use crate::{
+Catalog, Error, ErrorKind, Namespace, NamespaceIdent, TableCommit, TableCreation, TableIdent,
+};
+
+use self::_serde::{
+CatalogConfig, ErrorModel, ErrorResponse, ListNamespaceResponse, ListTableResponse,
+NamespaceSerde, RenameTableRequest, NO_CONTENT, OK,
+};
+
+const ICEBERG_REST_SPEC_VERSION: &str = "0.14.1";
+const PATH_V1: &str = "v1";
+
+#[derive(Debug, Builder)]
+pub struct RestCatalogConfig {
+uri: String,
+#[builder(default)]
+warehouse: Option<String>,
+
+#[builder(default)]
+props: HashMap<String, String>,
+}
+
+impl RestCatalogConfig {
+fn config_endpoint(&self) -> String {
+[&self.uri, PATH_V1, "config"].join("/")
+}
+
+fn namespaces_endpoint(&self) -> String {
+[&self.uri, PATH_V1, "namespaces"].join("/")
+}
+
+fn namespace_endpoint(&self, ns: &NamespaceIdent) -> String {
+[&self.uri, PATH_V1, "namespaces", &ns.encode_in_url()].join("/")
+}
+
+fn tables_endpoint(&self, ns: &NamespaceIdent) -> String {
+[
+&self.uri,
+PATH_V1,
+"namespaces",
+&ns.encode_in_url(),
+"tables",
+]
+.join("/")
+}
+
+fn rename_table_endpoint(&self) -> String {
+[&self.uri, PATH_V1, "tables", "rename"].join("/")
+}
+
+fn table_endpoint(&self, table: &TableIdent) -> String {
+[
+&self.uri,
+PATH_V1,
+"namespaces",
+&table.namespace.encode_in_url(),
+"tables",
+encode(&table.name).as_ref(),
+]
+.join("/")
+}
+
+fn try_create_rest_client(&self) -> Result<HttpClient> {
+//TODO: We will add oauth, ssl config, sigv4 later
+let mut headers = HeaderMap::new();
+headers.insert(
+header::CONTENT_TYPE,
+HeaderValue::from_static("application/json"),
+);
+headers.insert(
+HeaderName::from_static("x-client-version"),
+HeaderValue::from_static(ICEBERG_REST_SPEC_VERSION),
+);
+headers.insert(
+header::USER_AGENT,
+HeaderValue::from_str(&format!("iceberg-rs/{}", env!("CARGO_PKG_VERSION"))).unwrap(),
+);
+
+Ok(HttpClient(
+Client::builder().default_headers(headers).build()?,
+))
+}
+}
+
+impl NamespaceIdent {
+/// Returns url encoded format.
+pub fn encode_in_url(&self) -> String {
+encode(&self.0.join("\u{1F}")).to_string()
+}
+}
+
+struct HttpClient(Client);
+
+impl HttpClient {
+async fn execute<
+R: DeserializeOwned,
+E: DeserializeOwned + Into<Error>,
+const SUCCESS_CODE: u16,
+>(
+&self,
+request: Request,
+) -> Result<R> {
+let resp = self.0.execute(request).await?;
+
+if resp.status().as_u16() == SUCCESS_CODE {
+let text = resp.bytes().await?;
+Ok(serde_json::from_slice::<R>(&text).map_err(|e| {
+Error::new(
+ErrorKind::Unexpected,
+"Failed to parse response from rest catalog server!",
+)
+.with_context("json", String::from_utf8_lossy(&text))
+.with_source(e)
+})?)
+} else {
+let text = resp.bytes().await?;
+let e = serde_json::from_slice::<E>(&text).map_err(|e| {
+Error::new(
+ErrorKind::Unexpected,
+"Failed to parse response from rest catalog server!",
+)
+.with_context("json", String::from_utf8_lossy(&text))
+.with_source(