date:20241213

Re: [PR] Fix `Table.scan` to enable case sensitive argument [iceberg-python]

2024-12-13 Thread via GitHub



jiakai-li commented on PR #1423:
URL: https://github.com/apache/iceberg-python/pull/1423#issuecomment-2542657647

   Hey guys, thanks a lot for your kind guidance and great suggestions. I've 
updated the PR to:
   - Enable `Table.delete` and `Table.overwrite` operations to control 
case-sensitivity
   - Updated test cases to make it more readable
   
   Please let me know if there is other changes you would like me to make, 
thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Fix `Table.scan` to enable case sensitive argument [iceberg-python]

2024-12-13 Thread via GitHub



jiakai-li commented on code in PR #1423:
URL: https://github.com/apache/iceberg-python/pull/1423#discussion_r1884721249


##
tests/table/test_init.py:
##
@@ -310,6 +310,19 @@ def test_table_scan_row_filter(table_v2: Table) -> None:
 assert scan.filter(EqualTo("x", 10)).filter(In("y", (10, 11))).row_filter 
== And(EqualTo("x", 10), In("y", (10, 11)))
 
 
+def test_table_scan_partition_filters_case_sensitive(table_v2: Table) -> None:

Review Comment:
   Thanks very much @kevinjqliu , I've updated the test case for scan and the 
other operations as well. Hope it looks better now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] IO Implementation using Go CDK [iceberg-go]

2024-12-13 Thread via GitHub



dwilson1988 commented on PR #176:
URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542660861

   @zeroshade - I'll take a look this weekend!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Spark: Change Delete granularity to file for Spark 3.5 [iceberg]

2024-12-13 Thread via GitHub



aokolnychyi commented on code in PR #11478:
URL: https://github.com/apache/iceberg/pull/11478#discussion_r1884625538


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java:
##
@@ -719,7 +719,7 @@ public DeleteGranularity deleteGranularity() {
 .stringConf()
 .option(SparkWriteOptions.DELETE_GRANULARITY)
 .tableProperty(TableProperties.DELETE_GRANULARITY)
-.defaultValue(TableProperties.DELETE_GRANULARITY_DEFAULT)
+.defaultValue(DeleteGranularity.FILE.toString())

Review Comment:
   Let's switch to `enumConf` for this one, which was added recently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] GCS: Suppress JavaUtilDate in OAuth2RefreshCredentialsHandler [iceberg]

2024-12-13 Thread via GitHub



nastra merged PR #11773:
URL: https://github.com/apache/iceberg/pull/11773


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883568018


##
core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java:
##
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.List;
+import java.util.Map;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.rest.HTTPHeaders.HTTPHeader;
+import org.junit.jupiter.api.Test;
+
+class TestHTTPHeaders {
+
+  final HTTPHeaders headers =
+  HTTPHeaders.of(
+  HTTPHeader.of("header1", "value1a"),

Review Comment:
   please also add a test where the key/value is null



##
core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java:
##
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.List;
+import java.util.Map;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.rest.HTTPHeaders.HTTPHeader;
+import org.junit.jupiter.api.Test;
+
+class TestHTTPHeaders {
+
+  final HTTPHeaders headers =

Review Comment:
   can be private



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883571471


##
core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java:
##
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.List;
+import java.util.Map;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.rest.HTTPHeaders.HTTPHeader;
+import org.junit.jupiter.api.Test;
+
+class TestHTTPHeaders {
+
+  final HTTPHeaders headers =
+  HTTPHeaders.of(
+  HTTPHeader.of("header1", "value1a"),
+  HTTPHeader.of("HEADER1", "value1b"),
+  HTTPHeader.of("header2", "value2"));
+
+  @Test
+  void asMap() {
+assertThat(headers.asMap())
+.isEqualTo(
+Map.of(
+"header1", List.of("value1a", "value1b"),
+"header2", List.of("value2")));
+  }
+
+  @Test
+  void asSimpleMap() {
+assertThat(headers.asSimpleMap())
+.isEqualTo(
+Map.of(
+"header1", "value1a",
+"header2", "value2"));
+  }
+
+  @Test
+  void asMultiMap() {
+assertThat(headers.asMultiMap())
+.isEqualTo(
+ImmutableListMultimap.of(
+"header1", "value1a", "header1", "value1b", "header2", 
"value2"));
+  }
+
+  @Test
+  void entries() {
+assertThat(headers.entries("header1"))

Review Comment:
   please also add tests where `null` is passed to all the methods that take 
nullable parameters, such as `entries()`/`contains()`/`addIfAbsent()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883847591


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /** Returns all the entries in this group for the given name 
(case-insensitive). */
+  default List entries(String name) {
+return entries().stream()
+.filter(header -> header.name().equalsIgnoreCase(name))
+.collect(Collectors.toList());
+  }
+
+  /** Returns whether this group contains an entry with the given name 
(case-insensitive). */
+  default boolean contains(String name) {
+return entries().stream().anyMatch(header -> 
header.name().equalsIgnoreCase(name));
+  }
+
+  /**
+   * Adds the given header to the current group if no entry with the same name 
is already present.
+   * Returns a new instance with the added header, or the current instance if 
the header is already
+   * present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeader header) {

Review Comment:
   we might want to consider naming this (and the other method) 
`withHeaderIfAbsent()` to clearly indicate that (potentially) a new instance is 
returned because `add` in the name suggests that the existing object is 
modified 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883560954


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.stream.Collectors;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.Multimap;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /**
+   * Returns a map representation of the headers where each header name is 
mapped to a list of its
+   * values. Header names are case-insensitive.
+   */
+  @Value.Lazy
+  default Map> asMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(
+headers -> headers.get(0).name(),
+headers -> 
headers.stream().map(HTTPHeader::value).collect(Collectors.toList(;
+  }
+
+  /**
+   * Returns a simple map representation of the headers where each header name 
is mapped to its
+   * first value. If a header has multiple values, only the first value is 
used. Header names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default Map asSimpleMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(headers -> headers.get(0).name(), headers -> 
headers.get(0).value()));
+  }
+
+  /**
+   * Returns a {@link ListMultimap} representation of the headers. Header 
names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default ListMultimap asMultiMap() {

Review Comment:
   do we actually have a use case for this? I see the value in having `asMap()` 
/ `asSimpleMap()` but I feel like we don't need this method atm?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883564062


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.stream.Collectors;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.Multimap;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /**
+   * Returns a map representation of the headers where each header name is 
mapped to a list of its
+   * values. Header names are case-insensitive.
+   */
+  @Value.Lazy
+  default Map> asMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(
+headers -> headers.get(0).name(),
+headers -> 
headers.stream().map(HTTPHeader::value).collect(Collectors.toList(;
+  }
+
+  /**
+   * Returns a simple map representation of the headers where each header name 
is mapped to its
+   * first value. If a header has multiple values, only the first value is 
used. Header names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default Map asSimpleMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(headers -> headers.get(0).name(), headers -> 
headers.get(0).value()));
+  }
+
+  /**
+   * Returns a {@link ListMultimap} representation of the headers. Header 
names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default ListMultimap asMultiMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+ImmutableListMultimap.flatteningToImmutableListMultimap(
+headers -> headers.get(0).name(),
+headers -> headers.stream().map(HTTPHeader::value)));
+  }
+
+  /** Returns all the entries in this group for the given name 
(case-insensitive). */
+  default List entries(String name) {
+return entries().stream()
+.filter(header -> header.name().equalsIgnoreCase(name))
+.collect(Collectors.toList());
+  }
+
+  /** Returns whether this group contains an entry with the given name 
(case-insensitive). */
+  default boolean contains(String name) {
+return entries().stream().anyMatch(header -> 
header.name().equalsIgnoreCase(name));
+  }
+
+  /**
+   * Adds the given header to the current group if no entry with the same name 
is already present.
+   * Returns a new instance with the added header, or the current instance if 
the header is already
+   * present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeader header) {
+return contains(header.name())
+? this
+: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build();
+  }
+
+  /**
+   * Adds the given headers to the current group if no entries with same names 
are already present.
+   * Returns a new instance with the added headers, or the current instance if 
all headers are
+   * already present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeaders headers) {
+

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



adutra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883565088


##
core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import java.net.URI;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.immutables.value.Value;
+
+/** Represents an HTTP request. */
+@Value.Style(redactedMask = "", depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPRequest {
+
+  enum HTTPMethod {
+GET,
+HEAD,
+POST,
+DELETE
+  }
+
+  /**
+   * Returns the base URI configured at the REST client level. The base URI is 
used to construct the
+   * full {@link #requestUri()}.
+   */
+  URI baseUri();
+
+  /**
+   * Returns the full URI of this request. The URI is constructed from the 
base URI, path, and query
+   * parameters. It cannot be modified directly.
+   */
+  @Value.Lazy
+  default URI requestUri() {
+return RESTUtil.buildRequestUri(this);

Review Comment:
   Yes that's a good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



adutra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883564166


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.stream.Collectors;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.Multimap;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /**
+   * Returns a map representation of the headers where each header name is 
mapped to a list of its
+   * values. Header names are case-insensitive.
+   */
+  @Value.Lazy
+  default Map> asMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(
+headers -> headers.get(0).name(),
+headers -> 
headers.stream().map(HTTPHeader::value).collect(Collectors.toList(;
+  }
+
+  /**
+   * Returns a simple map representation of the headers where each header name 
is mapped to its
+   * first value. If a header has multiple values, only the first value is 
used. Header names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default Map asSimpleMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(headers -> headers.get(0).name(), headers -> 
headers.get(0).value()));
+  }
+
+  /**
+   * Returns a {@link ListMultimap} representation of the headers. Header 
names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default ListMultimap asMultiMap() {

Review Comment:
   This one will be very useful later to do things like this in `HTTPClient`:
   
   ```java
   HTTPRequest req = ...
   HttpUriRequestBase request = new HttpUriRequestBase(req.method().name(), 
req.requestUri());
   req.headers().asMultiMap().forEach(request::addHeader);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883809126


##
core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java:
##
@@ -95,6 +100,9 @@ void addIfAbsentHTTPHeader() {
 "header1", List.of("value1a", "value1b"),
 "header2", List.of("value2"),
 "header3", List.of("value3")));
+
+assertThatThrownBy(() -> headers.addIfAbsent((HTTPHeaders) null))
+.isInstanceOf(NullPointerException.class);

Review Comment:
   checks like this should always have a `.hasMessage` check to make sure we 
get the right error msg back (unfortunately we can't easily enforce such a rule 
via checkstyle)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



adutra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883811731


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.stream.Collectors;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.Multimap;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /**
+   * Returns a map representation of the headers where each header name is 
mapped to a list of its
+   * values. Header names are case-insensitive.
+   */
+  @Value.Lazy
+  default Map> asMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(
+headers -> headers.get(0).name(),
+headers -> 
headers.stream().map(HTTPHeader::value).collect(Collectors.toList(;
+  }
+
+  /**
+   * Returns a simple map representation of the headers where each header name 
is mapped to its
+   * first value. If a header has multiple values, only the first value is 
used. Header names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default Map asSimpleMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(headers -> headers.get(0).name(), headers -> 
headers.get(0).value()));
+  }
+
+  /**
+   * Returns a {@link ListMultimap} representation of the headers. Header 
names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default ListMultimap asMultiMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+ImmutableListMultimap.flatteningToImmutableListMultimap(
+headers -> headers.get(0).name(),
+headers -> headers.stream().map(HTTPHeader::value)));
+  }
+
+  /** Returns all the entries in this group for the given name 
(case-insensitive). */
+  default List entries(String name) {
+return entries().stream()
+.filter(header -> header.name().equalsIgnoreCase(name))
+.collect(Collectors.toList());
+  }
+
+  /** Returns whether this group contains an entry with the given name 
(case-insensitive). */
+  default boolean contains(String name) {
+return entries().stream().anyMatch(header -> 
header.name().equalsIgnoreCase(name));
+  }
+
+  /**
+   * Adds the given header to the current group if no entry with the same name 
is already present.
+   * Returns a new instance with the added header, or the current instance if 
the header is already
+   * present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeader header) {
+return contains(header.name())
+? this
+: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build();
+  }
+
+  /**
+   * Adds the given headers to the current group if no entries with same names 
are already present.
+   * Returns a new instance with the added headers, or the current instance if 
all headers are
+   * already present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeaders headers) {
+

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



adutra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883813563


##
core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java:
##
@@ -95,6 +100,9 @@ void addIfAbsentHTTPHeader() {
 "header1", List.of("value1a", "value1b"),
 "header2", List.of("value2"),
 "header3", List.of("value3")));
+
+assertThatThrownBy(() -> headers.addIfAbsent((HTTPHeaders) null))
+.isInstanceOf(NullPointerException.class);

Review Comment:
   The message is generated by Immutables and will be rather cryptic. But OK.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883806469


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.stream.Collectors;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.Multimap;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /**
+   * Returns a map representation of the headers where each header name is 
mapped to a list of its
+   * values. Header names are case-insensitive.
+   */
+  @Value.Lazy
+  default Map> asMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(
+headers -> headers.get(0).name(),
+headers -> 
headers.stream().map(HTTPHeader::value).collect(Collectors.toList(;
+  }
+
+  /**
+   * Returns a simple map representation of the headers where each header name 
is mapped to its
+   * first value. If a header has multiple values, only the first value is 
used. Header names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default Map asSimpleMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(headers -> headers.get(0).name(), headers -> 
headers.get(0).value()));
+  }
+
+  /**
+   * Returns a {@link ListMultimap} representation of the headers. Header 
names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default ListMultimap asMultiMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+ImmutableListMultimap.flatteningToImmutableListMultimap(
+headers -> headers.get(0).name(),
+headers -> headers.stream().map(HTTPHeader::value)));
+  }
+
+  /** Returns all the entries in this group for the given name 
(case-insensitive). */
+  default List entries(String name) {
+return entries().stream()
+.filter(header -> header.name().equalsIgnoreCase(name))
+.collect(Collectors.toList());
+  }
+
+  /** Returns whether this group contains an entry with the given name 
(case-insensitive). */
+  default boolean contains(String name) {
+return entries().stream().anyMatch(header -> 
header.name().equalsIgnoreCase(name));
+  }
+
+  /**
+   * Adds the given header to the current group if no entry with the same name 
is already present.
+   * Returns a new instance with the added header, or the current instance if 
the header is already
+   * present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeader header) {
+return contains(header.name())
+? this
+: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build();
+  }
+
+  /**
+   * Adds the given headers to the current group if no entries with same names 
are already present.
+   * Returns a new instance with the added headers, or the current instance if 
all headers are
+   * already present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeaders headers) {
+

Re: [PR] Docs: add note for `day` transform [iceberg]

2024-12-13 Thread via GitHub



Fokko commented on code in PR #11749:
URL: https://github.com/apache/iceberg/pull/11749#discussion_r1883817906


##
format/spec.md:
##
@@ -454,7 +454,7 @@ Partition field IDs must be reused if an existing partition 
spec contains an equ
 | **`truncate[W]`** | Value truncated to width `W` (see below) 
| `int`, `long`, `decimal`, `string`, `binary`  
| Source type |
 | **`year`**| Extract a date or timestamp year, as years from 1970 
| `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`  
| `int`   |
 | **`month`**   | Extract a date or timestamp month, as months from 
1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, 
`timestamptz_ns`  | `int`   |
-| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 
| `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`  
| `int`   |
+| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 
| `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`  
| `int` (the physical type should be an `int`, 
but the the logical type should be a `date`) |

Review Comment:
   > The value should be serialized/displayed as Date (This is not mentioned in 
the spec here, but is in the reference implementation.)
   
   Yes, but since it is serialized with Avro, it will always be an `int`:
   
   ```json
   {
 "type": "int",
 "logicalType": "date"
   }
   ```
   
   Specifying this twice would lead to duplication of [the Avro 
spec](https://avro.apache.org/docs/1.12.0/specification/#date) into the Iceberg 
spec. The error in Iceberg-Rust did raise my eyebrow a bit since I would expect 
it to read the `int` without the `logicalType` as well because there is no 
ambiguity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] REST catalog doesn't return old history if we execute `CREATE OR REPLACE TABLE` statement [iceberg]

2024-12-13 Thread via GitHub



ebyhr commented on issue #11777:
URL: https://github.com/apache/iceberg/issues/11777#issuecomment-2540941152

   I think we need to add a new `MetadataUpdate` class (e.g. `ResetMainBranch`) 
or modify `RemoveSnapshotRef` to provide an option to clear snapshot-logs or 
not. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



adutra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883626591


##
core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.stream.Collectors;
+import 
org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap;
+import org.apache.iceberg.relocated.com.google.common.collect.Multimap;
+import org.immutables.value.Value;
+
+/**
+ * Represents a group of HTTP headers.
+ *
+ * Header name comparison in this class is always case-insensitive, in 
accordance with RFC 2616.
+ *
+ * This class exposes methods to convert to and from different 
representations such as maps and
+ * multimap, for easier access and manipulation – especially when dealing with 
multiple headers with
+ * the same name.
+ */
+@Value.Style(depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPHeaders {
+
+  HTTPHeaders EMPTY = of();
+
+  /** Returns all the header entries in this group. */
+  List entries();
+
+  /**
+   * Returns a map representation of the headers where each header name is 
mapped to a list of its
+   * values. Header names are case-insensitive.
+   */
+  @Value.Lazy
+  default Map> asMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(
+headers -> headers.get(0).name(),
+headers -> 
headers.stream().map(HTTPHeader::value).collect(Collectors.toList(;
+  }
+
+  /**
+   * Returns a simple map representation of the headers where each header name 
is mapped to its
+   * first value. If a header has multiple values, only the first value is 
used. Header names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default Map asSimpleMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+Collectors.toMap(headers -> headers.get(0).name(), headers -> 
headers.get(0).value()));
+  }
+
+  /**
+   * Returns a {@link ListMultimap} representation of the headers. Header 
names are
+   * case-insensitive.
+   */
+  @Value.Lazy
+  default ListMultimap asMultiMap() {
+return entries().stream()
+.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT)))
+.values()
+.stream()
+.collect(
+ImmutableListMultimap.flatteningToImmutableListMultimap(
+headers -> headers.get(0).name(),
+headers -> headers.stream().map(HTTPHeader::value)));
+  }
+
+  /** Returns all the entries in this group for the given name 
(case-insensitive). */
+  default List entries(String name) {
+return entries().stream()
+.filter(header -> header.name().equalsIgnoreCase(name))
+.collect(Collectors.toList());
+  }
+
+  /** Returns whether this group contains an entry with the given name 
(case-insensitive). */
+  default boolean contains(String name) {
+return entries().stream().anyMatch(header -> 
header.name().equalsIgnoreCase(name));
+  }
+
+  /**
+   * Adds the given header to the current group if no entry with the same name 
is already present.
+   * Returns a new instance with the added header, or the current instance if 
the header is already
+   * present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeader header) {
+return contains(header.name())
+? this
+: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build();
+  }
+
+  /**
+   * Adds the given headers to the current group if no entries with same names 
are already present.
+   * Returns a new instance with the added headers, or the current instance if 
all headers are
+   * already present.
+   */
+  default HTTPHeaders addIfAbsent(HTTPHeaders headers) {
+

Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on code in PR #785:
URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883575270


##
crates/iceberg/src/arrow/reader.rs:
##
@@ -1180,4 +1194,62 @@ mod tests {
 
 assert_eq!(visitor.field_ids, expected);
 }
+
+#[test]
+fn test_arrow_projection_mask() {

Review Comment:
   Hi, thank you @gruuya for working on this, would you like to add a test case 
for decimal? Others LGTM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] refactor: avoid async_trait for FileRead and provide object safe dyn methods [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on PR #761:
URL: https://github.com/apache/iceberg-rust/pull/761#issuecomment-2540901488

   I have similar comments like 
https://github.com/apache/iceberg-rust/pull/760#issuecomment-2540446608


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] refactor: Move puffin crate contents inside iceberg crate [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo merged PR #789:
URL: https://github.com/apache/iceberg-rust/pull/789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Move puffin crate contents inside iceberg crate [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on PR #789:
URL: https://github.com/apache/iceberg-rust/pull/789#issuecomment-2540906883

   Thank you, @Fokko, for the feedback. Since neither of us has a strong 
opinion on this and the suggestion comes from @liurenjie1024 with agreement 
from @fqaiser94, I believe there’s no need to wait for another round of review. 
This change can easily be reverted at any time if we find it to be incorrect. 
So, let’s move forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Doc: Add staus page for different implementations. [iceberg]

2024-12-13 Thread via GitHub



Fokko commented on code in PR #11772:
URL: https://github.com/apache/iceberg/pull/11772#discussion_r1883511101


##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.

Review Comment:
   ```suggestion
   Apache iceberg's spec is implemented in multiple languages. This page 
provides an overview of the
   current capabilities.
   ```



##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.
+
+## Versions
+
+This section describes the versions of each implementation that are being 
tracked in this page.
+
+| Language  | Version |
+|---|-|
+| Java  | 1.7.1   |
+| PyIceberg | 0.7.0   |
+| Rust  | 0.3.0   |
+| Go| 0.1.0   |
+
+## Data Types
+
+| Data Type  | Java | PyIceberg | Rust | Go |
+||--|---|--||
+| boolean| Y| Y | Y| Y  |
+| int| Y| Y | Y| Y  |
+| long   | Y| Y | Y| Y  |
+| float  | Y| Y | Y| Y  |
+| double | Y| Y | Y| Y  |
+| decimal| Y| Y | Y| Y  |
+| date   | Y| Y | Y| Y  |
+| time   | Y| Y | Y| Y  |
+| timestamp  | Y| Y | Y| Y  |
+| timestamptz| Y| Y | Y| Y  |
+| timestamp_ns   | Y| Y | Y| Y  |
+| timestamptz_ns | Y| Y | Y| Y  |
+| string | Y| Y | Y| Y  |
+| uuid   | Y| Y | Y| Y  |
+| fixed  | Y| Y | Y| Y  |
+| binary | Y| Y | Y| Y  |
+
+## Data File Formats
+
+| Format  | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Parquet | Y| Y | Y| Y  |
+| ORC | Y| N | N| N  |
+| Puffin  | Y| N | N| N  |
+
+## File IO
+
+| Storage  | Java | PyIceberg | Rust | Go |
+|--|--|---|--||
+| Local Filesystem | Y| Y | Y| Y  |
+| Hadoop Filesystem| Y| Y | Y| Y  |
+| Aws S3   | Y| Y | Y| Y  |
+| Google Cloud Storage | Y| Y | Y| Y  |
+| Memory Fs| Y| Y | Y| Y  |
+
+## Table Update Operations
+
+### Table Spec V1
+
+| Operation   | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Update schema   | Y| N | N| N  |
+| Update partition spec   | Y| N | N| N  |
+| Update table properties | Y| Y | Y| N  |
+| Replace sort order  | Y| N | N| N  |
+| Update table location   | Y| N | N| N  |
+| Append data files   | Y| N | N| N  |
+| Rewrite files   | Y| N | N| N  |
+| Rewrite manifests   | Y| N | N| N  |
+| Overwrite files | Y| N | N| N  |
+| Row delta   | Y| N | N| N  |
+| Delete files| Y| N | N| N  |
+| Update statistics   | Y| N | N| N  |
+| Update partition statistics | Y| N | N| N  |
+| Expire snapshots| Y| N | N| N  |
+| Manage snapshots| Y| N | N| N  |
+
+### Table Spec V2
+
+| Operation   | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Update schema   | Y| N | N| N  |
+| Update partition spec   | Y| N | N| N  |
+| Update table properties | Y| Y | Y| N  |
+| Replace sort order  | Y| N | N| N  |
+| Update table location   | Y| N | N| N  |
+| Append data files   | Y| N | N| N  |
+| Rewrite files   | Y| N | N| N  |
+| Rewrite manifests   | Y| N | N| N  |
+| Overwrite files | Y| N | N| N  |
+| Row delta   | Y| N | N| N  |
+| Delete files| Y| N | N| N  |
+| Update statistics   | Y| N | N| N  |
+| Update partition statistics | Y| N | N| N  |
+| Expire snapshots| Y| N | N| N  |
+| Manage snapshots| Y| N | N| N  |
+

[I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo opened a new issue, #790:
URL: https://github.com/apache/iceberg-rust/issues/790

   The table generated in this way:
   
   ```sql
   CREATE OR REPLACE ICEBERG TABLE lineitem (
   l_orderkey   BIGINT,
   l_partkeyBIGINT,
   l_suppkeyBIGINT,
   l_linenumber INT,
   l_quantity   DECIMAL(15,2),
   l_extendedprice  DECIMAL(15,2),
   l_discount   DECIMAL(15,2),
   l_taxDECIMAL(15,2),
   l_returnflag STRING,
   l_linestatus STRING,
   l_shipdate   DATE,
   l_commitdate DATE,
   l_receiptdateDATE,
   l_shipinstruct   STRING,
   l_shipmode   STRING,
   l_commentSTRING
   )
   CATALOG = 'SNOWFLAKE'
   EXTERNAL_VOLUME = 'iceberg_external_volume'
   BASE_LOCATION = 'lineitem'
   CATALOG_SYNC = 'icebergPolaris';
   
   INSERT INTO lineitem 
   SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM;
   ```
   
   I'm using iceberg-rust to read data from given table via polaris catalog:
   
   ```sql
   select * from lineitem limit 1;
   ```
   
   Got the following error:
   
   ```
   1001=>iceberg data stream read: DataInvalid => Parquet schema table {
   1: L_ORDERKEY: optional long
   2: L_PARTKEY: optional long
   3: L_SUPPKEY: optional long
   4: L_LINENUMBER: optional int
   5: L_QUANTITY: optional decimal(4,2)
   6: L_EXTENDEDPRICE: optional decimal(8,2)
   7: L_DISCOUNT: optional decimal(3,2)
   8: L_TAX: optional decimal(3,2)
   9: L_RETURNFLAG: optional string
   10: L_LINESTATUS: optional string
   11: L_SHIPDATE: optional date
   12: L_COMMITDATE: optional date
   13: L_RECEIPTDATE: optional date
   14: L_SHIPINSTRUCT: optional string
   15: L_SHIPMODE: optional string
   16: L_COMMENT: optional string
   }
   and Iceberg schema table {
   1: L_ORDERKEY: optional long
   2: L_PARTKEY: optional long
   3: L_SUPPKEY: optional long
   4: L_LINENUMBER: optional int
   5: L_QUANTITY: optional decimal(15,2)
   6: L_EXTENDEDPRICE: optional decimal(15,2)
   7: L_DISCOUNT: optional decimal(15,2)
   8: L_TAX: optional decimal(15,2)
   9: L_RETURNFLAG: optional string
   10: L_LINESTATUS: optional string
   11: L_SHIPDATE: optional date
   12: L_COMMITDATE: optional date
   13: L_RECEIPTDATE: optional date
   14: L_SHIPINSTRUCT: optional string
   15: L_SHIPMODE: optional string
   16: L_COMMENT: optional string
   }
   do not match.
   ```
   
   It appears snowflake writes parquet files with different schema when using 
iceberg tables. Should we support this scenario?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Retry object store reads on temporary errors. [iceberg-rust]

2024-12-13 Thread via GitHub



ryzhyk commented on PR #788:
URL: https://github.com/apache/iceberg-rust/pull/788#issuecomment-2540837170

   > Comparing to add retry layer for every service, how about adding it here?
   > 
   > 
https://github.com/apache/iceberg-rust/blob/e073e75bf51fd4e8999e9d99a9986a4380afd0bc/crates/iceberg/src/io/storage.rs#L92
   
   Done, thanks for the feedback!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] feat: Print debug source error instead [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo opened a new pull request, #792:
URL: https://github.com/apache/iceberg-rust/pull/792

   This PR will print debug source error instead for better understanding what 
happened inside.
   
   Fix error like:
   
   ```
   iceberg table scan plan: Unexpected => Failure in doing io operation
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Geospatial Support [iceberg]

2024-12-13 Thread via GitHub



jaakla commented on issue #10260:
URL: https://github.com/apache/iceberg/issues/10260#issuecomment-2541099197

   IMHO the concept of SPATIAL_REF_SYS, (and the other geo metadata tables) is 
a bit legacy from the times when (1) storage was really expensive and (2) 
databases did not have proper ways to add rich semantic metadata to the 
tables/columns. Today we don't have either problem - the modern table 
approaches (like Iceberg) do allow rich metadata attached properly directly to 
each relevant table, in space-effective way, no physical need of creating 
additional metadata tables and think of additional effort to take care of 
these. Maybe for some legacy clients you want virtual views, just for 
compatibility purposes. 
   
   Here for me the question is on clear definition content and readability of 
that metadata. I dont currently really work with specialized geodata, I'm in 
"normal IT business" and here explicit (or even default/implied) definition of 
a `named_crs='OGC:CRS84'` would cover basically all the specific geo-metadata 
CRS needs. As you can see from geojson (or KML) popularity and history, only 
specific domain of geospatial scientists and land survey etc do care about 
anything else, for the others "correct" axis order and knowing how to do 
Havershine properly does the job. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo merged PR #785:
URL: https://github.com/apache/iceberg-rust/pull/785


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Avoid RANGE mode broken chain when write parallelism changes [iceberg]

2024-12-13 Thread via GitHub



pvary commented on PR #11702:
URL: https://github.com/apache/iceberg/pull/11702#issuecomment-2541180202

   @huyuanfeng2018: Could we have a unit test for this to avoid future 
regressions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] chore: Add more debug message inside error [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo opened a new pull request, #793:
URL: https://github.com/apache/iceberg-rust/pull/793

   This PR will add more debug message inside error from rest catalog so we can 
avoid error message like:
   
   ```
   1107=>
   Iceberg catalog load failed: ReadTableDataError. Code: 1107, Text = Iceberg 
catalog load failed: Unexpected => Failed to parse response from rest catalog 
server!
   
   Context:
  json: 
  code: 429 Too Many Requests
   
   Source: EOF while parsing a value at line 1 column 0
   .
   
   
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Eagerly project arrow schema when calculating the parquet `ProjectionMask` [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo closed issue #784: Eagerly project arrow schema when calculating the 
parquet `ProjectionMask`
URL: https://github.com/apache/iceberg-rust/issues/784


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] add Status data structure [iceberg-cpp]

2024-12-13 Thread via GitHub



zhjwpku opened a new pull request, #8:
URL: https://github.com/apache/iceberg-cpp/pull/8

   This is adpated from Apache Arrow, this is a pre step before integret the 
Result data structure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]

2024-12-13 Thread via GitHub



raulcd commented on code in PR #7:
URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1883773249


##
.github/workflows/test.yml:
##
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Test
+
+on:
+  push:
+branches:
+  - '**'
+  - '!dependabot/**'
+tags:
+  - '**'
+  pull_request:
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ 
github.workflow }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  ICEBERG_HOME: /tmp/iceberg
+
+jobs:
+  ubuntu:
+name: AMD64 Ubuntu latest
+runs-on: ubuntu-latest

Review Comment:
   I've pinned them to ubuntu-24.04 and macos-14



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]

2024-12-13 Thread via GitHub



raulcd commented on PR #7:
URL: https://github.com/apache/iceberg-cpp/pull/7#issuecomment-2541213585

   Run is successful as seen here:
   https://github.com/raulcd/iceberg-cpp/actions/runs/12314556083
   @gaborkaszab @wgtmac if you are :+1:  with this it probably can be merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[I] Path/file URI handling [iceberg]

2024-12-13 Thread via GitHub



jdockerty opened a new issue, #11778:
URL: https://github.com/apache/iceberg/issues/11778

   ### Query engine
   
   N/A
   
   ### Question
   
   Are the path URIs intended to be fully addressable, such as within the 
`location` or `metadata-file` fields? 
   
   From the example given in the spec with a path of `s3://bucket/.../v1.json` 
to some table metadata JSON, this would be missing a region configuration and 
would be down to a client implementation to set `AWS_REGION=YOUR_REGION` or 
configured in some other way.
   
   ---
   
   For some extra context, I asked this question in the [Iceberg 
Slack](https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1734022973216809) 
and believe I understand the answer, but wanted to make it more discoverable 
for others who are searching issues.
   
   I'd also like to ask whether there is anywhere within the spec I can add 
this information? My original thought was within the [implementation 
notes](https://iceberg.apache.org/spec/#appendix-f-implementation-notes), but 
it is not clear to me whether this is the correct place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[I] Encountering `429 Too Many Requests` error every time when accessing Snowflake's Polaris catalog [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo opened a new issue, #791:
URL: https://github.com/apache/iceberg-rust/issues/791

   I found iceberg-rust always raise `429 Too Many Requests` while calling 
Snowflake's Polaris catalog:
   
   ```rust
   1107=>
   Iceberg catalog load failed: ReadTableDataError. Code: 1107, Text = Iceberg 
catalog load failed: Unexpected => Failed to parse response from rest catalog 
server!
   
   Context:
  json: 
  code: 429 Too Many Requests
   
   Source: EOF while parsing a value at line 1 column 0
   .
   

   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: Implement Decimal from/to bytes represents [iceberg-rust]

2024-12-13 Thread via GitHub



liurenjie1024 merged PR #665:
URL: https://github.com/apache/iceberg-rust/pull/665


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]

2024-12-13 Thread via GitHub



gruuya commented on code in PR #785:
URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883665954


##
crates/iceberg/src/arrow/reader.rs:
##
@@ -1180,4 +1194,62 @@ mod tests {
 
 assert_eq!(visitor.field_ids, expected);
 }
+
+#[test]
+fn test_arrow_projection_mask() {

Review Comment:
   Certainly, let me think of one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]

2024-12-13 Thread via GitHub



gruuya commented on code in PR #785:
URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883721441


##
crates/iceberg/src/arrow/reader.rs:
##
@@ -1180,4 +1194,62 @@ mod tests {
 
 assert_eq!(visitor.field_ids, expected);
 }
+
+#[test]
+fn test_arrow_projection_mask() {

Review Comment:
   I've slightly revised the existing test now. It tests a parquet/arrow schema 
with one supported type, one unsupported type and one with a decimal with a 
precision beyond the supported range. It asserts that projection masks fails 
until the projection leaves out the last two fields.
   
   Granted it's a bit of a stretch to think how this could come up in practice, 
but it does exercise the proposed changes in this PR nonetheless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Change the glue catalog doc ( `list_tables` method only return Iceberg Tables ) [iceberg-python]

2024-12-13 Thread via GitHub



omkenge closed issue #1291: Change the glue catalog doc ( `list_tables` method 
only return Iceberg Tables )
URL: https://github.com/apache/iceberg-python/issues/1291


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]

2024-12-13 Thread via GitHub



gruuya commented on code in PR #785:
URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883721441


##
crates/iceberg/src/arrow/reader.rs:
##
@@ -1180,4 +1194,62 @@ mod tests {
 
 assert_eq!(visitor.field_ids, expected);
 }
+
+#[test]
+fn test_arrow_projection_mask() {

Review Comment:
   I've slightly revised the existing test now. It tests a parquet/arrow schema 
with one supported type, one unsupported type and one with a decimal with a 
precision beyond the supported range. It asserts that projection masks fails 
until the projection leaves out the last two fields.
   
   Granted it's a bit of a stretch to think how this could come up in practice, 
but it does exercise the proposed changes in this PR nonetheless. Let me know 
if this is close to what you had in mind.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] add Status data structure [iceberg-cpp]

2024-12-13 Thread via GitHub



pitrou commented on code in PR #8:
URL: https://github.com/apache/iceberg-cpp/pull/8#discussion_r1883865597


##
CMakeLists.txt:
##
@@ -56,6 +64,11 @@ add_subdirectory(api)
 add_subdirectory(src)
 
 if(ICEBERG_BUILD_TESTS)
+  fetchcontent_declare(googletest

Review Comment:
   This adds a GoogleTest dependency. Was there a decision to use GT? A popular 
alternative is https://github.com/catchorg/Catch2/, I'm not sure which one 
would be better.



##
api/iceberg/util/string_builder.h:
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#pragma once
+
+#include 
+#include 
+
+#include "visibility.h"
+
+namespace iceberg {
+
+namespace util {
+
+namespace detail {
+
+class ICEBERG_EXPORT StringStreamWrapper {

Review Comment:
   Do we really to keep this from Arrow (which was C++11 compatible back then), 
or do we want to adopt a modern formatting library such as 
https://en.cppreference.com/w/cpp/utility/format or a backport thereof?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]

2024-12-13 Thread via GitHub



Fokko commented on issue #790:
URL: https://github.com/apache/iceberg-rust/issues/790#issuecomment-2540844729

   Yes, we should support this. I think 
https://github.com/apache/iceberg-rust/pull/786 fixes this.
   
   Looking at the schema, only the `decimal` ones are different. And we support 
increased precision, as long as the scale stays the same:
   
   
![image](https://github.com/user-attachments/assets/2a08bc06-6a59-4d20-8fdb-ed530e253238)
   
   Keep in mind that when people migrate from an existing Hive table to Iceberg 
(which is a pretty common scenario), we'll bump into this kind of thing more 
often, and I think that's okay to accommodate that, as long that it is 
perfectly safe.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on issue #790:
URL: https://github.com/apache/iceberg-rust/issues/790#issuecomment-2540849063

   Thank you @Fokko for this. I will review the PR accordingly!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]

2024-12-13 Thread via GitHub



gaborkaszab commented on PR #11738:
URL: https://github.com/apache/iceberg/pull/11738#issuecomment-2540849458

   I see you have made changes previously in CachingCatalog, @rdblue 
@aokolnychyi @RussellSpitzer . Would you mind taking a look at this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: Allow for schema evolution [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on PR #786:
URL: https://github.com/apache/iceberg-rust/pull/786#issuecomment-2540855953

   Hi @Fokko, I'm following up on issue #790.
   
   It appears this PR doesn't address the scenario I described in #790, where 
the underlying parquet schema differs from the declared one. Could schema 
evolution potentially resolve this? I'm concerned we might also need to adjust 
the Arrow schema mappings instead of just project mask.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883552348


##
core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java:
##
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import java.net.URI;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.immutables.value.Value;
+
+/** Represents an HTTP request. */
+@Value.Style(redactedMask = "", depluralize = true)
+@Value.Immutable
+@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
+public interface HTTPRequest {
+
+  enum HTTPMethod {
+GET,
+HEAD,
+POST,
+DELETE
+  }
+
+  /**
+   * Returns the base URI configured at the REST client level. The base URI is 
used to construct the
+   * full {@link #requestUri()}.
+   */
+  URI baseUri();
+
+  /**
+   * Returns the full URI of this request. The URI is constructed from the 
base URI, path, and query
+   * parameters. It cannot be modified directly.
+   */
+  @Value.Lazy
+  default URI requestUri() {
+return RESTUtil.buildRequestUri(this);

Review Comment:
   to me it seems like we should rather embed the functionality of building an 
URI from a `HTTPRequest` here (and also do the same for encoding the body). 
Additionally, building an URI from the path does some validation to make sure 
it doesn't start with a `/`. I'd say this is something that we can move into a 
check method annotated with `@Value.Check`. That way we don't need the util 
methods in `RESTUtil` and can move testing into `TestHTTPRequest`. I've added 
an example diff below:
   
   ```
   --- a/core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java
   +++ b/core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java
   @@ -18,10 +18,14 @@
 */
package org.apache.iceberg.rest;
   
   +import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
   +import java.net.URISyntaxException;
import java.util.Map;
import javax.annotation.Nullable;
   +import org.apache.hc.core5.net.URIBuilder;
   +import org.apache.iceberg.exceptions.RESTException;
import org.immutables.value.Value;
   
/** Represents an HTTP request. */
   @@ -49,7 +53,19 @@ public interface HTTPRequest {
   */
  @Value.Lazy
  default URI requestUri() {
   -return RESTUtil.buildRequestUri(this);
   +// if full path is provided, use the input path as path
   +String fullPath =
   +(path().startsWith("https://";) || path().startsWith("http://";))
   +? path()
   +: String.format("%s/%s", baseUri(), path());
   +try {
   +  URIBuilder builder = new 
URIBuilder(RESTUtil.stripTrailingSlash(fullPath));
   +  queryParameters().forEach(builder::addParameter);
   +  return builder.build();
   +} catch (URISyntaxException e) {
   +  throw new RESTException(
   +  "Failed to create request URI from base %s, params %s", fullPath, 
queryParameters());
   +}
  }
   
  /** Returns the HTTP method of this request. */
   @@ -77,7 +93,17 @@ public interface HTTPRequest {
  @Nullable
  @Value.Redacted
  default String encodedBody() {
   -return RESTUtil.encodeRequestBody(this);
   +Object body = body();
   +if (body instanceof Map) {
   +  return RESTUtil.encodeFormData((Map) body);
   +} else if (body != null) {
   +  try {
   +return mapper().writeValueAsString(body);
   +  } catch (JsonProcessingException e) {
   +throw new RESTException(e, "Failed to encode request body: %s", 
body);
   +  }
   +}
   +return null;
  }
   
  /**
   @@ -88,4 +114,13 @@ public interface HTTPRequest {
  default ObjectMapper mapper() {
return RESTObjectMapper.mapper();
  }
   +
   +  @Value.Check
   +  default void check() {
   +if (path().startsWith("/")) {
   +  throw new RESTException(
   +  "Received a malformed path for a REST request: %s. Paths should 
not start with /",
   +  path());
   +}
   +  }
}

Re: [PR] feat: Allow for schema evolution [iceberg-rust]

2024-12-13 Thread via GitHub



Fokko commented on PR #786:
URL: https://github.com/apache/iceberg-rust/pull/786#issuecomment-2540869512

   This is pretty similar. With this PR we first write an int, and then read it 
as a long, which is a valid schema evolution. In the case of #790 we try to 
read a narrower decimal from the Parquet file, into a wider one from the 
Iceberg schema. I haven't tested it with the decimal, that might be a good 
followup 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: Allow for schema evolution [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on PR #786:
URL: https://github.com/apache/iceberg-rust/pull/786#issuecomment-2540873200

   Thank you for the explanation! This PR is good enough for me. Let's proceed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: Allow for schema evolution [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo merged PR #786:
URL: https://github.com/apache/iceberg-rust/pull/786


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: Implement Decimal from/to bytes represents [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on PR #665:
URL: https://github.com/apache/iceberg-rust/pull/665#issuecomment-2540876657

   cc @Fokko, would you like to take another? I believe it's good for merging 
now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Retry object store reads on temporary errors. [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo merged PR #788:
URL: https://github.com/apache/iceberg-rust/pull/788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Data loss bug in MergeIntoCommand [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on issue #11765:
URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2541337509

   You'll need to elaborate a bit more. What does "losing some of the increment 
new data writes" mean? 
   
   Can you give an example? We can't really debug the generic case since we 
don't even know what you are actually attempting. Other important info would be 
things like what serialization level you are using, what version of Spark, the 
queries you are using, how you are running them etc ...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Kafka connect iceberg connector seems to stop after having processed all messages in Kafka topic [iceberg]

2024-12-13 Thread via GitHub



thjaeckle commented on issue #11703:
URL: https://github.com/apache/iceberg/issues/11703#issuecomment-2541339588

   I found out the configuration mistake I did..
   Turns out that `iceberg.connect.group-id` **must** match the configured 
Kafka Connect `consumer.group.id`.
   
   By default, this is `connect-` (both in Kafka Connect and in 
the Iceberg connector).
   
   With 0 prior Kafka Connect knowledge this is however very hard to find out - 
a hint in the documentation that this must match the `consumer.group.id` would 
be helpful.  
   When configuring this, one probably has to adhere to ACLs in place - and 
then it is likely to to wrong ..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]

2024-12-13 Thread via GitHub



nastra commented on code in PR #11769:
URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883856330


##
core/src/test/java/org/apache/iceberg/rest/TestHTTPRequest.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.rest;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+
+import java.net.URI;
+import java.util.stream.Stream;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.exceptions.RESTException;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.rest.requests.CreateNamespaceRequest;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+class TestHTTPRequest {
+
+  @ParameterizedTest
+  @MethodSource("validRequestUris")
+  public void requestUriSuccess(HTTPRequest request, URI expected) {
+assertThat(request.requestUri()).isEqualTo(expected);
+  }
+
+  public static Stream validRequestUris() {
+return Stream.of(
+Arguments.of(
+ImmutableHTTPRequest.builder()
+.baseUri(URI.create("http://localhost:8080/foo";))
+.method(HTTPRequest.HTTPMethod.GET)
+.path("v1/namespaces/ns/tables/") // trailing slash should be 
removed
+.putQueryParameter("pageToken", "1234")
+.putQueryParameter("pageSize", "10")
+.build(),
+URI.create(
+
"http://localhost:8080/foo/v1/namespaces/ns/tables?pageToken=1234&pageSize=10";)),
+Arguments.of(
+ImmutableHTTPRequest.builder()
+.baseUri(URI.create("http://localhost:8080/foo";))
+.method(HTTPRequest.HTTPMethod.GET)
+.path("https://authserver.com/token";) // absolute path HTTPS
+.build(),
+URI.create("https://authserver.com/token";)),
+Arguments.of(
+ImmutableHTTPRequest.builder()
+.baseUri(URI.create("http://localhost:8080/foo";))
+.method(HTTPRequest.HTTPMethod.GET)
+.path("http://authserver.com/token";) // absolute path HTTP
+.build(),
+URI.create("http://authserver.com/token";)));
+  }
+
+  @Test
+  public void malformedPath() {
+assertThatThrownBy(
+() ->
+ImmutableHTTPRequest.builder()
+.baseUri(URI.create("http://localhost";))
+.method(HTTPRequest.HTTPMethod.GET)
+.path("/v1/namespaces") // wrong leading slash
+.build())
+.isInstanceOf(RESTException.class)
+.hasMessage(
+"Received a malformed path for a REST request: /v1/namespaces. 
Paths should not start with /");
+  }
+
+  @Test
+  public void invalidPath() {
+HTTPRequest request =
+ImmutableHTTPRequest.builder()
+.baseUri(URI.create("http://localhost";))
+.method(HTTPRequest.HTTPMethod.GET)
+.path(" not a valid path") // wrong path
+.build();
+assertThatThrownBy(request::requestUri)
+.isInstanceOf(RESTException.class)
+.hasMessage(
+"Failed to create request URI from base http://localhost/ not a 
valid path, params {}");
+  }
+
+  @ParameterizedTest
+  @MethodSource("validRequestBodies")
+  public void encodedBody(HTTPRequest request, String expected) {
+assertThat(request.encodedBody()).isEqualTo(expected);

Review Comment:
   personally I think the test code would be much more readable without having 
a parameterized test and rather have just two separate test methods that verify 
the body is properly encoded as form data/JSON, but I'm ok with what's 
currently here and I'll leave that up to you.
   
   Also we probably might want to have some tests where the body can't be 
encoded due to:
   * body is null
   * body is a Map with null values
   * body is an invalid JSON



-- 
This is an automated message from the

Re: [I] Data loss bug in MergeIntoCommand [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on issue #11765:
URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2541339227

   It would also be very helpful to know how you are determining there is data 
loss


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Kafka connect iceberg connector seems to stop after having processed all messages in Kafka topic [iceberg]

2024-12-13 Thread via GitHub



thjaeckle closed issue #11703: Kafka connect iceberg connector seems to stop 
after having processed all messages in Kafka topic
URL: https://github.com/apache/iceberg/issues/11703


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] add Status data structure [iceberg-cpp]

2024-12-13 Thread via GitHub



pitrou commented on PR #8:
URL: https://github.com/apache/iceberg-cpp/pull/8#issuecomment-2541348036

   Also @zhjwpku can you make sure you proof-read your PR description? There 
are spell checkers that can help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-12-13 Thread via GitHub



advancedxy commented on PR #10755:
URL: https://github.com/apache/iceberg/pull/10755#issuecomment-2541354779

   @danielcweeks would you mind to take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2024-12-13 Thread via GitHub



osscm commented on code in PR #11781:
URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884308287


##
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##
@@ -257,17 +257,17 @@ private static class Task implements 
Supplier>>, Closeable {
 @Override
 public Optional> get() {
   try {
+if (queue.size() >= approximateMaxQueueSize) {

Review Comment:
   
https://github.com/apache/iceberg/pull/11781/files#diff-0586c055bbd9d220727cfa60c802321693a959c0cfb3a510376fc5a5bb1dc897R99
   
   `this.taskFutures = new CompletableFuture[2 * 
ThreadPools.WORKER_THREAD_POOL_SIZE];`
   
   here the global thread pool size is driving the taskFuture array size, can 
this leads to more tasks then the available S3 connections?
   
   as I believe each table scan will create a new `ParallelIterable` ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[I] HiveCatalog incorrectly uses FileIOTracker [iceberg]

2024-12-13 Thread via GitHub



tom-s-powell opened a new issue, #11783:
URL: https://github.com/apache/iceberg/issues/11783

   ### Apache Iceberg version
   
   1.7.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Encountering an issue with `HiveCatalog` and `S3FileIO`. I believe this is 
due to the change introduced in https://github.com/apache/iceberg/pull/10771. 
There is a single `FileIO` instance for the `HiveCatalog` thus shared between 
all `HiveTableOperations`. By "tracking" each `TableOperations` and using 
`weakKeys` cache removal to close the `FileIO` this can lead to subsequent 
`HiveTableOperation` attempting to use a closed `FileIO`. When using `S3FileIO` 
this would lead to this connection pool error below:
   
   ```
   ERROR [2024-12-13T13:22:41.936423Z] 
com.palantir.conjure.java.undertow.runtime.ConjureExceptions: Error handling 
request (_endpoint: resolveVersion, _internalErrorIdentifier: 08f54309, 
_internalErrorType: internal, _requestId: 957c3c716a48658e, _service: 
TableService, errorInstanceId: d7495079-28bd-48fe-af10-f22e94135457, errorName: 
Default:Internal)
   java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at 
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
at 
software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
at 
software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at 
software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
at 
software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254)
at 
software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
at 
software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
at 
software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228)
at 
software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:67)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:37)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages

Re: [PR] IO Implementation using Go CDK [iceberg-go]

2024-12-13 Thread via GitHub



zeroshade commented on PR #176:
URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542399559

   @loicalleyne following pyiceberg's example, I've added an option to force 
virtual addressing. That work for you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Feat: support aliyun oss backend. [iceberg-go]

2024-12-13 Thread via GitHub



zeroshade commented on PR #216:
URL: https://github.com/apache/iceberg-go/pull/216#issuecomment-2542403956

   This seems generally good to me. Does Aliyun have something similar to how 
MinIO works for S3 that can be added to the integration tests to have CI 
testing the backend? i.e. is there any way to use a docker image to run 
something that simulates the Aliyun backend for testing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Spark: Read DVs when reading from .position_deletes table [iceberg]

2024-12-13 Thread via GitHub



aokolnychyi commented on code in PR #11657:
URL: https://github.com/apache/iceberg/pull/11657#discussion_r1884381201


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterator.java:
##
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.BaseDeleteLoader;
+import org.apache.iceberg.io.CloseableIterator;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Types;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
+import org.apache.spark.unsafe.types.UTF8String;
+
+class DVIterator implements CloseableIterator {
+  private final PartitionSpec spec;

Review Comment:
   Do we still need `spec`?



##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterator.java:
##
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.data.BaseDeleteLoader;
+import org.apache.iceberg.io.CloseableIterator;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Types;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
+import org.apache.spark.unsafe.types.UTF8String;
+
+class DVIterator implements CloseableIterator {
+  private final PartitionSpec spec;
+  private final DeleteFile deleteFile;
+  private final Schema projection;
+  private final Map idToConstant;
+  private final Iterator positions;
+  private Integer deletedPositionIndex;
+  private GenericInternalRow row;
+
+  DVIterator(
+  InputFile inputFile,
+  DeleteFile deleteFile,
+  PartitionSpec spec,
+  Schema projection,
+  Map idToConstant) {
+this.deleteFile = deleteFile;
+this.spec = spec;
+this.projection = projection;
+this.idToConstant = idToConstant;
+List pos = Lists.newArrayList();
+new BaseDeleteLoader(ignored -> inputFile)
+.loadPositionDeletes(ImmutableList.of(deleteFile), 
deleteFile.referencedDataFile())
+.forEach(pos::add);
+this.positions = pos.iterator();
+  }
+
+  @Override
+  public boolean hasNext() {
+return positions.hasNext();
+  }
+
+  @Override
+  public InternalRow next() {
+long position = positions.next();
+
+if (null == row) {
+  List rowValues = Lists.newArrayList();
+  for (Types.NestedField column : projection.columns()) {
+int fieldId = column.fieldId();
+if (fieldId == MetadataColumns.DELETE_FILE_PATH.fieldId()) {
+  
rowValues.add(UTF

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-12-13 Thread via GitHub



aokolnychyi commented on code in PR #11620:
URL: https://github.com/apache/iceberg/pull/11620#discussion_r1884367833


##
core/src/main/java/org/apache/iceberg/TableUtil.java:
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class TableUtil {
+  private TableUtil() {}
+
+  /** Returns the format version of the given table */
+  public static int formatVersion(Table table) {
+Preconditions.checkArgument(null != table, "Invalid table: null");
+
+if (table instanceof SerializableTable) {
+  SerializableTable serializableTable = (SerializableTable) table;
+  return serializableTable.formatVersion();
+} else if (table instanceof HasTableOperations) {
+  return ((HasTableOperations) 
table).operations().current().formatVersion();

Review Comment:
   Optional: Same comment about an extra variable for `TableOperations`.



##
core/src/main/java/org/apache/iceberg/SerializableTable.java:
##
@@ -158,6 +160,21 @@ public Map properties() {
 return properties;
   }
 
+  public int formatVersion() {
+return formatVersion;
+  }
+
+  private int formatVersion(Table table) {
+if (table instanceof HasTableOperations) {
+  return ((HasTableOperations) 
table).operations().current().formatVersion();
+} else {
+  // formatVersion=-1 will never be used/returned, because

Review Comment:
   Optional: We may actually avoid the override by doing something like this 
below.
   
   ```
   public int formatVersion() {
 if (formatVersion == UNKNOWN_FORMAT_VERSION) {
   throw new UnsupportedOperationException("Format version is unknown");
 }
 return formatVersion;
   }
   
   private int formatVersion(Table table) {
 if (table instanceof HasTableOperations) {
   ...
 } else {
   return UNKNOWN_FORMAT_VERSION;
 }
   }
   ```
   
   This would also cover cases for regular tables that didn't have 
`TableOperations` under the hood.



##
core/src/main/java/org/apache/iceberg/SerializableTable.java:
##
@@ -158,6 +160,21 @@ public Map properties() {
 return properties;
   }
 
+  public int formatVersion() {
+return formatVersion;
+  }
+
+  private int formatVersion(Table table) {
+if (table instanceof HasTableOperations) {
+  return ((HasTableOperations) 
table).operations().current().formatVersion();

Review Comment:
   Optional: What about an extra variable for `TableOperations`? We have an 
explicit casts + 3 method invocations, it may be easier to read with code on 
multiple lines and it would be consistent with the `metadataFileLocation` 
method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]

2024-12-13 Thread via GitHub



aokolnychyi commented on code in PR #11620:
URL: https://github.com/apache/iceberg/pull/11620#discussion_r1884369789


##
core/src/main/java/org/apache/iceberg/SerializableTable.java:
##
@@ -158,6 +160,21 @@ public Map properties() {
 return properties;
   }
 
+  public int formatVersion() {
+return formatVersion;
+  }
+
+  private int formatVersion(Table table) {
+if (table instanceof HasTableOperations) {
+  return ((HasTableOperations) 
table).operations().current().formatVersion();

Review Comment:
   Optional: What about an extra variable for `TableOperations`? We have an 
explicit casts and 3 method invocations on a single line. An extra variable may 
be easier to read and would be consistent with `metadataFileLocation`.



##
core/src/main/java/org/apache/iceberg/SerializableTable.java:
##
@@ -158,6 +160,21 @@ public Map properties() {
 return properties;
   }
 
+  public int formatVersion() {
+return formatVersion;
+  }
+
+  private int formatVersion(Table table) {
+if (table instanceof HasTableOperations) {
+  return ((HasTableOperations) 
table).operations().current().formatVersion();
+} else {
+  // formatVersion=-1 will never be used/returned, because

Review Comment:
   Optional: We may actually avoid the override by doing something like below.
   
   ```
   public int formatVersion() {
 if (formatVersion == UNKNOWN_FORMAT_VERSION) {
   throw new UnsupportedOperationException("Format version is unknown");
 }
 return formatVersion;
   }
   
   private int formatVersion(Table table) {
 if (table instanceof HasTableOperations) {
   ...
 } else {
   return UNKNOWN_FORMAT_VERSION;
 }
   }
   ```
   
   This would also cover cases for regular tables that didn't have 
`TableOperations` under the hood.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]

2024-12-13 Thread via GitHub



c-thiel commented on code in PR #799:
URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1884298259


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -446,6 +446,30 @@ pub enum TableUpdate {
 /// Properties to remove
 removals: Vec,
 },
+/// Set statistics for a snapshot
+#[serde(with = "_serde_set_statistics")]
+SetStatistics {
+/// File containing the statistics
+statistics: StatisticsFile,
+},

Review Comment:
   In IRC and Java `SetStatistics` has an additional field `snapshot_id` 
([link](https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/open-api/rest-catalog-open-api.yaml#L2902)).
   This field is redundant with `StatisticsFile.snapshot_id` and is only used 
as an [assertion in 
Java](https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1314).
   
   I removed the redundancy for rust and will start a discussion on the mailing 
List how to handle this.
   
   As we still need to be spec compliant, we need custom serializer / 
deserializer.
   
   Slack: 
https://apache-iceberg.slack.com/archives/C03LG1D563F/p1734109745807119
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Hive: Add Hive 4 support and remove Hive runtime [iceberg]

2024-12-13 Thread via GitHub



rdblue commented on code in PR #11750:
URL: https://github.com/apache/iceberg/pull/11750#discussion_r1884343692


##
gradle.properties:
##
@@ -18,8 +18,8 @@ jmhJsonOutputPath=build/reports/jmh/results.json
 jmhIncludeRegex=.*
 systemProp.defaultFlinkVersions=1.20
 systemProp.knownFlinkVersions=1.18,1.19,1.20
-systemProp.defaultHiveVersions=2
-systemProp.knownHiveVersions=2,3
+systemProp.defaultHiveVersions=4
+systemProp.knownHiveVersions=4

Review Comment:
   This is no longer needed, right? This only applied to the runtime modules.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] feat: TableMetadata Statistics [iceberg-rust]

2024-12-13 Thread via GitHub



c-thiel opened a new pull request, #799:
URL: https://github.com/apache/iceberg-rust/pull/799

   Adds `StatisticFile` and `PartitionStatisticsFile` to spec, builder and REST 
TableUpdate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] IO Implementation using Go CDK [iceberg-go]

2024-12-13 Thread via GitHub



loicalleyne commented on PR #176:
URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542287393

   Is it intended to not provide the choice between virtual hosted bucket 
addressing and path-style addressing? 
   LGTM otherwise - the tests are passing :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2024-12-13 Thread via GitHub



sopel39 commented on code in PR #11781:
URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884520727


##
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##
@@ -257,17 +257,17 @@ private static class Task implements 
Supplier>>, Closeable {
 @Override
 public Optional> get() {
   try {
+if (queue.size() >= approximateMaxQueueSize) {

Review Comment:
   > here the global thread pool size is driving the taskFuture array size, can 
this leads to more tasks then the available S3 connections?
   
   Correct
   
   > as I believe each table scan will create a new ParallelIterable ?
   
   Yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-13 Thread via GitHub



corleyma commented on code in PR #1427:
URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884523394


##
pyiceberg/table/__init__.py:
##
@@ -1229,7 +1240,8 @@ def with_case_sensitive(self: S, case_sensitive: bool = 
True) -> S:
 
 
 class ScanTask(ABC):
-pass
+def size_in_bytes(self) -> int:

Review Comment:
   would it be better to mark this method as abstract?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-13 Thread via GitHub



corleyma commented on code in PR #1427:
URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884524371


##
pyiceberg/table/__init__.py:
##
@@ -1253,6 +1265,22 @@ def __init__(
 self.start = start or 0
 self.length = length or data_file.file_size_in_bytes
 
+def size_in_bytes(self) -> int:
+return self.length + sum(f.file_size_in_bytes for f in 
self.delete_files)
+
+
+@dataclass(init=False)
+class CombinedFileScanTask(ScanTask):
+"""Task representing combined multiple file scan tasks."""

Review Comment:
   I think this could use a better docstring. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on code in PR #11781:
URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884527019


##
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##
@@ -257,17 +257,17 @@ private static class Task implements 
Supplier>>, Closeable {
 @Override
 public Optional> get() {
   try {
+if (queue.size() >= approximateMaxQueueSize) {
+  // Yield when queue is over the size limit. Task will be resubmitted 
later and continue
+  // the work.
+  return Optional.of(this);
+}
+
 if (iterator == null) {
   iterator = input.iterator();
 }
 
 while (iterator.hasNext()) {
-  if (queue.size() >= approximateMaxQueueSize) {

Review Comment:
   Isn't this a pretty significant change in behavior or ParallelIterable? I 
assume we have some performance implications of not fully loading the queue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Data loss bug in MergeIntoCommand [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on issue #11765:
URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2542309392


   > However, sir, I might have discovered some issues. When executing the 
COW-MERGE-INTO command, Spark needs to use the ods_table twice. The first time 
is to match data files based on incremental records, and the second time is to 
perform the actual data merge. If the data in the ods_table changes between the 
first and second usage, I would like to know if this could lead to abnormal 
execution results? What would happen if the data in the ods_table suddenly 
increases? What about if the data in the ods_table suddenly decreases?
   > 
   > Example: ods_table is a partitioned table, during the execution of the 
merge-into statement, someone adds or deletes partitions.
   
   Yes this is true, the relation which is created of source data must remain 
constant through the two different passes of the source data. The Target 
(Iceberg Table) can change. If the query would return different results I 
believe you could see odd behavior. 
   
   We have some hooks to prevent this when the source is Iceberg I think but I 
don't believe we have any for non Iceberg sources. I may be forgetting 
something else but @aokolnychyi probably knows. I believe the workaround here 
is for a non-idempotent Subquery you should cache or persist it prior to 
merging.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] add .gitignore [iceberg-cpp]

2024-12-13 Thread via GitHub



pitrou commented on code in PR #9:
URL: https://github.com/apache/iceberg-cpp/pull/9#discussion_r1884280319


##
.gitignore:
##
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+build/

Review Comment:
   Ok, fair enough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat(puffin): Parse Puffin FileMetadata [iceberg-rust]

2024-12-13 Thread via GitHub



c-thiel commented on PR #765:
URL: https://github.com/apache/iceberg-rust/pull/765#issuecomment-2541965685

   @fqaiser94, just added the higher level statistic files in 
https://github.com/apache/iceberg-rust/pull/799 FYI. I would guess you would 
end up building those soon too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]

2024-12-13 Thread via GitHub



rdblue commented on PR #11415:
URL: https://github.com/apache/iceberg/pull/11415#issuecomment-2542017239

   The Spark failures are a port conflict. I think it's unrelated to these 
changes. We'll see the next time CI runs (I'm sure we'll have more changes to 
trigger them)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-13 Thread via GitHub



corleyma commented on code in PR #1427:
URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884533676


##
pyiceberg/table/__init__.py:
##
@@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]:
 for data_entry in data_entries
 ]
 
+def _target_split_size(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, 
TableProperties.READ_SPLIT_SIZE_DEFAULT
+)
+return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, 
table_value)  # type: ignore
+
+def _loop_back(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, 
TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_LOOKBACK, table_value)  # type: ignore
+
+def _split_open_file_cost(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties,
+TableProperties.READ_SPLIT_OPEN_FILE_COST,
+TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT,
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value)  # type: ignore
+
+def plan_task(self) -> Iterable[CombinedFileScanTask]:

Review Comment:
   I assume it's intentional that we're not actually calling this in any of the 
methods (like to_arrow, etc) that actually execute a scan?
   
   If the plan is to call this in `to_arrow` eventually, it would be good to 
have some benchmarks with realistic latencies (e.g., against actual object 
storage).
   
   If there is no plan to call this directly in pyiceberg, I wonder who the 
intended consumers of this API would be?  I would expect most query engines  -- 
distributed or otherwise -- to have their own notions for how to optimize scan 
planning.



##
pyiceberg/table/__init__.py:
##
@@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]:
 for data_entry in data_entries
 ]
 
+def _target_split_size(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, 
TableProperties.READ_SPLIT_SIZE_DEFAULT
+)
+return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, 
table_value)  # type: ignore
+
+def _loop_back(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, 
TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_LOOKBACK, table_value)  # type: ignore
+
+def _split_open_file_cost(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties,
+TableProperties.READ_SPLIT_OPEN_FILE_COST,
+TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT,
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value)  # type: ignore
+
+def plan_task(self) -> Iterable[CombinedFileScanTask]:

Review Comment:
   I assume it's intentional that we're not actually calling this in any of the 
methods (like `to_arrow`, etc) that actually execute a scan?
   
   If the plan is to call this in `to_arrow` eventually, it would be good to 
have some benchmarks with realistic latencies (e.g., against actual object 
storage).
   
   If there is no plan to call this directly in pyiceberg, I wonder who the 
intended consumers of this API would be?  I would expect most query engines  -- 
distributed or otherwise -- to have their own notions for how to optimize scan 
planning.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-13 Thread via GitHub



corleyma commented on code in PR #1427:
URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884536195


##
pyiceberg/table/__init__.py:
##
@@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]:
 for data_entry in data_entries
 ]
 
+def _target_split_size(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, 
TableProperties.READ_SPLIT_SIZE_DEFAULT
+)
+return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, 
table_value)  # type: ignore
+
+def _loop_back(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, 
TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_LOOKBACK, table_value)  # type: ignore
+
+def _split_open_file_cost(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties,
+TableProperties.READ_SPLIT_OPEN_FILE_COST,
+TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT,
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value)  # type: ignore
+
+def plan_task(self) -> Iterable[CombinedFileScanTask]:

Review Comment:
Also, assuming there's still some future plan to offload expensive stuff 
like data scanning to a rust core, I do wonder if we want to commit to exposing 
additional scan planning APIs that may not align with whatever query engine 
gets embedded in pyiceberg?

   In the case that this is primarily intended to be for the benefit of 
pyiceberg's own scan execution I would consider making this a private API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2024-12-13 Thread via GitHub



sopel39 commented on code in PR #11781:
URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884535343


##
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##
@@ -257,17 +257,17 @@ private static class Task implements 
Supplier>>, Closeable {
 @Override
 public Optional> get() {
   try {
+if (queue.size() >= approximateMaxQueueSize) {
+  // Yield when queue is over the size limit. Task will be resubmitted 
later and continue
+  // the work.
+  return Optional.of(this);
+}
+
 if (iterator == null) {
   iterator = input.iterator();
 }
 
 while (iterator.hasNext()) {
-  if (queue.size() >= approximateMaxQueueSize) {

Review Comment:
   The queue should still be bounded albeit not strictly up to 
`approximateMaxQueueSize`.
   
   > I assume we have some performance implications of not fully loading the 
queue?
   
   What do you mean by not fully loading the queue? This PR guarantees that 
`Task` that is started will complete and not hold external resources.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-13 Thread via GitHub



corleyma commented on code in PR #1427:
URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884536195


##
pyiceberg/table/__init__.py:
##
@@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]:
 for data_entry in data_entries
 ]
 
+def _target_split_size(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, 
TableProperties.READ_SPLIT_SIZE_DEFAULT
+)
+return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, 
table_value)  # type: ignore
+
+def _loop_back(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties, 
TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_LOOKBACK, table_value)  # type: ignore
+
+def _split_open_file_cost(self) -> int:
+table_value = property_as_int(
+self.table_metadata.properties,
+TableProperties.READ_SPLIT_OPEN_FILE_COST,
+TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT,
+)
+return property_as_int(self.options, 
TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value)  # type: ignore
+
+def plan_task(self) -> Iterable[CombinedFileScanTask]:

Review Comment:
   Also, assuming there's still some future plan to offload expensive stuff 
like data scanning to a rust core, I do wonder if we want to commit to exposing 
additional scan planning APIs that may not align with whatever query engine 
gets embedded in pyiceberg?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[I] Hivemetastore unable to create hive lock after upgrading from hivemetastore 3.1.3 to 4.0.0 during iceberg operations [iceberg]

2024-12-13 Thread via GitHub



mAlf1999 opened a new issue, #11784:
URL: https://github.com/apache/iceberg/issues/11784

   ### Apache Iceberg version
   
   1.6.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We are currently using iceberg version 1.6.0 and have been successfully 
using it along with hivemetastore version 3.1.3 with spark as the query engine. 
Recently, we went about upgrading hivemetastore to version 4.0.0, and since 
then we have been running into an issue where the hivemetastore is unable to 
create hivelock.
   
   The logs from the spark job are as follows:
   WARN MetastoreLock: Failed to create lock 
LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, 
dbname:test_hive, tablename:my_table, operationType:UNSET)], user:spark, 
hostname:my-driver, agentInfo:Iceberg-8de7e5ba-3be2-47e3-ae28-8d22fhghbfa6)
   org.apache.thrift.TApplicationException: Internal error processing lock
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:4678)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:4665)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2153)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
at jdk.proxy2/jdk.proxy2.$Proxy54.lock(Unknown Source)
at 
org.apache.iceberg.hive.MetastoreLock.lambda$createLock$3(MetastoreLock.java:305)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:72)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:65)
at 
org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
at 
org.apache.iceberg.hive.MetastoreLock.lambda$createLock$4(MetastoreLock.java:305)
at 
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
at 
org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
at 
org.apache.iceberg.hive.MetastoreLock.createLock(MetastoreLock.java:302)
at 
org.apache.iceberg.hive.MetastoreLock.acquireLock(MetastoreLock.java:185)
at org.apache.iceberg.hive.MetastoreLock.lock(MetastoreLock.java:146)
at 
org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:184)
at 
org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:128)
at 
org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:412)
at 
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
at 
org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:384)
at 
org.apache.iceberg.BaseReplacePartitions.commit(BaseReplacePartitions.java:26)
at 
org.apache.iceberg.spark.source.SparkWrite.commitOperation(SparkWrite.java:233)
at 
org.apache.iceberg.spark.source.SparkWrite.access$1300(SparkWrite.java:84)
at 
org.apache.iceberg.spark.source.SparkWrite$DynamicOverwrite.commit(SparkWrite.java:337)
at 
org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:399)
at 
org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:359)
at 
org.apache.spark.sql.execution.datasources.v2.OverwritePartitionsDynamicExec.writeWithV2(WriteToDataSourceV2Exec.scala:260)
at 
org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:337)
at 
org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:336)
at 
org.apache.spark.sql.execution.datasources.v2.OverwritePartitionsDynamicExec.run(WriteToDataSourceV2Exec.scala:260)
at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfu

Re: [PR] IO Implementation using Go CDK [iceberg-go]

2024-12-13 Thread via GitHub



zeroshade commented on PR #176:
URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542213486

   @loicalleyne can you take a look at the latest changes I made here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Doc: Add staus page for different implementations. [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on code in PR #11772:
URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884491976


##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.

Review Comment:
   ```suggestion
   The Apache Iceberg spec has multiple language and engine implementations. 
This page provides an overview of their current capabilities.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Doc: Add staus page for different implementations. [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on code in PR #11772:
URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884493485


##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.
+
+## Versions

Review Comment:
   Do we also want to note when the following table was last updated, or should 
we just have people check the commit time



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Doc: Add staus page for different implementations. [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on code in PR #11772:
URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884494396


##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.
+
+## Versions
+
+This section describes the versions of each implementation that are being 
tracked in this page.
+
+| Language  | Version |
+|---|-|
+| Java  | 1.7.1   |
+| PyIceberg | 0.7.0   |
+| Rust  | 0.3.0   |
+| Go| 0.1.0   |
+
+## Data Types
+
+| Data Type  | Java | PyIceberg | Rust | Go |
+||--|---|--||
+| boolean| Y| Y | Y| Y  |
+| int| Y| Y | Y| Y  |
+| long   | Y| Y | Y| Y  |
+| float  | Y| Y | Y| Y  |
+| double | Y| Y | Y| Y  |
+| decimal| Y| Y | Y| Y  |
+| date   | Y| Y | Y| Y  |
+| time   | Y| Y | Y| Y  |
+| timestamp  | Y| Y | Y| Y  |
+| timestamptz| Y| Y | Y| Y  |
+| timestamp_ns   | Y| Y | Y| Y  |
+| timestamptz_ns | Y| Y | Y| Y  |
+| string | Y| Y | Y| Y  |
+| uuid   | Y| Y | Y| Y  |
+| fixed  | Y| Y | Y| Y  |
+| binary | Y| Y | Y| Y  |
+
+## Data File Formats
+
+| Format  | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Parquet | Y| Y | Y| Y  |
+| ORC | Y| N | N| N  |
+| Puffin  | Y| N | N| N  |

Review Comment:
   Yeah i'd probably break this out into the actual Puffin capabilities instead
   
   "Table Stats, Delete Vectors, ect"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Doc: Add staus page for different implementations. [iceberg]

2024-12-13 Thread via GitHub



RussellSpitzer commented on code in PR #11772:
URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884494856


##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.
+
+## Versions
+
+This section describes the versions of each implementation that are being 
tracked in this page.
+
+| Language  | Version |
+|---|-|
+| Java  | 1.7.1   |
+| PyIceberg | 0.7.0   |
+| Rust  | 0.3.0   |
+| Go| 0.1.0   |
+
+## Data Types
+
+| Data Type  | Java | PyIceberg | Rust | Go |
+||--|---|--||
+| boolean| Y| Y | Y| Y  |
+| int| Y| Y | Y| Y  |
+| long   | Y| Y | Y| Y  |
+| float  | Y| Y | Y| Y  |
+| double | Y| Y | Y| Y  |
+| decimal| Y| Y | Y| Y  |
+| date   | Y| Y | Y| Y  |
+| time   | Y| Y | Y| Y  |
+| timestamp  | Y| Y | Y| Y  |
+| timestamptz| Y| Y | Y| Y  |
+| timestamp_ns   | Y| Y | Y| Y  |
+| timestamptz_ns | Y| Y | Y| Y  |
+| string | Y| Y | Y| Y  |
+| uuid   | Y| Y | Y| Y  |
+| fixed  | Y| Y | Y| Y  |
+| binary | Y| Y | Y| Y  |
+
+## Data File Formats
+
+| Format  | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Parquet | Y| Y | Y| Y  |
+| ORC | Y| N | N| N  |
+| Puffin  | Y| N | N| N  |
+
+## File IO
+
+| Storage  | Java | PyIceberg | Rust | Go |
+|--|--|---|--||
+| Local Filesystem | Y| Y | Y| Y  |
+| Hadoop Filesystem| Y| Y | Y| Y  |
+| Aws S3   | Y| Y | Y| Y  |

Review Comment:
   S3 Compatible
   GCS Compatible
   Memory FS? < Do we want to include this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] ci(infra): Remove sha256 [iceberg-go]

2024-12-13 Thread via GitHub



zeroshade merged PR #226:
URL: https://github.com/apache/iceberg-go/pull/226


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] IO Implementation using Go CDK [iceberg-go]

2024-12-13 Thread via GitHub



loicalleyne commented on PR #176:
URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542428675

   LGTM 👍


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] IO Implementation using Go CDK [iceberg-go]

2024-12-13 Thread via GitHub



zeroshade commented on PR #176:
URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542436725

   @dwilson1988 When you get a chance, can you take a look at the changes I 
made here. I liked your thought on isolating things, but there was still a 
bunch of specific options for particular bucket types that needed to get 
accounted for as the options are not always passed via URL due to how Iceberg 
config properties work.
   
   So I'd like your thoughts or comments on what I ultimately came up with to 
simplify what @loicalleyne already had while solving the failing tests and 
whether it fits what you were thinking and using internally. Once this is 
merged, I'd definitely greatly appreciate contributions for Azure as you said :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]

2024-12-13 Thread via GitHub



c-thiel commented on code in PR #799:
URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1884798642


##
crates/iceberg/src/spec/table_metadata.rs:
##
@@ -158,11 +160,15 @@ pub struct TableMetadata {
 /// writers, but is not used when reading because reads use the specs
 /// stored in manifest files.
 pub(crate) default_sort_order_id: i64,
-///A map of snapshot references. The map keys are the unique snapshot 
reference
+/// A map of snapshot references. The map keys are the unique snapshot 
reference
 /// names in the table, and the map values are snapshot reference objects.
 /// There is always a main branch reference pointing to the 
current-snapshot-id
 /// even if the refs map is null.
 pub(crate) refs: HashMap,
+/// Mapping of snapshot ids to statistics files.
+pub(crate) statistics: HashMap,

Review Comment:
   Can we ever have more than one `StatisticsFile` for a `snapshot_id`?
   In java this is modeled as a mapping `snapshot_id` : `Vec`, 
however I couldn't find a way to get more than one element into the `Vec` for a 
`snapshot_id` other than deserializing.
   
https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1310-L1320



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on code in PR #795:
URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1884868240


##
crates/iceberg/src/spec/values.rs:
##
@@ -3439,11 +3443,13 @@ mod tests {
 "bar".to_string(),
 ))),
 None,
+Some(Literal::Primitive(PrimitiveLiteral::Int(1000))),
 ])),
 &Type::Struct(StructType::new(vec![
 NestedField::required(2, "id", 
Type::Primitive(PrimitiveType::Int)).into(),
 NestedField::optional(3, "name", 
Type::Primitive(PrimitiveType::String)).into(),
 NestedField::optional(4, "address", 
Type::Primitive(PrimitiveType::String)).into(),
+NestedField::required(5, "extra", 
Type::Primitive(PrimitiveType::Int)).into(),

Review Comment:
   > Yes, but I find that it can also pass originally. I'm trying to find the 
test case that can't pass originally.
   
   Thank you, that will be really meaningful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] fix: set key_metadata to Null by default [iceberg-rust]

2024-12-13 Thread via GitHub



feniljain commented on code in PR #800:
URL: https://github.com/apache/iceberg-rust/pull/800#discussion_r1884869867


##
crates/iceberg/src/expr/visitors/expression_evaluator.rs:
##
@@ -338,7 +338,7 @@ mod tests {
 nan_value_counts: HashMap::new(),
 lower_bounds: HashMap::new(),
 upper_bounds: HashMap::new(),
-key_metadata: vec![],
+key_metadata: Some(vec![]),

Review Comment:
   They were tests so didn't matter much, but I can change all to `None`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] fix: set key_metadata to Null by default [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on code in PR #800:
URL: https://github.com/apache/iceberg-rust/pull/800#discussion_r1884868013


##
crates/iceberg/src/expr/visitors/expression_evaluator.rs:
##
@@ -338,7 +338,7 @@ mod tests {
 nan_value_counts: HashMap::new(),
 lower_bounds: HashMap::new(),
 upper_bounds: HashMap::new(),
-key_metadata: vec![],
+key_metadata: Some(vec![]),

Review Comment:
   Do we need to use `None` by default?



##
crates/iceberg/src/spec/manifest.rs:
##
@@ -1134,7 +1134,7 @@ impl DataFile {
 &self.upper_bounds
 }
 /// Get the Implementation-specific key metadata for the data file.
-pub fn key_metadata(&self) -> &[u8] {
+pub fn key_metadata(&self) -> &Option> {

Review Comment:
   It's better to return `Option<&[u8]>` to make it easier to use.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Doc: Add staus page for different implementations. [iceberg]

2024-12-13 Thread via GitHub



liurenjie1024 commented on code in PR #11772:
URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884790587


##
site/docs/status.md:
##
@@ -0,0 +1,362 @@
+---
+title: "Implementation Status"
+---
+
+
+# Implementations Status
+
+Apache iceberg now has implementations of the iceberg spec in multiple 
languages. This page provides a summary of the
+current status of these implementations.
+
+## Versions
+
+This section describes the versions of each implementation that are being 
tracked in this page.
+
+| Language  | Version |
+|---|-|
+| Java  | 1.7.1   |
+| PyIceberg | 0.7.0   |
+| Rust  | 0.3.0   |
+| Go| 0.1.0   |
+
+## Data Types
+
+| Data Type  | Java | PyIceberg | Rust | Go |
+||--|---|--||
+| boolean| Y| Y | Y| Y  |
+| int| Y| Y | Y| Y  |
+| long   | Y| Y | Y| Y  |
+| float  | Y| Y | Y| Y  |
+| double | Y| Y | Y| Y  |
+| decimal| Y| Y | Y| Y  |
+| date   | Y| Y | Y| Y  |
+| time   | Y| Y | Y| Y  |
+| timestamp  | Y| Y | Y| Y  |
+| timestamptz| Y| Y | Y| Y  |
+| timestamp_ns   | Y| Y | Y| Y  |
+| timestamptz_ns | Y| Y | Y| Y  |
+| string | Y| Y | Y| Y  |
+| uuid   | Y| Y | Y| Y  |
+| fixed  | Y| Y | Y| Y  |
+| binary | Y| Y | Y| Y  |
+
+## Data File Formats
+
+| Format  | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Parquet | Y| Y | Y| Y  |
+| ORC | Y| N | N| N  |
+| Puffin  | Y| N | N| N  |
+
+## File IO
+
+| Storage  | Java | PyIceberg | Rust | Go |
+|--|--|---|--||
+| Local Filesystem | Y| Y | Y| Y  |
+| Hadoop Filesystem| Y| Y | Y| Y  |
+| Aws S3   | Y| Y | Y| Y  |
+| Google Cloud Storage | Y| Y | Y| Y  |
+| Memory Fs| Y| Y | Y| Y  |
+
+## Table Update Operations
+
+### Table Spec V1
+
+| Operation   | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Update schema   | Y| N | Y| N  |
+| Update partition spec   | Y| N | Y| N  |
+| Update table properties | Y| Y | Y| N  |
+| Replace sort order  | Y| N | N| N  |
+| Update table location   | Y| N | N| N  |
+| Append data files   | Y| Y | N| N  |
+| Rewrite files   | Y| Y | N| N  |
+| Rewrite manifests   | Y| Y | N| N  |
+| Overwrite files | Y| Y | N| N  |
+| Row delta   | Y| N | N| N  |
+| Delete files| Y| N | N| N  |
+| Update statistics   | Y| N | N| N  |
+| Update partition statistics | Y| N | N| N  |
+| Expire snapshots| Y| N | N| N  |
+| Manage snapshots| Y| N | N| N  |
+
+### Table Spec V2
+
+| Operation   | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Update schema   | Y| Y | N| N  |
+| Update partition spec   | Y| Y | N| N  |
+| Update table properties | Y| Y | Y| N  |
+| Replace sort order  | Y| N | N| N  |
+| Update table location   | Y| N | N| N  |
+| Append data files   | Y| Y | N| N  |
+| Rewrite files   | Y| Y | N| N  |
+| Rewrite manifests   | Y| Y | N| N  |
+| Overwrite files | Y| Y | N| N  |
+| Row delta   | Y| N | N| N  |
+| Delete files| Y| Y | N| N  |
+| Update statistics   | Y| N | N| N  |
+| Update partition statistics | Y| N | N| N  |
+| Expire snapshots| Y| N | N| N  |
+| Manage snapshots| Y| N | N| N  |
+
+## Table Read Operations
+
+### Table Spec V1
+
+| Operation   | Java | PyIceberg | Rust | Go |
+|-|--|---|--||
+| Plan with data file | Y| Y | Y| Y  |
+| Read data file  | Y| N | Y| N  |
+
+### Table Spec V2
+
+| Operation  | Java | PyIceberg | Rust | Go |
+||--|---|--||
+| Plan with data file| Y|

Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]

2024-12-13 Thread via GitHub



Xuanwo commented on code in PR #795:
URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1884848752


##
crates/iceberg/src/spec/values.rs:
##
@@ -3439,11 +3443,13 @@ mod tests {
 "bar".to_string(),
 ))),
 None,
+Some(Literal::Primitive(PrimitiveLiteral::Int(1000))),
 ])),
 &Type::Struct(StructType::new(vec![
 NestedField::required(2, "id", 
Type::Primitive(PrimitiveType::Int)).into(),
 NestedField::optional(3, "name", 
Type::Primitive(PrimitiveType::String)).into(),
 NestedField::optional(4, "address", 
Type::Primitive(PrimitiveType::String)).into(),
+NestedField::required(5, "extra", 
Type::Primitive(PrimitiveType::Int)).into(),

Review Comment:
   Hi, would you like to add a new test that cover the mis-order cases?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: support to append delete type data file [iceberg-rust]

2024-12-13 Thread via GitHub



ZENOTME commented on PR #798:
URL: https://github.com/apache/iceberg-rust/pull/798#issuecomment-2542902331

   Sorry, I think I have some misunderstanding here since the action which 
support to append data file and delete file is RowDelta.🤔 So I guess what we 
need is right:
   1. MergingSnapshotProducer: 
https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L61
   2. RowDelta: 
https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/api/src/main/java/org/apache/iceberg/RowDelta.java#L32


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]

2024-12-13 Thread via GitHub



ZENOTME commented on code in PR #795:
URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1884853773


##
crates/iceberg/src/spec/values.rs:
##
@@ -3439,11 +3443,13 @@ mod tests {
 "bar".to_string(),
 ))),
 None,
+Some(Literal::Primitive(PrimitiveLiteral::Int(1000))),
 ])),
 &Type::Struct(StructType::new(vec![
 NestedField::required(2, "id", 
Type::Primitive(PrimitiveType::Int)).into(),
 NestedField::optional(3, "name", 
Type::Primitive(PrimitiveType::String)).into(),
 NestedField::optional(4, "address", 
Type::Primitive(PrimitiveType::String)).into(),
+NestedField::required(5, "extra", 
Type::Primitive(PrimitiveType::Int)).into(),

Review Comment:
   Yes, but I find that it can also pass originally. I'm trying to find the 
test case that can't pass originally.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

1 2 >

1 - 100 of 197 matches

Mail list logo