Re: [PR] Fix `Table.scan` to enable case sensitive argument [iceberg-python]
jiakai-li commented on PR #1423: URL: https://github.com/apache/iceberg-python/pull/1423#issuecomment-2542657647 Hey guys, thanks a lot for your kind guidance and great suggestions. I've updated the PR to: - Enable `Table.delete` and `Table.overwrite` operations to control case-sensitivity - Updated test cases to make it more readable Please let me know if there is other changes you would like me to make, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix `Table.scan` to enable case sensitive argument [iceberg-python]
jiakai-li commented on code in PR #1423: URL: https://github.com/apache/iceberg-python/pull/1423#discussion_r1884721249 ## tests/table/test_init.py: ## @@ -310,6 +310,19 @@ def test_table_scan_row_filter(table_v2: Table) -> None: assert scan.filter(EqualTo("x", 10)).filter(In("y", (10, 11))).row_filter == And(EqualTo("x", 10), In("y", (10, 11))) +def test_table_scan_partition_filters_case_sensitive(table_v2: Table) -> None: Review Comment: Thanks very much @kevinjqliu , I've updated the test case for scan and the other operations as well. Hope it looks better now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] IO Implementation using Go CDK [iceberg-go]
dwilson1988 commented on PR #176: URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542660861 @zeroshade - I'll take a look this weekend! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark: Change Delete granularity to file for Spark 3.5 [iceberg]
aokolnychyi commented on code in PR #11478: URL: https://github.com/apache/iceberg/pull/11478#discussion_r1884625538 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java: ## @@ -719,7 +719,7 @@ public DeleteGranularity deleteGranularity() { .stringConf() .option(SparkWriteOptions.DELETE_GRANULARITY) .tableProperty(TableProperties.DELETE_GRANULARITY) -.defaultValue(TableProperties.DELETE_GRANULARITY_DEFAULT) +.defaultValue(DeleteGranularity.FILE.toString()) Review Comment: Let's switch to `enumConf` for this one, which was added recently. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] GCS: Suppress JavaUtilDate in OAuth2RefreshCredentialsHandler [iceberg]
nastra merged PR #11773: URL: https://github.com/apache/iceberg/pull/11773 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883568018 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.util.List; +import java.util.Map; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.rest.HTTPHeaders.HTTPHeader; +import org.junit.jupiter.api.Test; + +class TestHTTPHeaders { + + final HTTPHeaders headers = + HTTPHeaders.of( + HTTPHeader.of("header1", "value1a"), Review Comment: please also add a test where the key/value is null ## core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.util.List; +import java.util.Map; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.rest.HTTPHeaders.HTTPHeader; +import org.junit.jupiter.api.Test; + +class TestHTTPHeaders { + + final HTTPHeaders headers = Review Comment: can be private -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883571471 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java: ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.util.List; +import java.util.Map; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.rest.HTTPHeaders.HTTPHeader; +import org.junit.jupiter.api.Test; + +class TestHTTPHeaders { + + final HTTPHeaders headers = + HTTPHeaders.of( + HTTPHeader.of("header1", "value1a"), + HTTPHeader.of("HEADER1", "value1b"), + HTTPHeader.of("header2", "value2")); + + @Test + void asMap() { +assertThat(headers.asMap()) +.isEqualTo( +Map.of( +"header1", List.of("value1a", "value1b"), +"header2", List.of("value2"))); + } + + @Test + void asSimpleMap() { +assertThat(headers.asSimpleMap()) +.isEqualTo( +Map.of( +"header1", "value1a", +"header2", "value2")); + } + + @Test + void asMultiMap() { +assertThat(headers.asMultiMap()) +.isEqualTo( +ImmutableListMultimap.of( +"header1", "value1a", "header1", "value1b", "header2", "value2")); + } + + @Test + void entries() { +assertThat(headers.entries("header1")) Review Comment: please also add tests where `null` is passed to all the methods that take nullable parameters, such as `entries()`/`contains()`/`addIfAbsent()` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883847591 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** Returns all the entries in this group for the given name (case-insensitive). */ + default List entries(String name) { +return entries().stream() +.filter(header -> header.name().equalsIgnoreCase(name)) +.collect(Collectors.toList()); + } + + /** Returns whether this group contains an entry with the given name (case-insensitive). */ + default boolean contains(String name) { +return entries().stream().anyMatch(header -> header.name().equalsIgnoreCase(name)); + } + + /** + * Adds the given header to the current group if no entry with the same name is already present. + * Returns a new instance with the added header, or the current instance if the header is already + * present. + */ + default HTTPHeaders addIfAbsent(HTTPHeader header) { Review Comment: we might want to consider naming this (and the other method) `withHeaderIfAbsent()` to clearly indicate that (potentially) a new instance is returned because `add` in the name suggests that the existing object is modified -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883560954 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.Multimap; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** + * Returns a map representation of the headers where each header name is mapped to a list of its + * values. Header names are case-insensitive. + */ + @Value.Lazy + default Map> asMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value).collect(Collectors.toList(; + } + + /** + * Returns a simple map representation of the headers where each header name is mapped to its + * first value. If a header has multiple values, only the first value is used. Header names are + * case-insensitive. + */ + @Value.Lazy + default Map asSimpleMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap(headers -> headers.get(0).name(), headers -> headers.get(0).value())); + } + + /** + * Returns a {@link ListMultimap} representation of the headers. Header names are + * case-insensitive. + */ + @Value.Lazy + default ListMultimap asMultiMap() { Review Comment: do we actually have a use case for this? I see the value in having `asMap()` / `asSimpleMap()` but I feel like we don't need this method atm? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883564062 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.Multimap; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** + * Returns a map representation of the headers where each header name is mapped to a list of its + * values. Header names are case-insensitive. + */ + @Value.Lazy + default Map> asMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value).collect(Collectors.toList(; + } + + /** + * Returns a simple map representation of the headers where each header name is mapped to its + * first value. If a header has multiple values, only the first value is used. Header names are + * case-insensitive. + */ + @Value.Lazy + default Map asSimpleMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap(headers -> headers.get(0).name(), headers -> headers.get(0).value())); + } + + /** + * Returns a {@link ListMultimap} representation of the headers. Header names are + * case-insensitive. + */ + @Value.Lazy + default ListMultimap asMultiMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +ImmutableListMultimap.flatteningToImmutableListMultimap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value))); + } + + /** Returns all the entries in this group for the given name (case-insensitive). */ + default List entries(String name) { +return entries().stream() +.filter(header -> header.name().equalsIgnoreCase(name)) +.collect(Collectors.toList()); + } + + /** Returns whether this group contains an entry with the given name (case-insensitive). */ + default boolean contains(String name) { +return entries().stream().anyMatch(header -> header.name().equalsIgnoreCase(name)); + } + + /** + * Adds the given header to the current group if no entry with the same name is already present. + * Returns a new instance with the added header, or the current instance if the header is already + * present. + */ + default HTTPHeaders addIfAbsent(HTTPHeader header) { +return contains(header.name()) +? this +: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build(); + } + + /** + * Adds the given headers to the current group if no entries with same names are already present. + * Returns a new instance with the added headers, or the current instance if all headers are + * already present. + */ + default HTTPHeaders addIfAbsent(HTTPHeaders headers) { +
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
adutra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883565088 ## core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import com.fasterxml.jackson.databind.ObjectMapper; +import java.net.URI; +import java.util.Map; +import javax.annotation.Nullable; +import org.immutables.value.Value; + +/** Represents an HTTP request. */ +@Value.Style(redactedMask = "", depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPRequest { + + enum HTTPMethod { +GET, +HEAD, +POST, +DELETE + } + + /** + * Returns the base URI configured at the REST client level. The base URI is used to construct the + * full {@link #requestUri()}. + */ + URI baseUri(); + + /** + * Returns the full URI of this request. The URI is constructed from the base URI, path, and query + * parameters. It cannot be modified directly. + */ + @Value.Lazy + default URI requestUri() { +return RESTUtil.buildRequestUri(this); Review Comment: Yes that's a good idea. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
adutra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883564166 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.Multimap; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** + * Returns a map representation of the headers where each header name is mapped to a list of its + * values. Header names are case-insensitive. + */ + @Value.Lazy + default Map> asMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value).collect(Collectors.toList(; + } + + /** + * Returns a simple map representation of the headers where each header name is mapped to its + * first value. If a header has multiple values, only the first value is used. Header names are + * case-insensitive. + */ + @Value.Lazy + default Map asSimpleMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap(headers -> headers.get(0).name(), headers -> headers.get(0).value())); + } + + /** + * Returns a {@link ListMultimap} representation of the headers. Header names are + * case-insensitive. + */ + @Value.Lazy + default ListMultimap asMultiMap() { Review Comment: This one will be very useful later to do things like this in `HTTPClient`: ```java HTTPRequest req = ... HttpUriRequestBase request = new HttpUriRequestBase(req.method().name(), req.requestUri()); req.headers().asMultiMap().forEach(request::addHeader); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883809126 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java: ## @@ -95,6 +100,9 @@ void addIfAbsentHTTPHeader() { "header1", List.of("value1a", "value1b"), "header2", List.of("value2"), "header3", List.of("value3"))); + +assertThatThrownBy(() -> headers.addIfAbsent((HTTPHeaders) null)) +.isInstanceOf(NullPointerException.class); Review Comment: checks like this should always have a `.hasMessage` check to make sure we get the right error msg back (unfortunately we can't easily enforce such a rule via checkstyle) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
adutra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883811731 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.Multimap; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** + * Returns a map representation of the headers where each header name is mapped to a list of its + * values. Header names are case-insensitive. + */ + @Value.Lazy + default Map> asMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value).collect(Collectors.toList(; + } + + /** + * Returns a simple map representation of the headers where each header name is mapped to its + * first value. If a header has multiple values, only the first value is used. Header names are + * case-insensitive. + */ + @Value.Lazy + default Map asSimpleMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap(headers -> headers.get(0).name(), headers -> headers.get(0).value())); + } + + /** + * Returns a {@link ListMultimap} representation of the headers. Header names are + * case-insensitive. + */ + @Value.Lazy + default ListMultimap asMultiMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +ImmutableListMultimap.flatteningToImmutableListMultimap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value))); + } + + /** Returns all the entries in this group for the given name (case-insensitive). */ + default List entries(String name) { +return entries().stream() +.filter(header -> header.name().equalsIgnoreCase(name)) +.collect(Collectors.toList()); + } + + /** Returns whether this group contains an entry with the given name (case-insensitive). */ + default boolean contains(String name) { +return entries().stream().anyMatch(header -> header.name().equalsIgnoreCase(name)); + } + + /** + * Adds the given header to the current group if no entry with the same name is already present. + * Returns a new instance with the added header, or the current instance if the header is already + * present. + */ + default HTTPHeaders addIfAbsent(HTTPHeader header) { +return contains(header.name()) +? this +: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build(); + } + + /** + * Adds the given headers to the current group if no entries with same names are already present. + * Returns a new instance with the added headers, or the current instance if all headers are + * already present. + */ + default HTTPHeaders addIfAbsent(HTTPHeaders headers) { +
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
adutra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883813563 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPHeaders.java: ## @@ -95,6 +100,9 @@ void addIfAbsentHTTPHeader() { "header1", List.of("value1a", "value1b"), "header2", List.of("value2"), "header3", List.of("value3"))); + +assertThatThrownBy(() -> headers.addIfAbsent((HTTPHeaders) null)) +.isInstanceOf(NullPointerException.class); Review Comment: The message is generated by Immutables and will be rather cryptic. But OK. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883806469 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.Multimap; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** + * Returns a map representation of the headers where each header name is mapped to a list of its + * values. Header names are case-insensitive. + */ + @Value.Lazy + default Map> asMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value).collect(Collectors.toList(; + } + + /** + * Returns a simple map representation of the headers where each header name is mapped to its + * first value. If a header has multiple values, only the first value is used. Header names are + * case-insensitive. + */ + @Value.Lazy + default Map asSimpleMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap(headers -> headers.get(0).name(), headers -> headers.get(0).value())); + } + + /** + * Returns a {@link ListMultimap} representation of the headers. Header names are + * case-insensitive. + */ + @Value.Lazy + default ListMultimap asMultiMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +ImmutableListMultimap.flatteningToImmutableListMultimap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value))); + } + + /** Returns all the entries in this group for the given name (case-insensitive). */ + default List entries(String name) { +return entries().stream() +.filter(header -> header.name().equalsIgnoreCase(name)) +.collect(Collectors.toList()); + } + + /** Returns whether this group contains an entry with the given name (case-insensitive). */ + default boolean contains(String name) { +return entries().stream().anyMatch(header -> header.name().equalsIgnoreCase(name)); + } + + /** + * Adds the given header to the current group if no entry with the same name is already present. + * Returns a new instance with the added header, or the current instance if the header is already + * present. + */ + default HTTPHeaders addIfAbsent(HTTPHeader header) { +return contains(header.name()) +? this +: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build(); + } + + /** + * Adds the given headers to the current group if no entries with same names are already present. + * Returns a new instance with the added headers, or the current instance if all headers are + * already present. + */ + default HTTPHeaders addIfAbsent(HTTPHeaders headers) { +
Re: [PR] Docs: add note for `day` transform [iceberg]
Fokko commented on code in PR #11749: URL: https://github.com/apache/iceberg/pull/11749#discussion_r1883817906 ## format/spec.md: ## @@ -454,7 +454,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ | **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type | | **`year`**| Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | | **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | -| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` | +| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` (the physical type should be an `int`, but the the logical type should be a `date`) | Review Comment: > The value should be serialized/displayed as Date (This is not mentioned in the spec here, but is in the reference implementation.) Yes, but since it is serialized with Avro, it will always be an `int`: ```json { "type": "int", "logicalType": "date" } ``` Specifying this twice would lead to duplication of [the Avro spec](https://avro.apache.org/docs/1.12.0/specification/#date) into the Iceberg spec. The error in Iceberg-Rust did raise my eyebrow a bit since I would expect it to read the `int` without the `logicalType` as well because there is no ambiguity. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] REST catalog doesn't return old history if we execute `CREATE OR REPLACE TABLE` statement [iceberg]
ebyhr commented on issue #11777: URL: https://github.com/apache/iceberg/issues/11777#issuecomment-2540941152 I think we need to add a new `MetadataUpdate` class (e.g. `ResetMainBranch`) or modify `RemoveSnapshotRef` to provide an option to clear snapshot-logs or not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
adutra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883626591 ## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.ListMultimap; +import org.apache.iceberg.relocated.com.google.common.collect.Multimap; +import org.immutables.value.Value; + +/** + * Represents a group of HTTP headers. + * + * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616. + * + * This class exposes methods to convert to and from different representations such as maps and + * multimap, for easier access and manipulation – especially when dealing with multiple headers with + * the same name. + */ +@Value.Style(depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPHeaders { + + HTTPHeaders EMPTY = of(); + + /** Returns all the header entries in this group. */ + List entries(); + + /** + * Returns a map representation of the headers where each header name is mapped to a list of its + * values. Header names are case-insensitive. + */ + @Value.Lazy + default Map> asMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value).collect(Collectors.toList(; + } + + /** + * Returns a simple map representation of the headers where each header name is mapped to its + * first value. If a header has multiple values, only the first value is used. Header names are + * case-insensitive. + */ + @Value.Lazy + default Map asSimpleMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +Collectors.toMap(headers -> headers.get(0).name(), headers -> headers.get(0).value())); + } + + /** + * Returns a {@link ListMultimap} representation of the headers. Header names are + * case-insensitive. + */ + @Value.Lazy + default ListMultimap asMultiMap() { +return entries().stream() +.collect(Collectors.groupingBy(h -> h.name().toLowerCase(Locale.ROOT))) +.values() +.stream() +.collect( +ImmutableListMultimap.flatteningToImmutableListMultimap( +headers -> headers.get(0).name(), +headers -> headers.stream().map(HTTPHeader::value))); + } + + /** Returns all the entries in this group for the given name (case-insensitive). */ + default List entries(String name) { +return entries().stream() +.filter(header -> header.name().equalsIgnoreCase(name)) +.collect(Collectors.toList()); + } + + /** Returns whether this group contains an entry with the given name (case-insensitive). */ + default boolean contains(String name) { +return entries().stream().anyMatch(header -> header.name().equalsIgnoreCase(name)); + } + + /** + * Adds the given header to the current group if no entry with the same name is already present. + * Returns a new instance with the added header, or the current instance if the header is already + * present. + */ + default HTTPHeaders addIfAbsent(HTTPHeader header) { +return contains(header.name()) +? this +: ImmutableHTTPHeaders.builder().from(this).addEntry(header).build(); + } + + /** + * Adds the given headers to the current group if no entries with same names are already present. + * Returns a new instance with the added headers, or the current instance if all headers are + * already present. + */ + default HTTPHeaders addIfAbsent(HTTPHeaders headers) { +
Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]
Xuanwo commented on code in PR #785: URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883575270 ## crates/iceberg/src/arrow/reader.rs: ## @@ -1180,4 +1194,62 @@ mod tests { assert_eq!(visitor.field_ids, expected); } + +#[test] +fn test_arrow_projection_mask() { Review Comment: Hi, thank you @gruuya for working on this, would you like to add a test case for decimal? Others LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] refactor: avoid async_trait for FileRead and provide object safe dyn methods [iceberg-rust]
Xuanwo commented on PR #761: URL: https://github.com/apache/iceberg-rust/pull/761#issuecomment-2540901488 I have similar comments like https://github.com/apache/iceberg-rust/pull/760#issuecomment-2540446608 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] refactor: Move puffin crate contents inside iceberg crate [iceberg-rust]
Xuanwo merged PR #789: URL: https://github.com/apache/iceberg-rust/pull/789 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Move puffin crate contents inside iceberg crate [iceberg-rust]
Xuanwo commented on PR #789: URL: https://github.com/apache/iceberg-rust/pull/789#issuecomment-2540906883 Thank you, @Fokko, for the feedback. Since neither of us has a strong opinion on this and the suggestion comes from @liurenjie1024 with agreement from @fqaiser94, I believe there’s no need to wait for another round of review. This change can easily be reverted at any time if we find it to be incorrect. So, let’s move forward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Doc: Add staus page for different implementations. [iceberg]
Fokko commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1883511101 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. Review Comment: ```suggestion Apache iceberg's spec is implemented in multiple languages. This page provides an overview of the current capabilities. ``` ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. + +## Versions + +This section describes the versions of each implementation that are being tracked in this page. + +| Language | Version | +|---|-| +| Java | 1.7.1 | +| PyIceberg | 0.7.0 | +| Rust | 0.3.0 | +| Go| 0.1.0 | + +## Data Types + +| Data Type | Java | PyIceberg | Rust | Go | +||--|---|--|| +| boolean| Y| Y | Y| Y | +| int| Y| Y | Y| Y | +| long | Y| Y | Y| Y | +| float | Y| Y | Y| Y | +| double | Y| Y | Y| Y | +| decimal| Y| Y | Y| Y | +| date | Y| Y | Y| Y | +| time | Y| Y | Y| Y | +| timestamp | Y| Y | Y| Y | +| timestamptz| Y| Y | Y| Y | +| timestamp_ns | Y| Y | Y| Y | +| timestamptz_ns | Y| Y | Y| Y | +| string | Y| Y | Y| Y | +| uuid | Y| Y | Y| Y | +| fixed | Y| Y | Y| Y | +| binary | Y| Y | Y| Y | + +## Data File Formats + +| Format | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Parquet | Y| Y | Y| Y | +| ORC | Y| N | N| N | +| Puffin | Y| N | N| N | + +## File IO + +| Storage | Java | PyIceberg | Rust | Go | +|--|--|---|--|| +| Local Filesystem | Y| Y | Y| Y | +| Hadoop Filesystem| Y| Y | Y| Y | +| Aws S3 | Y| Y | Y| Y | +| Google Cloud Storage | Y| Y | Y| Y | +| Memory Fs| Y| Y | Y| Y | + +## Table Update Operations + +### Table Spec V1 + +| Operation | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Update schema | Y| N | N| N | +| Update partition spec | Y| N | N| N | +| Update table properties | Y| Y | Y| N | +| Replace sort order | Y| N | N| N | +| Update table location | Y| N | N| N | +| Append data files | Y| N | N| N | +| Rewrite files | Y| N | N| N | +| Rewrite manifests | Y| N | N| N | +| Overwrite files | Y| N | N| N | +| Row delta | Y| N | N| N | +| Delete files| Y| N | N| N | +| Update statistics | Y| N | N| N | +| Update partition statistics | Y| N | N| N | +| Expire snapshots| Y| N | N| N | +| Manage snapshots| Y| N | N| N | + +### Table Spec V2 + +| Operation | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Update schema | Y| N | N| N | +| Update partition spec | Y| N | N| N | +| Update table properties | Y| Y | Y| N | +| Replace sort order | Y| N | N| N | +| Update table location | Y| N | N| N | +| Append data files | Y| N | N| N | +| Rewrite files | Y| N | N| N | +| Rewrite manifests | Y| N | N| N | +| Overwrite files | Y| N | N| N | +| Row delta | Y| N | N| N | +| Delete files| Y| N | N| N | +| Update statistics | Y| N | N| N | +| Update partition statistics | Y| N | N| N | +| Expire snapshots| Y| N | N| N | +| Manage snapshots| Y| N | N| N | +
[I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]
Xuanwo opened a new issue, #790: URL: https://github.com/apache/iceberg-rust/issues/790 The table generated in this way: ```sql CREATE OR REPLACE ICEBERG TABLE lineitem ( l_orderkey BIGINT, l_partkeyBIGINT, l_suppkeyBIGINT, l_linenumber INT, l_quantity DECIMAL(15,2), l_extendedprice DECIMAL(15,2), l_discount DECIMAL(15,2), l_taxDECIMAL(15,2), l_returnflag STRING, l_linestatus STRING, l_shipdate DATE, l_commitdate DATE, l_receiptdateDATE, l_shipinstruct STRING, l_shipmode STRING, l_commentSTRING ) CATALOG = 'SNOWFLAKE' EXTERNAL_VOLUME = 'iceberg_external_volume' BASE_LOCATION = 'lineitem' CATALOG_SYNC = 'icebergPolaris'; INSERT INTO lineitem SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM; ``` I'm using iceberg-rust to read data from given table via polaris catalog: ```sql select * from lineitem limit 1; ``` Got the following error: ``` 1001=>iceberg data stream read: DataInvalid => Parquet schema table { 1: L_ORDERKEY: optional long 2: L_PARTKEY: optional long 3: L_SUPPKEY: optional long 4: L_LINENUMBER: optional int 5: L_QUANTITY: optional decimal(4,2) 6: L_EXTENDEDPRICE: optional decimal(8,2) 7: L_DISCOUNT: optional decimal(3,2) 8: L_TAX: optional decimal(3,2) 9: L_RETURNFLAG: optional string 10: L_LINESTATUS: optional string 11: L_SHIPDATE: optional date 12: L_COMMITDATE: optional date 13: L_RECEIPTDATE: optional date 14: L_SHIPINSTRUCT: optional string 15: L_SHIPMODE: optional string 16: L_COMMENT: optional string } and Iceberg schema table { 1: L_ORDERKEY: optional long 2: L_PARTKEY: optional long 3: L_SUPPKEY: optional long 4: L_LINENUMBER: optional int 5: L_QUANTITY: optional decimal(15,2) 6: L_EXTENDEDPRICE: optional decimal(15,2) 7: L_DISCOUNT: optional decimal(15,2) 8: L_TAX: optional decimal(15,2) 9: L_RETURNFLAG: optional string 10: L_LINESTATUS: optional string 11: L_SHIPDATE: optional date 12: L_COMMITDATE: optional date 13: L_RECEIPTDATE: optional date 14: L_SHIPINSTRUCT: optional string 15: L_SHIPMODE: optional string 16: L_COMMENT: optional string } do not match. ``` It appears snowflake writes parquet files with different schema when using iceberg tables. Should we support this scenario? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Retry object store reads on temporary errors. [iceberg-rust]
ryzhyk commented on PR #788: URL: https://github.com/apache/iceberg-rust/pull/788#issuecomment-2540837170 > Comparing to add retry layer for every service, how about adding it here? > > https://github.com/apache/iceberg-rust/blob/e073e75bf51fd4e8999e9d99a9986a4380afd0bc/crates/iceberg/src/io/storage.rs#L92 Done, thanks for the feedback! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] feat: Print debug source error instead [iceberg-rust]
Xuanwo opened a new pull request, #792: URL: https://github.com/apache/iceberg-rust/pull/792 This PR will print debug source error instead for better understanding what happened inside. Fix error like: ``` iceberg table scan plan: Unexpected => Failure in doing io operation ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Geospatial Support [iceberg]
jaakla commented on issue #10260: URL: https://github.com/apache/iceberg/issues/10260#issuecomment-2541099197 IMHO the concept of SPATIAL_REF_SYS, (and the other geo metadata tables) is a bit legacy from the times when (1) storage was really expensive and (2) databases did not have proper ways to add rich semantic metadata to the tables/columns. Today we don't have either problem - the modern table approaches (like Iceberg) do allow rich metadata attached properly directly to each relevant table, in space-effective way, no physical need of creating additional metadata tables and think of additional effort to take care of these. Maybe for some legacy clients you want virtual views, just for compatibility purposes. Here for me the question is on clear definition content and readability of that metadata. I dont currently really work with specialized geodata, I'm in "normal IT business" and here explicit (or even default/implied) definition of a `named_crs='OGC:CRS84'` would cover basically all the specific geo-metadata CRS needs. As you can see from geojson (or KML) popularity and history, only specific domain of geospatial scientists and land survey etc do care about anything else, for the others "correct" axis order and knowing how to do Havershine properly does the job. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]
Xuanwo merged PR #785: URL: https://github.com/apache/iceberg-rust/pull/785 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Avoid RANGE mode broken chain when write parallelism changes [iceberg]
pvary commented on PR #11702: URL: https://github.com/apache/iceberg/pull/11702#issuecomment-2541180202 @huyuanfeng2018: Could we have a unit test for this to avoid future regressions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] chore: Add more debug message inside error [iceberg-rust]
Xuanwo opened a new pull request, #793: URL: https://github.com/apache/iceberg-rust/pull/793 This PR will add more debug message inside error from rest catalog so we can avoid error message like: ``` 1107=> Iceberg catalog load failed: ReadTableDataError. Code: 1107, Text = Iceberg catalog load failed: Unexpected => Failed to parse response from rest catalog server! Context: json: code: 429 Too Many Requests Source: EOF while parsing a value at line 1 column 0 . ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Eagerly project arrow schema when calculating the parquet `ProjectionMask` [iceberg-rust]
Xuanwo closed issue #784: Eagerly project arrow schema when calculating the parquet `ProjectionMask` URL: https://github.com/apache/iceberg-rust/issues/784 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] add Status data structure [iceberg-cpp]
zhjwpku opened a new pull request, #8: URL: https://github.com/apache/iceberg-cpp/pull/8 This is adpated from Apache Arrow, this is a pre step before integret the Result data structure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on code in PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#discussion_r1883773249 ## .github/workflows/test.yml: ## @@ -0,0 +1,95 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +name: Test + +on: + push: +branches: + - '**' + - '!dependabot/**' +tags: + - '**' + pull_request: + +concurrency: + group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }} + cancel-in-progress: true + +permissions: + contents: read + +env: + ICEBERG_HOME: /tmp/iceberg + +jobs: + ubuntu: +name: AMD64 Ubuntu latest +runs-on: ubuntu-latest Review Comment: I've pinned them to ubuntu-24.04 and macos-14 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add basic CI to build iceberg and example [iceberg-cpp]
raulcd commented on PR #7: URL: https://github.com/apache/iceberg-cpp/pull/7#issuecomment-2541213585 Run is successful as seen here: https://github.com/raulcd/iceberg-cpp/actions/runs/12314556083 @gaborkaszab @wgtmac if you are :+1: with this it probably can be merged -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Path/file URI handling [iceberg]
jdockerty opened a new issue, #11778: URL: https://github.com/apache/iceberg/issues/11778 ### Query engine N/A ### Question Are the path URIs intended to be fully addressable, such as within the `location` or `metadata-file` fields? From the example given in the spec with a path of `s3://bucket/.../v1.json` to some table metadata JSON, this would be missing a region configuration and would be down to a client implementation to set `AWS_REGION=YOUR_REGION` or configured in some other way. --- For some extra context, I asked this question in the [Iceberg Slack](https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1734022973216809) and believe I understand the answer, but wanted to make it more discoverable for others who are searching issues. I'd also like to ask whether there is anywhere within the spec I can add this information? My original thought was within the [implementation notes](https://iceberg.apache.org/spec/#appendix-f-implementation-notes), but it is not clear to me whether this is the correct place. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Encountering `429 Too Many Requests` error every time when accessing Snowflake's Polaris catalog [iceberg-rust]
Xuanwo opened a new issue, #791: URL: https://github.com/apache/iceberg-rust/issues/791 I found iceberg-rust always raise `429 Too Many Requests` while calling Snowflake's Polaris catalog: ```rust 1107=> Iceberg catalog load failed: ReadTableDataError. Code: 1107, Text = Iceberg catalog load failed: Unexpected => Failed to parse response from rest catalog server! Context: json: code: 429 Too Many Requests Source: EOF while parsing a value at line 1 column 0 . ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Implement Decimal from/to bytes represents [iceberg-rust]
liurenjie1024 merged PR #665: URL: https://github.com/apache/iceberg-rust/pull/665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]
gruuya commented on code in PR #785: URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883665954 ## crates/iceberg/src/arrow/reader.rs: ## @@ -1180,4 +1194,62 @@ mod tests { assert_eq!(visitor.field_ids, expected); } + +#[test] +fn test_arrow_projection_mask() { Review Comment: Certainly, let me think of one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]
gruuya commented on code in PR #785: URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883721441 ## crates/iceberg/src/arrow/reader.rs: ## @@ -1180,4 +1194,62 @@ mod tests { assert_eq!(visitor.field_ids, expected); } + +#[test] +fn test_arrow_projection_mask() { Review Comment: I've slightly revised the existing test now. It tests a parquet/arrow schema with one supported type, one unsupported type and one with a decimal with a precision beyond the supported range. It asserts that projection masks fails until the projection leaves out the last two fields. Granted it's a bit of a stretch to think how this could come up in practice, but it does exercise the proposed changes in this PR nonetheless. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Change the glue catalog doc ( `list_tables` method only return Iceberg Tables ) [iceberg-python]
omkenge closed issue #1291: Change the glue catalog doc ( `list_tables` method only return Iceberg Tables ) URL: https://github.com/apache/iceberg-python/issues/1291 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: eagerly project the arrow schema to scope out non-selected fields [iceberg-rust]
gruuya commented on code in PR #785: URL: https://github.com/apache/iceberg-rust/pull/785#discussion_r1883721441 ## crates/iceberg/src/arrow/reader.rs: ## @@ -1180,4 +1194,62 @@ mod tests { assert_eq!(visitor.field_ids, expected); } + +#[test] +fn test_arrow_projection_mask() { Review Comment: I've slightly revised the existing test now. It tests a parquet/arrow schema with one supported type, one unsupported type and one with a decimal with a precision beyond the supported range. It asserts that projection masks fails until the projection leaves out the last two fields. Granted it's a bit of a stretch to think how this could come up in practice, but it does exercise the proposed changes in this PR nonetheless. Let me know if this is close to what you had in mind. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] add Status data structure [iceberg-cpp]
pitrou commented on code in PR #8: URL: https://github.com/apache/iceberg-cpp/pull/8#discussion_r1883865597 ## CMakeLists.txt: ## @@ -56,6 +64,11 @@ add_subdirectory(api) add_subdirectory(src) if(ICEBERG_BUILD_TESTS) + fetchcontent_declare(googletest Review Comment: This adds a GoogleTest dependency. Was there a decision to use GT? A popular alternative is https://github.com/catchorg/Catch2/, I'm not sure which one would be better. ## api/iceberg/util/string_builder.h: ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#pragma once + +#include +#include + +#include "visibility.h" + +namespace iceberg { + +namespace util { + +namespace detail { + +class ICEBERG_EXPORT StringStreamWrapper { Review Comment: Do we really to keep this from Arrow (which was C++11 compatible back then), or do we want to adopt a modern formatting library such as https://en.cppreference.com/w/cpp/utility/format or a backport thereof? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]
Fokko commented on issue #790: URL: https://github.com/apache/iceberg-rust/issues/790#issuecomment-2540844729 Yes, we should support this. I think https://github.com/apache/iceberg-rust/pull/786 fixes this. Looking at the schema, only the `decimal` ones are different. And we support increased precision, as long as the scale stays the same:  Keep in mind that when people migrate from an existing Hive table to Iceberg (which is a pretty common scenario), we'll bump into this kind of thing more often, and I think that's okay to accommodate that, as long that it is perfectly safe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]
Xuanwo commented on issue #790: URL: https://github.com/apache/iceberg-rust/issues/790#issuecomment-2540849063 Thank you @Fokko for this. I will review the PR accordingly! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]
gaborkaszab commented on PR #11738: URL: https://github.com/apache/iceberg/pull/11738#issuecomment-2540849458 I see you have made changes previously in CachingCatalog, @rdblue @aokolnychyi @RussellSpitzer . Would you mind taking a look at this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Allow for schema evolution [iceberg-rust]
Xuanwo commented on PR #786: URL: https://github.com/apache/iceberg-rust/pull/786#issuecomment-2540855953 Hi @Fokko, I'm following up on issue #790. It appears this PR doesn't address the scenario I described in #790, where the underlying parquet schema differs from the declared one. Could schema evolution potentially resolve this? I'm concerned we might also need to adjust the Arrow schema mappings instead of just project mask. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883552348 ## core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import com.fasterxml.jackson.databind.ObjectMapper; +import java.net.URI; +import java.util.Map; +import javax.annotation.Nullable; +import org.immutables.value.Value; + +/** Represents an HTTP request. */ +@Value.Style(redactedMask = "", depluralize = true) +@Value.Immutable +@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"}) +public interface HTTPRequest { + + enum HTTPMethod { +GET, +HEAD, +POST, +DELETE + } + + /** + * Returns the base URI configured at the REST client level. The base URI is used to construct the + * full {@link #requestUri()}. + */ + URI baseUri(); + + /** + * Returns the full URI of this request. The URI is constructed from the base URI, path, and query + * parameters. It cannot be modified directly. + */ + @Value.Lazy + default URI requestUri() { +return RESTUtil.buildRequestUri(this); Review Comment: to me it seems like we should rather embed the functionality of building an URI from a `HTTPRequest` here (and also do the same for encoding the body). Additionally, building an URI from the path does some validation to make sure it doesn't start with a `/`. I'd say this is something that we can move into a check method annotated with `@Value.Check`. That way we don't need the util methods in `RESTUtil` and can move testing into `TestHTTPRequest`. I've added an example diff below: ``` --- a/core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java +++ b/core/src/main/java/org/apache/iceberg/rest/HTTPRequest.java @@ -18,10 +18,14 @@ */ package org.apache.iceberg.rest; +import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.ObjectMapper; import java.net.URI; +import java.net.URISyntaxException; import java.util.Map; import javax.annotation.Nullable; +import org.apache.hc.core5.net.URIBuilder; +import org.apache.iceberg.exceptions.RESTException; import org.immutables.value.Value; /** Represents an HTTP request. */ @@ -49,7 +53,19 @@ public interface HTTPRequest { */ @Value.Lazy default URI requestUri() { -return RESTUtil.buildRequestUri(this); +// if full path is provided, use the input path as path +String fullPath = +(path().startsWith("https://";) || path().startsWith("http://";)) +? path() +: String.format("%s/%s", baseUri(), path()); +try { + URIBuilder builder = new URIBuilder(RESTUtil.stripTrailingSlash(fullPath)); + queryParameters().forEach(builder::addParameter); + return builder.build(); +} catch (URISyntaxException e) { + throw new RESTException( + "Failed to create request URI from base %s, params %s", fullPath, queryParameters()); +} } /** Returns the HTTP method of this request. */ @@ -77,7 +93,17 @@ public interface HTTPRequest { @Nullable @Value.Redacted default String encodedBody() { -return RESTUtil.encodeRequestBody(this); +Object body = body(); +if (body instanceof Map) { + return RESTUtil.encodeFormData((Map) body); +} else if (body != null) { + try { +return mapper().writeValueAsString(body); + } catch (JsonProcessingException e) { +throw new RESTException(e, "Failed to encode request body: %s", body); + } +} +return null; } /** @@ -88,4 +114,13 @@ public interface HTTPRequest { default ObjectMapper mapper() { return RESTObjectMapper.mapper(); } + + @Value.Check + default void check() { +if (path().startsWith("/")) { + throw new RESTException( + "Received a malformed path for a REST request: %s. Paths should not start with /", + path()); +} + } }
Re: [PR] feat: Allow for schema evolution [iceberg-rust]
Fokko commented on PR #786: URL: https://github.com/apache/iceberg-rust/pull/786#issuecomment-2540869512 This is pretty similar. With this PR we first write an int, and then read it as a long, which is a valid schema evolution. In the case of #790 we try to read a narrower decimal from the Parquet file, into a wider one from the Iceberg schema. I haven't tested it with the decimal, that might be a good followup 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Allow for schema evolution [iceberg-rust]
Xuanwo commented on PR #786: URL: https://github.com/apache/iceberg-rust/pull/786#issuecomment-2540873200 Thank you for the explanation! This PR is good enough for me. Let's proceed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Allow for schema evolution [iceberg-rust]
Xuanwo merged PR #786: URL: https://github.com/apache/iceberg-rust/pull/786 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Implement Decimal from/to bytes represents [iceberg-rust]
Xuanwo commented on PR #665: URL: https://github.com/apache/iceberg-rust/pull/665#issuecomment-2540876657 cc @Fokko, would you like to take another? I believe it's good for merging now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Retry object store reads on temporary errors. [iceberg-rust]
Xuanwo merged PR #788: URL: https://github.com/apache/iceberg-rust/pull/788 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Data loss bug in MergeIntoCommand [iceberg]
RussellSpitzer commented on issue #11765: URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2541337509 You'll need to elaborate a bit more. What does "losing some of the increment new data writes" mean? Can you give an example? We can't really debug the generic case since we don't even know what you are actually attempting. Other important info would be things like what serialization level you are using, what version of Spark, the queries you are using, how you are running them etc ... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Kafka connect iceberg connector seems to stop after having processed all messages in Kafka topic [iceberg]
thjaeckle commented on issue #11703: URL: https://github.com/apache/iceberg/issues/11703#issuecomment-2541339588 I found out the configuration mistake I did.. Turns out that `iceberg.connect.group-id` **must** match the configured Kafka Connect `consumer.group.id`. By default, this is `connect-` (both in Kafka Connect and in the Iceberg connector). With 0 prior Kafka Connect knowledge this is however very hard to find out - a hint in the documentation that this must match the `consumer.group.id` would be helpful. When configuring this, one probably has to adhere to ACLs in place - and then it is likely to to wrong .. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
nastra commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1883856330 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPRequest.java: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +import java.net.URI; +import java.util.stream.Stream; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.exceptions.RESTException; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.rest.requests.CreateNamespaceRequest; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +class TestHTTPRequest { + + @ParameterizedTest + @MethodSource("validRequestUris") + public void requestUriSuccess(HTTPRequest request, URI expected) { +assertThat(request.requestUri()).isEqualTo(expected); + } + + public static Stream validRequestUris() { +return Stream.of( +Arguments.of( +ImmutableHTTPRequest.builder() +.baseUri(URI.create("http://localhost:8080/foo";)) +.method(HTTPRequest.HTTPMethod.GET) +.path("v1/namespaces/ns/tables/") // trailing slash should be removed +.putQueryParameter("pageToken", "1234") +.putQueryParameter("pageSize", "10") +.build(), +URI.create( + "http://localhost:8080/foo/v1/namespaces/ns/tables?pageToken=1234&pageSize=10";)), +Arguments.of( +ImmutableHTTPRequest.builder() +.baseUri(URI.create("http://localhost:8080/foo";)) +.method(HTTPRequest.HTTPMethod.GET) +.path("https://authserver.com/token";) // absolute path HTTPS +.build(), +URI.create("https://authserver.com/token";)), +Arguments.of( +ImmutableHTTPRequest.builder() +.baseUri(URI.create("http://localhost:8080/foo";)) +.method(HTTPRequest.HTTPMethod.GET) +.path("http://authserver.com/token";) // absolute path HTTP +.build(), +URI.create("http://authserver.com/token";))); + } + + @Test + public void malformedPath() { +assertThatThrownBy( +() -> +ImmutableHTTPRequest.builder() +.baseUri(URI.create("http://localhost";)) +.method(HTTPRequest.HTTPMethod.GET) +.path("/v1/namespaces") // wrong leading slash +.build()) +.isInstanceOf(RESTException.class) +.hasMessage( +"Received a malformed path for a REST request: /v1/namespaces. Paths should not start with /"); + } + + @Test + public void invalidPath() { +HTTPRequest request = +ImmutableHTTPRequest.builder() +.baseUri(URI.create("http://localhost";)) +.method(HTTPRequest.HTTPMethod.GET) +.path(" not a valid path") // wrong path +.build(); +assertThatThrownBy(request::requestUri) +.isInstanceOf(RESTException.class) +.hasMessage( +"Failed to create request URI from base http://localhost/ not a valid path, params {}"); + } + + @ParameterizedTest + @MethodSource("validRequestBodies") + public void encodedBody(HTTPRequest request, String expected) { +assertThat(request.encodedBody()).isEqualTo(expected); Review Comment: personally I think the test code would be much more readable without having a parameterized test and rather have just two separate test methods that verify the body is properly encoded as form data/JSON, but I'm ok with what's currently here and I'll leave that up to you. Also we probably might want to have some tests where the body can't be encoded due to: * body is null * body is a Map with null values * body is an invalid JSON -- This is an automated message from the
Re: [I] Data loss bug in MergeIntoCommand [iceberg]
RussellSpitzer commented on issue #11765: URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2541339227 It would also be very helpful to know how you are determining there is data loss -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Kafka connect iceberg connector seems to stop after having processed all messages in Kafka topic [iceberg]
thjaeckle closed issue #11703: Kafka connect iceberg connector seems to stop after having processed all messages in Kafka topic URL: https://github.com/apache/iceberg/issues/11703 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] add Status data structure [iceberg-cpp]
pitrou commented on PR #8: URL: https://github.com/apache/iceberg-cpp/pull/8#issuecomment-2541348036 Also @zhjwpku can you make sure you proof-read your PR description? There are spell checkers that can help. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]
advancedxy commented on PR #10755: URL: https://github.com/apache/iceberg/pull/10755#issuecomment-2541354779 @danielcweeks would you mind to take a look at this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix ParallelIterable deadlock [iceberg]
osscm commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884308287 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Override public Optional> get() { try { +if (queue.size() >= approximateMaxQueueSize) { Review Comment: https://github.com/apache/iceberg/pull/11781/files#diff-0586c055bbd9d220727cfa60c802321693a959c0cfb3a510376fc5a5bb1dc897R99 `this.taskFutures = new CompletableFuture[2 * ThreadPools.WORKER_THREAD_POOL_SIZE];` here the global thread pool size is driving the taskFuture array size, can this leads to more tasks then the available S3 connections? as I believe each table scan will create a new `ParallelIterable` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] HiveCatalog incorrectly uses FileIOTracker [iceberg]
tom-s-powell opened a new issue, #11783: URL: https://github.com/apache/iceberg/issues/11783 ### Apache Iceberg version 1.7.1 (latest release) ### Query engine Spark ### Please describe the bug 🐞 Encountering an issue with `HiveCatalog` and `S3FileIO`. I believe this is due to the change introduced in https://github.com/apache/iceberg/pull/10771. There is a single `FileIO` instance for the `HiveCatalog` thus shared between all `HiveTableOperations`. By "tracking" each `TableOperations` and using `weakKeys` cache removal to close the `FileIO` this can lead to subsequent `HiveTableOperation` attempting to use a closed `FileIO`. When using `S3FileIO` this would lead to this connection pool error below: ``` ERROR [2024-12-13T13:22:41.936423Z] com.palantir.conjure.java.undertow.runtime.ConjureExceptions: Error handling request (_endpoint: resolveVersion, _internalErrorIdentifier: 08f54309, _internalErrorType: internal, _requestId: 957c3c716a48658e, _service: TableService, errorInstanceId: d7495079-28bd-48fe-af10-f22e94135457, errorName: Default:Internal) java.lang.IllegalStateException: Connection pool shut down at org.apache.http.util.Asserts.check(Asserts.java:34) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269) at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75) at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72) at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254) at software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104) at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231) at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228) at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:67) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56) at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:52) at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:37) at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) at software.amazon.awssdk.core.internal.http.pipeline.stages
Re: [PR] IO Implementation using Go CDK [iceberg-go]
zeroshade commented on PR #176: URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542399559 @loicalleyne following pyiceberg's example, I've added an option to force virtual addressing. That work for you? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Feat: support aliyun oss backend. [iceberg-go]
zeroshade commented on PR #216: URL: https://github.com/apache/iceberg-go/pull/216#issuecomment-2542403956 This seems generally good to me. Does Aliyun have something similar to how MinIO works for S3 that can be added to the integration tests to have CI testing the backend? i.e. is there any way to use a docker image to run something that simulates the Aliyun backend for testing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark: Read DVs when reading from .position_deletes table [iceberg]
aokolnychyi commented on code in PR #11657: URL: https://github.com/apache/iceberg/pull/11657#discussion_r1884381201 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterator.java: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.spark.source; + +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.BaseDeleteLoader; +import org.apache.iceberg.io.CloseableIterator; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Types; +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow; +import org.apache.spark.unsafe.types.UTF8String; + +class DVIterator implements CloseableIterator { + private final PartitionSpec spec; Review Comment: Do we still need `spec`? ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/DVIterator.java: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.spark.source; + +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import org.apache.iceberg.DeleteFile; +import org.apache.iceberg.MetadataColumns; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.data.BaseDeleteLoader; +import org.apache.iceberg.io.CloseableIterator; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Types; +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow; +import org.apache.spark.unsafe.types.UTF8String; + +class DVIterator implements CloseableIterator { + private final PartitionSpec spec; + private final DeleteFile deleteFile; + private final Schema projection; + private final Map idToConstant; + private final Iterator positions; + private Integer deletedPositionIndex; + private GenericInternalRow row; + + DVIterator( + InputFile inputFile, + DeleteFile deleteFile, + PartitionSpec spec, + Schema projection, + Map idToConstant) { +this.deleteFile = deleteFile; +this.spec = spec; +this.projection = projection; +this.idToConstant = idToConstant; +List pos = Lists.newArrayList(); +new BaseDeleteLoader(ignored -> inputFile) +.loadPositionDeletes(ImmutableList.of(deleteFile), deleteFile.referencedDataFile()) +.forEach(pos::add); +this.positions = pos.iterator(); + } + + @Override + public boolean hasNext() { +return positions.hasNext(); + } + + @Override + public InternalRow next() { +long position = positions.next(); + +if (null == row) { + List rowValues = Lists.newArrayList(); + for (Types.NestedField column : projection.columns()) { +int fieldId = column.fieldId(); +if (fieldId == MetadataColumns.DELETE_FILE_PATH.fieldId()) { + rowValues.add(UTF
Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]
aokolnychyi commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1884367833 ## core/src/main/java/org/apache/iceberg/TableUtil.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg; + +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; + +public class TableUtil { + private TableUtil() {} + + /** Returns the format version of the given table */ + public static int formatVersion(Table table) { +Preconditions.checkArgument(null != table, "Invalid table: null"); + +if (table instanceof SerializableTable) { + SerializableTable serializableTable = (SerializableTable) table; + return serializableTable.formatVersion(); +} else if (table instanceof HasTableOperations) { + return ((HasTableOperations) table).operations().current().formatVersion(); Review Comment: Optional: Same comment about an extra variable for `TableOperations`. ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -158,6 +160,21 @@ public Map properties() { return properties; } + public int formatVersion() { +return formatVersion; + } + + private int formatVersion(Table table) { +if (table instanceof HasTableOperations) { + return ((HasTableOperations) table).operations().current().formatVersion(); +} else { + // formatVersion=-1 will never be used/returned, because Review Comment: Optional: We may actually avoid the override by doing something like this below. ``` public int formatVersion() { if (formatVersion == UNKNOWN_FORMAT_VERSION) { throw new UnsupportedOperationException("Format version is unknown"); } return formatVersion; } private int formatVersion(Table table) { if (table instanceof HasTableOperations) { ... } else { return UNKNOWN_FORMAT_VERSION; } } ``` This would also cover cases for regular tables that didn't have `TableOperations` under the hood. ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -158,6 +160,21 @@ public Map properties() { return properties; } + public int formatVersion() { +return formatVersion; + } + + private int formatVersion(Table table) { +if (table instanceof HasTableOperations) { + return ((HasTableOperations) table).operations().current().formatVersion(); Review Comment: Optional: What about an extra variable for `TableOperations`? We have an explicit casts + 3 method invocations, it may be easier to read with code on multiple lines and it would be consistent with the `metadataFileLocation` method. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add TableUtil to provide access to a table's format version [iceberg]
aokolnychyi commented on code in PR #11620: URL: https://github.com/apache/iceberg/pull/11620#discussion_r1884369789 ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -158,6 +160,21 @@ public Map properties() { return properties; } + public int formatVersion() { +return formatVersion; + } + + private int formatVersion(Table table) { +if (table instanceof HasTableOperations) { + return ((HasTableOperations) table).operations().current().formatVersion(); Review Comment: Optional: What about an extra variable for `TableOperations`? We have an explicit casts and 3 method invocations on a single line. An extra variable may be easier to read and would be consistent with `metadataFileLocation`. ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -158,6 +160,21 @@ public Map properties() { return properties; } + public int formatVersion() { +return formatVersion; + } + + private int formatVersion(Table table) { +if (table instanceof HasTableOperations) { + return ((HasTableOperations) table).operations().current().formatVersion(); +} else { + // formatVersion=-1 will never be used/returned, because Review Comment: Optional: We may actually avoid the override by doing something like below. ``` public int formatVersion() { if (formatVersion == UNKNOWN_FORMAT_VERSION) { throw new UnsupportedOperationException("Format version is unknown"); } return formatVersion; } private int formatVersion(Table table) { if (table instanceof HasTableOperations) { ... } else { return UNKNOWN_FORMAT_VERSION; } } ``` This would also cover cases for regular tables that didn't have `TableOperations` under the hood. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]
c-thiel commented on code in PR #799: URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1884298259 ## crates/iceberg/src/catalog/mod.rs: ## @@ -446,6 +446,30 @@ pub enum TableUpdate { /// Properties to remove removals: Vec, }, +/// Set statistics for a snapshot +#[serde(with = "_serde_set_statistics")] +SetStatistics { +/// File containing the statistics +statistics: StatisticsFile, +}, Review Comment: In IRC and Java `SetStatistics` has an additional field `snapshot_id` ([link](https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/open-api/rest-catalog-open-api.yaml#L2902)). This field is redundant with `StatisticsFile.snapshot_id` and is only used as an [assertion in Java](https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1314). I removed the redundancy for rust and will start a discussion on the mailing List how to handle this. As we still need to be spec compliant, we need custom serializer / deserializer. Slack: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1734109745807119 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Hive: Add Hive 4 support and remove Hive runtime [iceberg]
rdblue commented on code in PR #11750: URL: https://github.com/apache/iceberg/pull/11750#discussion_r1884343692 ## gradle.properties: ## @@ -18,8 +18,8 @@ jmhJsonOutputPath=build/reports/jmh/results.json jmhIncludeRegex=.* systemProp.defaultFlinkVersions=1.20 systemProp.knownFlinkVersions=1.18,1.19,1.20 -systemProp.defaultHiveVersions=2 -systemProp.knownHiveVersions=2,3 +systemProp.defaultHiveVersions=4 +systemProp.knownHiveVersions=4 Review Comment: This is no longer needed, right? This only applied to the runtime modules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] feat: TableMetadata Statistics [iceberg-rust]
c-thiel opened a new pull request, #799: URL: https://github.com/apache/iceberg-rust/pull/799 Adds `StatisticFile` and `PartitionStatisticsFile` to spec, builder and REST TableUpdate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] IO Implementation using Go CDK [iceberg-go]
loicalleyne commented on PR #176: URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542287393 Is it intended to not provide the choice between virtual hosted bucket addressing and path-style addressing? LGTM otherwise - the tests are passing :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix ParallelIterable deadlock [iceberg]
sopel39 commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884520727 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Override public Optional> get() { try { +if (queue.size() >= approximateMaxQueueSize) { Review Comment: > here the global thread pool size is driving the taskFuture array size, can this leads to more tasks then the available S3 connections? Correct > as I believe each table scan will create a new ParallelIterable ? Yes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
corleyma commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884523394 ## pyiceberg/table/__init__.py: ## @@ -1229,7 +1240,8 @@ def with_case_sensitive(self: S, case_sensitive: bool = True) -> S: class ScanTask(ABC): -pass +def size_in_bytes(self) -> int: Review Comment: would it be better to mark this method as abstract? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
corleyma commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884524371 ## pyiceberg/table/__init__.py: ## @@ -1253,6 +1265,22 @@ def __init__( self.start = start or 0 self.length = length or data_file.file_size_in_bytes +def size_in_bytes(self) -> int: +return self.length + sum(f.file_size_in_bytes for f in self.delete_files) + + +@dataclass(init=False) +class CombinedFileScanTask(ScanTask): +"""Task representing combined multiple file scan tasks.""" Review Comment: I think this could use a better docstring. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix ParallelIterable deadlock [iceberg]
RussellSpitzer commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884527019 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Override public Optional> get() { try { +if (queue.size() >= approximateMaxQueueSize) { + // Yield when queue is over the size limit. Task will be resubmitted later and continue + // the work. + return Optional.of(this); +} + if (iterator == null) { iterator = input.iterator(); } while (iterator.hasNext()) { - if (queue.size() >= approximateMaxQueueSize) { Review Comment: Isn't this a pretty significant change in behavior or ParallelIterable? I assume we have some performance implications of not fully loading the queue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Data loss bug in MergeIntoCommand [iceberg]
RussellSpitzer commented on issue #11765: URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2542309392 > However, sir, I might have discovered some issues. When executing the COW-MERGE-INTO command, Spark needs to use the ods_table twice. The first time is to match data files based on incremental records, and the second time is to perform the actual data merge. If the data in the ods_table changes between the first and second usage, I would like to know if this could lead to abnormal execution results? What would happen if the data in the ods_table suddenly increases? What about if the data in the ods_table suddenly decreases? > > Example: ods_table is a partitioned table, during the execution of the merge-into statement, someone adds or deletes partitions. Yes this is true, the relation which is created of source data must remain constant through the two different passes of the source data. The Target (Iceberg Table) can change. If the query would return different results I believe you could see odd behavior. We have some hooks to prevent this when the source is Iceberg I think but I don't believe we have any for non Iceberg sources. I may be forgetting something else but @aokolnychyi probably knows. I believe the workaround here is for a non-idempotent Subquery you should cache or persist it prior to merging. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] add .gitignore [iceberg-cpp]
pitrou commented on code in PR #9: URL: https://github.com/apache/iceberg-cpp/pull/9#discussion_r1884280319 ## .gitignore: ## @@ -0,0 +1,18 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +build/ Review Comment: Ok, fair enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat(puffin): Parse Puffin FileMetadata [iceberg-rust]
c-thiel commented on PR #765: URL: https://github.com/apache/iceberg-rust/pull/765#issuecomment-2541965685 @fqaiser94, just added the higher level statistic files in https://github.com/apache/iceberg-rust/pull/799 FYI. I would guess you would end up building those soon too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
rdblue commented on PR #11415: URL: https://github.com/apache/iceberg/pull/11415#issuecomment-2542017239 The Spark failures are a port conflict. I think it's unrelated to these changes. We'll see the next time CI runs (I'm sure we'll have more changes to trigger them) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
corleyma commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884533676 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] +def _target_split_size(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value) # type: ignore + +def _loop_back(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value) # type: ignore + +def _split_open_file_cost(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, +TableProperties.READ_SPLIT_OPEN_FILE_COST, +TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT, +) +return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value) # type: ignore + +def plan_task(self) -> Iterable[CombinedFileScanTask]: Review Comment: I assume it's intentional that we're not actually calling this in any of the methods (like to_arrow, etc) that actually execute a scan? If the plan is to call this in `to_arrow` eventually, it would be good to have some benchmarks with realistic latencies (e.g., against actual object storage). If there is no plan to call this directly in pyiceberg, I wonder who the intended consumers of this API would be? I would expect most query engines -- distributed or otherwise -- to have their own notions for how to optimize scan planning. ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] +def _target_split_size(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value) # type: ignore + +def _loop_back(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value) # type: ignore + +def _split_open_file_cost(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, +TableProperties.READ_SPLIT_OPEN_FILE_COST, +TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT, +) +return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value) # type: ignore + +def plan_task(self) -> Iterable[CombinedFileScanTask]: Review Comment: I assume it's intentional that we're not actually calling this in any of the methods (like `to_arrow`, etc) that actually execute a scan? If the plan is to call this in `to_arrow` eventually, it would be good to have some benchmarks with realistic latencies (e.g., against actual object storage). If there is no plan to call this directly in pyiceberg, I wonder who the intended consumers of this API would be? I would expect most query engines -- distributed or otherwise -- to have their own notions for how to optimize scan planning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
corleyma commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884536195 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] +def _target_split_size(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value) # type: ignore + +def _loop_back(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value) # type: ignore + +def _split_open_file_cost(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, +TableProperties.READ_SPLIT_OPEN_FILE_COST, +TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT, +) +return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value) # type: ignore + +def plan_task(self) -> Iterable[CombinedFileScanTask]: Review Comment: Also, assuming there's still some future plan to offload expensive stuff like data scanning to a rust core, I do wonder if we want to commit to exposing additional scan planning APIs that may not align with whatever query engine gets embedded in pyiceberg? In the case that this is primarily intended to be for the benefit of pyiceberg's own scan execution I would consider making this a private API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix ParallelIterable deadlock [iceberg]
sopel39 commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1884535343 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Override public Optional> get() { try { +if (queue.size() >= approximateMaxQueueSize) { + // Yield when queue is over the size limit. Task will be resubmitted later and continue + // the work. + return Optional.of(this); +} + if (iterator == null) { iterator = input.iterator(); } while (iterator.hasNext()) { - if (queue.size() >= approximateMaxQueueSize) { Review Comment: The queue should still be bounded albeit not strictly up to `approximateMaxQueueSize`. > I assume we have some performance implications of not fully loading the queue? What do you mean by not fully loading the queue? This PR guarantees that `Task` that is started will complete and not hold external resources. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
corleyma commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1884536195 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] +def _target_split_size(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value) # type: ignore + +def _loop_back(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value) # type: ignore + +def _split_open_file_cost(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, +TableProperties.READ_SPLIT_OPEN_FILE_COST, +TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT, +) +return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value) # type: ignore + +def plan_task(self) -> Iterable[CombinedFileScanTask]: Review Comment: Also, assuming there's still some future plan to offload expensive stuff like data scanning to a rust core, I do wonder if we want to commit to exposing additional scan planning APIs that may not align with whatever query engine gets embedded in pyiceberg? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Hivemetastore unable to create hive lock after upgrading from hivemetastore 3.1.3 to 4.0.0 during iceberg operations [iceberg]
mAlf1999 opened a new issue, #11784: URL: https://github.com/apache/iceberg/issues/11784 ### Apache Iceberg version 1.6.0 ### Query engine Spark ### Please describe the bug 🐞 We are currently using iceberg version 1.6.0 and have been successfully using it along with hivemetastore version 3.1.3 with spark as the query engine. Recently, we went about upgrading hivemetastore to version 4.0.0, and since then we have been running into an issue where the hivemetastore is unable to create hivelock. The logs from the spark job are as follows: WARN MetastoreLock: Failed to create lock LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:test_hive, tablename:my_table, operationType:UNSET)], user:spark, hostname:my-driver, agentInfo:Iceberg-8de7e5ba-3be2-47e3-ae28-8d22fhghbfa6) org.apache.thrift.TApplicationException: Internal error processing lock at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:4678) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:4665) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2153) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169) at jdk.proxy2/jdk.proxy2.$Proxy54.lock(Unknown Source) at org.apache.iceberg.hive.MetastoreLock.lambda$createLock$3(MetastoreLock.java:305) at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:72) at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:65) at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122) at org.apache.iceberg.hive.MetastoreLock.lambda$createLock$4(MetastoreLock.java:305) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) at org.apache.iceberg.hive.MetastoreLock.createLock(MetastoreLock.java:302) at org.apache.iceberg.hive.MetastoreLock.acquireLock(MetastoreLock.java:185) at org.apache.iceberg.hive.MetastoreLock.lock(MetastoreLock.java:146) at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:184) at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:128) at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:412) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413) at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:384) at org.apache.iceberg.BaseReplacePartitions.commit(BaseReplacePartitions.java:26) at org.apache.iceberg.spark.source.SparkWrite.commitOperation(SparkWrite.java:233) at org.apache.iceberg.spark.source.SparkWrite.access$1300(SparkWrite.java:84) at org.apache.iceberg.spark.source.SparkWrite$DynamicOverwrite.commit(SparkWrite.java:337) at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:399) at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:359) at org.apache.spark.sql.execution.datasources.v2.OverwritePartitionsDynamicExec.writeWithV2(WriteToDataSourceV2Exec.scala:260) at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:337) at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:336) at org.apache.spark.sql.execution.datasources.v2.OverwritePartitionsDynamicExec.run(WriteToDataSourceV2Exec.scala:260) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) at org.apache.spark.sql.execution.QueryExecution$$anonfu
Re: [PR] IO Implementation using Go CDK [iceberg-go]
zeroshade commented on PR #176: URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542213486 @loicalleyne can you take a look at the latest changes I made here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Doc: Add staus page for different implementations. [iceberg]
RussellSpitzer commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884491976 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. Review Comment: ```suggestion The Apache Iceberg spec has multiple language and engine implementations. This page provides an overview of their current capabilities. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Doc: Add staus page for different implementations. [iceberg]
RussellSpitzer commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884493485 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. + +## Versions Review Comment: Do we also want to note when the following table was last updated, or should we just have people check the commit time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Doc: Add staus page for different implementations. [iceberg]
RussellSpitzer commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884494396 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. + +## Versions + +This section describes the versions of each implementation that are being tracked in this page. + +| Language | Version | +|---|-| +| Java | 1.7.1 | +| PyIceberg | 0.7.0 | +| Rust | 0.3.0 | +| Go| 0.1.0 | + +## Data Types + +| Data Type | Java | PyIceberg | Rust | Go | +||--|---|--|| +| boolean| Y| Y | Y| Y | +| int| Y| Y | Y| Y | +| long | Y| Y | Y| Y | +| float | Y| Y | Y| Y | +| double | Y| Y | Y| Y | +| decimal| Y| Y | Y| Y | +| date | Y| Y | Y| Y | +| time | Y| Y | Y| Y | +| timestamp | Y| Y | Y| Y | +| timestamptz| Y| Y | Y| Y | +| timestamp_ns | Y| Y | Y| Y | +| timestamptz_ns | Y| Y | Y| Y | +| string | Y| Y | Y| Y | +| uuid | Y| Y | Y| Y | +| fixed | Y| Y | Y| Y | +| binary | Y| Y | Y| Y | + +## Data File Formats + +| Format | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Parquet | Y| Y | Y| Y | +| ORC | Y| N | N| N | +| Puffin | Y| N | N| N | Review Comment: Yeah i'd probably break this out into the actual Puffin capabilities instead "Table Stats, Delete Vectors, ect" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Doc: Add staus page for different implementations. [iceberg]
RussellSpitzer commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884494856 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. + +## Versions + +This section describes the versions of each implementation that are being tracked in this page. + +| Language | Version | +|---|-| +| Java | 1.7.1 | +| PyIceberg | 0.7.0 | +| Rust | 0.3.0 | +| Go| 0.1.0 | + +## Data Types + +| Data Type | Java | PyIceberg | Rust | Go | +||--|---|--|| +| boolean| Y| Y | Y| Y | +| int| Y| Y | Y| Y | +| long | Y| Y | Y| Y | +| float | Y| Y | Y| Y | +| double | Y| Y | Y| Y | +| decimal| Y| Y | Y| Y | +| date | Y| Y | Y| Y | +| time | Y| Y | Y| Y | +| timestamp | Y| Y | Y| Y | +| timestamptz| Y| Y | Y| Y | +| timestamp_ns | Y| Y | Y| Y | +| timestamptz_ns | Y| Y | Y| Y | +| string | Y| Y | Y| Y | +| uuid | Y| Y | Y| Y | +| fixed | Y| Y | Y| Y | +| binary | Y| Y | Y| Y | + +## Data File Formats + +| Format | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Parquet | Y| Y | Y| Y | +| ORC | Y| N | N| N | +| Puffin | Y| N | N| N | + +## File IO + +| Storage | Java | PyIceberg | Rust | Go | +|--|--|---|--|| +| Local Filesystem | Y| Y | Y| Y | +| Hadoop Filesystem| Y| Y | Y| Y | +| Aws S3 | Y| Y | Y| Y | Review Comment: S3 Compatible GCS Compatible Memory FS? < Do we want to include this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] ci(infra): Remove sha256 [iceberg-go]
zeroshade merged PR #226: URL: https://github.com/apache/iceberg-go/pull/226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] IO Implementation using Go CDK [iceberg-go]
loicalleyne commented on PR #176: URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542428675 LGTM 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] IO Implementation using Go CDK [iceberg-go]
zeroshade commented on PR #176: URL: https://github.com/apache/iceberg-go/pull/176#issuecomment-2542436725 @dwilson1988 When you get a chance, can you take a look at the changes I made here. I liked your thought on isolating things, but there was still a bunch of specific options for particular bucket types that needed to get accounted for as the options are not always passed via URL due to how Iceberg config properties work. So I'd like your thoughts or comments on what I ultimately came up with to simplify what @loicalleyne already had while solving the failing tests and whether it fits what you were thinking and using internally. Once this is merged, I'd definitely greatly appreciate contributions for Azure as you said :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]
c-thiel commented on code in PR #799: URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1884798642 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -158,11 +160,15 @@ pub struct TableMetadata { /// writers, but is not used when reading because reads use the specs /// stored in manifest files. pub(crate) default_sort_order_id: i64, -///A map of snapshot references. The map keys are the unique snapshot reference +/// A map of snapshot references. The map keys are the unique snapshot reference /// names in the table, and the map values are snapshot reference objects. /// There is always a main branch reference pointing to the current-snapshot-id /// even if the refs map is null. pub(crate) refs: HashMap, +/// Mapping of snapshot ids to statistics files. +pub(crate) statistics: HashMap, Review Comment: Can we ever have more than one `StatisticsFile` for a `snapshot_id`? In java this is modeled as a mapping `snapshot_id` : `Vec`, however I couldn't find a way to get more than one element into the `Vec` for a `snapshot_id` other than deserializing. https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1310-L1320 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]
Xuanwo commented on code in PR #795: URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1884868240 ## crates/iceberg/src/spec/values.rs: ## @@ -3439,11 +3443,13 @@ mod tests { "bar".to_string(), ))), None, +Some(Literal::Primitive(PrimitiveLiteral::Int(1000))), ])), &Type::Struct(StructType::new(vec![ NestedField::required(2, "id", Type::Primitive(PrimitiveType::Int)).into(), NestedField::optional(3, "name", Type::Primitive(PrimitiveType::String)).into(), NestedField::optional(4, "address", Type::Primitive(PrimitiveType::String)).into(), +NestedField::required(5, "extra", Type::Primitive(PrimitiveType::Int)).into(), Review Comment: > Yes, but I find that it can also pass originally. I'm trying to find the test case that can't pass originally. Thank you, that will be really meaningful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix: set key_metadata to Null by default [iceberg-rust]
feniljain commented on code in PR #800: URL: https://github.com/apache/iceberg-rust/pull/800#discussion_r1884869867 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -338,7 +338,7 @@ mod tests { nan_value_counts: HashMap::new(), lower_bounds: HashMap::new(), upper_bounds: HashMap::new(), -key_metadata: vec![], +key_metadata: Some(vec![]), Review Comment: They were tests so didn't matter much, but I can change all to `None` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix: set key_metadata to Null by default [iceberg-rust]
Xuanwo commented on code in PR #800: URL: https://github.com/apache/iceberg-rust/pull/800#discussion_r1884868013 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -338,7 +338,7 @@ mod tests { nan_value_counts: HashMap::new(), lower_bounds: HashMap::new(), upper_bounds: HashMap::new(), -key_metadata: vec![], +key_metadata: Some(vec![]), Review Comment: Do we need to use `None` by default? ## crates/iceberg/src/spec/manifest.rs: ## @@ -1134,7 +1134,7 @@ impl DataFile { &self.upper_bounds } /// Get the Implementation-specific key metadata for the data file. -pub fn key_metadata(&self) -> &[u8] { +pub fn key_metadata(&self) -> &Option> { Review Comment: It's better to return `Option<&[u8]>` to make it easier to use. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Doc: Add staus page for different implementations. [iceberg]
liurenjie1024 commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1884790587 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the iceberg spec in multiple languages. This page provides a summary of the +current status of these implementations. + +## Versions + +This section describes the versions of each implementation that are being tracked in this page. + +| Language | Version | +|---|-| +| Java | 1.7.1 | +| PyIceberg | 0.7.0 | +| Rust | 0.3.0 | +| Go| 0.1.0 | + +## Data Types + +| Data Type | Java | PyIceberg | Rust | Go | +||--|---|--|| +| boolean| Y| Y | Y| Y | +| int| Y| Y | Y| Y | +| long | Y| Y | Y| Y | +| float | Y| Y | Y| Y | +| double | Y| Y | Y| Y | +| decimal| Y| Y | Y| Y | +| date | Y| Y | Y| Y | +| time | Y| Y | Y| Y | +| timestamp | Y| Y | Y| Y | +| timestamptz| Y| Y | Y| Y | +| timestamp_ns | Y| Y | Y| Y | +| timestamptz_ns | Y| Y | Y| Y | +| string | Y| Y | Y| Y | +| uuid | Y| Y | Y| Y | +| fixed | Y| Y | Y| Y | +| binary | Y| Y | Y| Y | + +## Data File Formats + +| Format | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Parquet | Y| Y | Y| Y | +| ORC | Y| N | N| N | +| Puffin | Y| N | N| N | + +## File IO + +| Storage | Java | PyIceberg | Rust | Go | +|--|--|---|--|| +| Local Filesystem | Y| Y | Y| Y | +| Hadoop Filesystem| Y| Y | Y| Y | +| Aws S3 | Y| Y | Y| Y | +| Google Cloud Storage | Y| Y | Y| Y | +| Memory Fs| Y| Y | Y| Y | + +## Table Update Operations + +### Table Spec V1 + +| Operation | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Update schema | Y| N | Y| N | +| Update partition spec | Y| N | Y| N | +| Update table properties | Y| Y | Y| N | +| Replace sort order | Y| N | N| N | +| Update table location | Y| N | N| N | +| Append data files | Y| Y | N| N | +| Rewrite files | Y| Y | N| N | +| Rewrite manifests | Y| Y | N| N | +| Overwrite files | Y| Y | N| N | +| Row delta | Y| N | N| N | +| Delete files| Y| N | N| N | +| Update statistics | Y| N | N| N | +| Update partition statistics | Y| N | N| N | +| Expire snapshots| Y| N | N| N | +| Manage snapshots| Y| N | N| N | + +### Table Spec V2 + +| Operation | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Update schema | Y| Y | N| N | +| Update partition spec | Y| Y | N| N | +| Update table properties | Y| Y | Y| N | +| Replace sort order | Y| N | N| N | +| Update table location | Y| N | N| N | +| Append data files | Y| Y | N| N | +| Rewrite files | Y| Y | N| N | +| Rewrite manifests | Y| Y | N| N | +| Overwrite files | Y| Y | N| N | +| Row delta | Y| N | N| N | +| Delete files| Y| Y | N| N | +| Update statistics | Y| N | N| N | +| Update partition statistics | Y| N | N| N | +| Expire snapshots| Y| N | N| N | +| Manage snapshots| Y| N | N| N | + +## Table Read Operations + +### Table Spec V1 + +| Operation | Java | PyIceberg | Rust | Go | +|-|--|---|--|| +| Plan with data file | Y| Y | Y| Y | +| Read data file | Y| N | Y| N | + +### Table Spec V2 + +| Operation | Java | PyIceberg | Rust | Go | +||--|---|--|| +| Plan with data file| Y|
Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]
Xuanwo commented on code in PR #795: URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1884848752 ## crates/iceberg/src/spec/values.rs: ## @@ -3439,11 +3443,13 @@ mod tests { "bar".to_string(), ))), None, +Some(Literal::Primitive(PrimitiveLiteral::Int(1000))), ])), &Type::Struct(StructType::new(vec![ NestedField::required(2, "id", Type::Primitive(PrimitiveType::Int)).into(), NestedField::optional(3, "name", Type::Primitive(PrimitiveType::String)).into(), NestedField::optional(4, "address", Type::Primitive(PrimitiveType::String)).into(), +NestedField::required(5, "extra", Type::Primitive(PrimitiveType::Int)).into(), Review Comment: Hi, would you like to add a new test that cover the mis-order cases? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: support to append delete type data file [iceberg-rust]
ZENOTME commented on PR #798: URL: https://github.com/apache/iceberg-rust/pull/798#issuecomment-2542902331 Sorry, I think I have some misunderstanding here since the action which support to append data file and delete file is RowDelta.🤔 So I guess what we need is right: 1. MergingSnapshotProducer: https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L61 2. RowDelta: https://github.com/apache/iceberg/blob/540d6a6251e31b232fe6ed2413680621454d107a/api/src/main/java/org/apache/iceberg/RowDelta.java#L32 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]
ZENOTME commented on code in PR #795: URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1884853773 ## crates/iceberg/src/spec/values.rs: ## @@ -3439,11 +3443,13 @@ mod tests { "bar".to_string(), ))), None, +Some(Literal::Primitive(PrimitiveLiteral::Int(1000))), ])), &Type::Struct(StructType::new(vec![ NestedField::required(2, "id", Type::Primitive(PrimitiveType::Int)).into(), NestedField::optional(3, "name", Type::Primitive(PrimitiveType::String)).into(), NestedField::optional(4, "address", Type::Primitive(PrimitiveType::String)).into(), +NestedField::required(5, "extra", Type::Primitive(PrimitiveType::Int)).into(), Review Comment: Yes, but I find that it can also pass originally. I'm trying to find the test case that can't pass originally. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org