Re: [I] doc: rust.iceberg.apache.org is not resolved [iceberg-rust]
Xuanwo commented on issue #137: URL: https://github.com/apache/iceberg-rust/issues/137#issuecomment-1876676128 https://rust.iceberg.apache.org/ is online! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] doc: rust.iceberg.apache.org is not resolved [iceberg-rust]
Xuanwo closed issue #137: doc: rust.iceberg.apache.org is not resolved URL: https://github.com/apache/iceberg-rust/issues/137
[PR] docs: Change homepage to rust.i.a.o [iceberg-rust]
Xuanwo opened a new pull request, #146: URL: https://github.com/apache/iceberg-rust/pull/146 (no comment)
Re: [PR] Build: Bump sqlalchemy from 2.0.24 to 2.0.25 [iceberg-python]
Fokko merged PR #250: URL: https://github.com/apache/iceberg-python/pull/250
Re: [PR] Bug fix falsy value of zero [iceberg-python]
Fokko merged PR #249: URL: https://github.com/apache/iceberg-python/pull/249
Re: [I] bug: The current snapshot with id 0 will be skip. [iceberg-python]
Fokko closed issue #232: bug: The current snapshot with id 0 will be skip. URL: https://github.com/apache/iceberg-python/issues/232
Re: [PR] docs: Change homepage to rust.i.a.o [iceberg-rust]
Fokko merged PR #146: URL: https://github.com/apache/iceberg-rust/pull/146
[PR] Flink: Disable classloader check in TestIcebergSourceWithWatermarkExtractor to fix flakiness [iceberg]
manuzhang opened a new pull request, #9408: URL: https://github.com/apache/iceberg/pull/9408 (no comment)
[PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo opened a new pull request, #147: URL: https://github.com/apache/iceberg-rust/pull/147 Part of https://github.com/apache/iceberg-rust/issues/81
[PR] Core: Remove partition statistics files during purge table [iceberg]
ajantha-bhat opened a new pull request, #9409: URL: https://github.com/apache/iceberg/pull/9409 Follow-up from https://github.com/apache/iceberg/pull/9305. cc: @dramaticlly, @amogh-jahagirdar, @nastra, @RussellSpitzer
Re: [I] How to release [iceberg-rust]
Xuanwo commented on issue #81: URL: https://github.com/apache/iceberg-rust/issues/81#issuecomment-1876737649

## Tasks

- [ ] Add release docs: https://github.com/apache/iceberg-rust/pull/147
- [ ] Add verify docs
- [ ] Add CI to upload crates
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
manuzhang commented on PR #8988: URL: https://github.com/apache/iceberg/pull/8988#issuecomment-1876754004 I submitted https://github.com/apache/iceberg/pull/9408 to fix the failing flaky test.
Re: [I] Flink API rewriteDataFile How to set up scanning based on file size [iceberg]
GuoZhaoY commented on issue #9386: URL: https://github.com/apache/iceberg/issues/9386#issuecomment-1876786214 Thank you for your answer, I understand it now.
Re: [I] Flink API rewriteDataFile How to set up scanning based on file size [iceberg]
GuoZhaoY closed issue #9386: Flink API rewriteDataFile How to set up scanning based on file size URL: https://github.com/apache/iceberg/issues/9386
[PR] docs: Add link for iceberg rust [iceberg-docs]
suyanhanx opened a new pull request, #300: URL: https://github.com/apache/iceberg-docs/pull/300
Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]
advancedxy commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1876803507 Spark adds a configuration to control whether to treat binary as string in the parquet reader, see: https://github.com/apache/spark/blob/c8137960a0ba725d1633795a057c68f2bbef414b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L981 For cases like this, a similar approach should be adopted, rather than promoting binary to string unconditionally. It would be safer, and it leaves the decision up to the users. WDYT @RussellSpitzer
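The config-gated behavior advancedxy is suggesting can be sketched as follows. This is a minimal Python illustration of the pattern only, not Spark's or Iceberg's actual reader code; the function name and flag are hypothetical stand-ins for a reader option like Spark's `spark.sql.parquet.binaryAsString`.

```python
# Sketch (hypothetical): a reader that promotes binary to string only when
# the user has opted in via configuration, instead of doing so unconditionally.
def read_binary_value(raw: bytes, binary_as_string: bool = False):
    """Return the column value as str only if the config flag is enabled."""
    if binary_as_string:
        return raw.decode("utf-8")  # opt-in promotion to string
    return raw  # default: preserve the binary value as-is

# Default behavior is safe: the bytes come back untouched.
assert read_binary_value(b"abc") == b"abc"
# With the flag set, the value is promoted to a string.
assert read_binary_value(b"abc", binary_as_string=True) == "abc"
```

The point of the design is that promotion is lossy for non-UTF-8 payloads, so the safe default is to leave binary alone and let users enable the conversion explicitly.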
[I] Failed to assign splits due to the serialized split size [iceberg]
javrasya opened a new issue, #9410: URL: https://github.com/apache/iceberg/issues/9410

### Apache Iceberg version

1.4.2 (latest release)

### Query engine

Flink

### Please describe the bug 🐞

Hi there, I am trying to consume records from an Iceberg table in my Flink application and I am running into the following issue:

```
Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to serialize splits.
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$4(SourceCoordinatorContext.java:223)
    at java.base/java.util.HashMap.forEach(HashMap.java:1337)
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$5(SourceCoordinatorContext.java:213)
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.callInCoordinatorThread(SourceCoordinatorContext.java:428)
    ... 14 more
Caused by: java.io.UTFDataFormatException: Encoded string is too long: 123214
    at org.apache.flink.core.memory.DataOutputSerializer.writeUTF(DataOutputSerializer.java:257)
    at org.apache.iceberg.flink.source.split.IcebergSourceSplit.serializeV2(IcebergSourceSplit.java:150)
    at org.apache.iceberg.flink.source.split.IcebergSourceSplitSerializer.serialize(IcebergSourceSplitSerializer.java:42)
    at org.apache.iceberg.flink.source.split.IcebergSourceSplitSerializer.serialize(IcebergSourceSplitSerializer.java:25)
    at org.apache.flink.runtime.source.event.AddSplitEvent.<init>(AddSplitEvent.java:44)
    at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.lambda$assignSplits$4(SourceCoordinatorContext.java:220)
    ... 17 more
```

I am not really sure why it gets so big, but when I looked at the source code [here](https://github.com/apache/iceberg/blob/27e8c421358378bd80bed8b328d5b69e884b7484/flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/IcebergSourceSplit.java#L148-L151), it might be because there are too many file scan tasks in one split, and that is why this is happening.
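The `UTFDataFormatException` in the report comes from `writeUTF` semantics: the string is prefixed with an unsigned 16-bit byte length, so any encoding longer than 65535 bytes cannot be written. A minimal Python sketch of that constraint (illustrative only; it mirrors `java.io.DataOutputStream.writeUTF` behavior, not Flink's actual `DataOutputSerializer` code):

```python
import struct

UTF_MAX = 0xFFFF  # 65535: the largest length an unsigned 16-bit prefix can hold

def write_utf(s: str) -> bytes:
    """Serialize a string with a 2-byte big-endian length prefix, writeUTF-style."""
    encoded = s.encode("utf-8")  # Java's modified UTF-8 differs only for NUL and surrogates
    if len(encoded) > UTF_MAX:
        # Mirrors the UTFDataFormatException message in the stack trace above
        raise ValueError(f"Encoded string is too long: {len(encoded)}")
    return struct.pack(">H", len(encoded)) + encoded

# A split whose JSON serializes to 123214 bytes, as in the report, overflows the prefix:
try:
    write_utf("x" * 123214)
except ValueError as e:
    print(e)  # Encoded string is too long: 123214
```

This is consistent with the reporter's theory: a split carrying many file scan tasks produces a serialized form well past the 64 KiB cap.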
Re: [I] how to integrations object storage ceph ? [iceberg]
jkl0898 commented on issue #7158: URL: https://github.com/apache/iceberg/issues/7158#issuecomment-1876852333

> Hi @jkl0898,
> I am looking for a solution to use Ceph with Iceberg. Currently I use MinIO, but we are looking for an alternative to replace MinIO. Could you share technical details on how to configure Ceph with Iceberg? Thanks

Hi, please refer to amogh-jahagirdar's reply.
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
nastra commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1876875354 @stevenzwu @pvary could you guys take a look please?
Re: [PR] API: New API For sequential / streaming updates [iceberg]
jasonf20 commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1876954557 @rdblue Sure. I added support for setting the sequence number explicitly per file in `MergingSnapshotProducer`. This was almost supported already (it didn't support per file level for added files). This is then used in a similar way to how `Rewrite` used it, but the main difference is that we don't set the sequence number when the user adds the file. Instead, we just collect an ordered list of files and only assign the sequence numbers when `apply` is called. Since apply is called with the `base` metadata we can calculate the sequence numbers from there. If the commit fails due to a conflict `apply` is called again and we can re-calculate the new sequence numbers. Looking again now, I think I might need to remove the `requiresApply` boolean from the `apply` method for retries to work properly.
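The lazy assignment jasonf20 describes can be sketched as follows. This is a hypothetical simplification in Python (the class and method names are invented for illustration, not the actual `MergingSnapshotProducer` API): files are collected in order, and sequence numbers are derived from the base metadata only when `apply` runs, so a retry against newer base metadata recomputes them.

```python
# Sketch (hypothetical): sequence numbers assigned at apply() time rather than
# when the user adds files, so commit retries pick up fresh numbers.
class StreamingUpdateSketch:
    def __init__(self):
        self._batches = []  # ordered batches of data file names

    def add_batch(self, files):
        self._batches.append(list(files))
        return self

    def apply(self, base_last_sequence_number: int):
        """Assign each batch the next sequence number after the base metadata's."""
        assigned, seq = [], base_last_sequence_number
        for batch in self._batches:
            seq += 1
            assigned.extend((f, seq) for f in batch)
        return assigned

up = StreamingUpdateSketch().add_batch(["a.parquet"]).add_batch(["b.parquet"])
# First attempt against base sequence number 10:
assert up.apply(10) == [("a.parquet", 11), ("b.parquet", 12)]
# Retry after a conflict, base advanced to 12 — numbers are recomputed, not stale:
assert up.apply(12) == [("a.parquet", 13), ("b.parquet", 14)]
```

The design point is that `apply` is a pure function of the base metadata, which is exactly what makes conflict retries safe.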
Re: [PR] Spark: Add support for reading Iceberg views [iceberg]
nastra commented on code in PR #9340: URL: https://github.com/apache/iceberg/pull/9340#discussion_r1441669272 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkView.java: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.spark.source; + +import static org.apache.iceberg.TableProperties.FORMAT_VERSION; + +import java.util.Map; +import java.util.Set; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet; +import org.apache.iceberg.spark.SparkSchemaUtil; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.view.BaseView; +import org.apache.iceberg.view.SQLViewRepresentation; +import org.apache.iceberg.view.View; +import org.apache.iceberg.view.ViewOperations; +import org.apache.spark.sql.types.StructType; + +public class SparkView implements org.apache.spark.sql.connector.catalog.View { + + private static final Set RESERVED_PROPERTIES = + ImmutableSet.of("provider", "location", FORMAT_VERSION); + + private final View icebergView; + private final String catalogName; + private StructType lazySchema = null; + + public SparkView(String catalogName, View icebergView) { +this.catalogName = catalogName; +this.icebergView = icebergView; + } + + public View view() { +return icebergView; + } + + @Override + public String name() { +return icebergView.toString(); + } + + @Override + public String query() { +SQLViewRepresentation sqlRepr = icebergView.sqlFor("spark"); +Preconditions.checkState(sqlRepr != null, "Cannot load SQL for view %s", name()); +return sqlRepr.sql(); + } + + @Override + public String currentCatalog() { +return null != icebergView.currentVersion().defaultCatalog() +? 
icebergView.currentVersion().defaultCatalog() +: catalogName; + } + + @Override + public String[] currentNamespace() { +return icebergView.currentVersion().defaultNamespace().levels(); + } + + @Override + public StructType schema() { +if (null == lazySchema) { + this.lazySchema = SparkSchemaUtil.convert(icebergView.schema()); +} + +return lazySchema; + } + + @Override + public String[] queryColumnNames() { +return new String[0]; + } + + @Override + public String[] columnAliases() { Review Comment: I'll be handling column aliases in the PR that introduces view creation via SQL
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
manuzhang commented on PR #8988: URL: https://github.com/apache/iceberg/pull/8988#issuecomment-1877008303 @nastra can we move forward with this PR?
Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]
nastra merged PR #9331: URL: https://github.com/apache/iceberg/pull/9331
Re: [PR] Spark: Add support for reading Iceberg views [iceberg]
nastra commented on code in PR #9340: URL: https://github.com/apache/iceberg/pull/9340#discussion_r1441709373 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -810,4 +823,123 @@ private Catalog.TableBuilder newBuilder(Identifier ident, Schema schema) { public Catalog icebergCatalog() { return icebergCatalog; } + + @Override + public Identifier[] listViews(String... namespace) { +if (null != asViewCatalog) { + return asViewCatalog.listViews(Namespace.of(namespace)).stream() + .map(ident -> Identifier.of(ident.namespace().levels(), ident.name())) + .toArray(Identifier[]::new); +} + +return new Identifier[0]; + } + + @Override + public View loadView(Identifier ident) throws NoSuchViewException { +if (null != asViewCatalog) { + try { +org.apache.iceberg.view.View view = asViewCatalog.loadView(buildIdentifier(ident)); +return new SparkView(catalogName, view); + } catch (org.apache.iceberg.exceptions.NoSuchViewException e) { +throw new NoSuchViewException(ident); + } +} + +throw new NoSuchViewException(ident); + } + + @Override + public View createView( + Identifier ident, + String sql, + String currentCatalog, + String[] currentNamespace, + StructType schema, + String[] queryColumnNames, Review Comment: I'll be handling this in the PR that introduces view creation via SQL
Re: [PR] Partition Evolution [iceberg-python]
Fokko commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1441716417 ## pyiceberg/partitioning.py: ## @@ -215,3 +230,118 @@ def assign_fresh_partition_spec_ids(spec: PartitionSpec, old_schema: Schema, fre ) ) return PartitionSpec(*partition_fields, spec_id=INITIAL_PARTITION_SPEC_ID) + +T = TypeVar("T") +class PartitionSpecVisitor(Generic[T], ABC): +@abstractmethod +def identity(field_id: int, source_name: str, source_id: int) -> T: +""" +Visit identity partition field +""" + +@abstractmethod +def bucket(field_id: int, source_name: str, source_id: int, num_buckets: int) -> T: +""" +Visit bucket partition field +""" + +@abstractmethod +def truncate(field_id: int, source_name: str, source_id: int, width: int) -> T: +""" +Visit truncate partition field +""" + +@abstractmethod +def year(field_id: int, source_name: str, source_id: int) -> T: +""" +Visit year partition field +""" + +@abstractmethod +def month(field_id: int, source_name: str, source_id: int) -> T: +""" +Visit month partition field +""" + +@abstractmethod +def day(field_id: int, source_name: str, source_id: int) -> T: +""" +Visit day partition field +""" + +@abstractmethod +def hour(field_id: int, source_name: str, source_id: int) -> T: +""" +Visit hour partition field +""" + +@abstractmethod +def always_null(field_id: int, source_name: str, source_id: int) -> T: +""" +Visit void partition field +""" + +@abstractmethod +def unknown(field_id: int, source_name: str, source_id: int, transform: str) -> T: +""" +Visit unknown partition field +""" +raise ValueError(f"Unknown transform {transform} is not supported") + +class _PartitionNameGenerator(PartitionSpecVisitor[str]): +def identity(field_id: int, source_name: str, source_id: int) -> str: +return source_name + +def bucket(field_id: int, source_name: str, source_id: int, num_buckets: int) -> str: +return source_name + "_bucket_" + num_buckets + +def truncate(field_id: int, source_name: str, source_id: int, width: 
int) -> str: +return source_name + "_trunc_" + width + +def year(field_id: int, source_name: str, source_id: int) -> str: +return source_name + "_year" + +def month(field_id: int, source_name: str, source_id: int) -> str: +return source_name + "_month" + +def day(field_id: int, source_name: str, source_id: int) -> str: +return source_name + "_day" + +def hour(field_id: int, source_name: str, source_id: int) -> str: +return source_name + "_hour" + +def always_null(field_id: int, source_name: str, source_id: int) -> str: +return source_name + "_null" + +R = TypeVar("R") + +@staticmethod +def _visit(spec: PartitionSpec, schema: Schema, visitor: PartitionSpecVisitor[R]) -> [R]: +results = [] +for field in spec.fields: +results.append(_visit(schema, field, visitor)) +return results + +@staticmethod +def _visit(schema: Schema, field: PartitionField, visitor: PartitionSpecVisitor[R]) -> [R]: Review Comment: It would be better to use `singledispatch` here. For example: https://github.com/apache/iceberg-python/blob/452238e33f2be86b9076be78db2441aea043afb8/pyiceberg/expressions/visitors.py#L139 This is faster (which surprised me at first, but it pushes it down to the C level) and looks nicer
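Fokko's `singledispatch` suggestion can be sketched like this. The transform classes below are hypothetical stand-ins (not the real pyiceberg types); the point is that dispatch on the transform's type replaces the ABC-with-many-abstract-methods visitor.

```python
from functools import singledispatch

# Hypothetical stand-ins for pyiceberg's transform classes, used only to
# illustrate the dispatch pattern the reviewer is pointing at.
class BucketTransform:
    def __init__(self, num_buckets: int):
        self.num_buckets = num_buckets

class YearTransform:
    pass

@singledispatch
def partition_name(transform, source_name: str) -> str:
    # Fallback plays the role of the visitor's unknown() method
    raise ValueError(f"Unknown transform: {transform}")

@partition_name.register(BucketTransform)
def _(transform, source_name: str) -> str:
    return f"{source_name}_bucket_{transform.num_buckets}"

@partition_name.register(YearTransform)
def _(transform, source_name: str) -> str:
    return f"{source_name}_year"

assert partition_name(BucketTransform(16), "id") == "id_bucket_16"
assert partition_name(YearTransform(), "ts") == "ts_year"
```

Each transform gets a small registered function instead of one method per abstract hook, and the type dispatch happens in CPython's C implementation of `singledispatch`, which is the speed-up Fokko mentions.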
Re: [PR] Partition Evolution [iceberg-python]
Fokko commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1441723506 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +1913,200 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1 return snapshot_id + +class UpdateSpec: +_table: Table +_schema: Schema +_spec: PartitionSpec +_name_to_field: Dict[str, PartitionField] +_name_to_added_field: Dict[str, PartitionField] +_transform_to_field: Dict[Tuple[int, str], PartitionField] +_transform_to_added_field: Dict[Tuple[int, str], PartitionField] +_case_sensitive: bool +_adds: List[PartitionField] +_deletes: Set[int] +_last_assigned_partition_id: int +_renames = Dict[str, str] + +def __init__( +self, +table: Table, +transaction: Optional[Transaction] = None, +case_sensitive = True +) -> None: +self._table = table +self._schema = table.schema() +self._spec = table.spec() +self._name_to_field = {field.name:field for field in self._spec.fields} +self._name_to_added_field = {} +self._transform_to_field = {(field.sourceId, repr(field.transform)): field for field in self._spec.fields} +self._transform_to_added_field = {} +self._adds = [] +self._deletes = {} +self._last_assigned_partition_id = table.spec().last_assigned_field_id +self._renames = {} +self._transaction = transaction +self._case_sensitive = case_sensitive + + +def add_field(self, name: str, transform: Transform) -> UpdateSpec: +ref = Reference(name) +bound_ref = ref.bind(self._schema, self._case_sensitive) +# verify transform can actually bind it +output_type = bound_ref.field.field_type +if not transform.can_transform(output_type): +raise ValueError(f"{transform} cannot transform {output_type} values from {bound_ref.field.name}") +transform_key = (bound_ref.field.field_id, transform) +existing_partition_field = self._transform_to_field.get(transform) +if existing_partition_field and self._is_duplicate_partition(transform_key[1], existing_partition_field): +raise 
ValueError(f"Duplicate partition field for ${ref.name}=${ref}, ${existing_partition_field} already exists") +added = self._transform_to_added_field.get(transform_key) +if added: +raise ValueError(f"Already added partition {added}") +new_field = self._partition_field(transform_key, name) +if not new_field.name: +new_field.name = _visit(self._schema, new_field, _PartitionNameGenerator()) + +self._check_redundant_partitions(new_field) +self._transform_to_added_field[transform_key] = new_field + +existing_partition_field = self._name_to_field.get(new_field.name) +if existing_partition_field and not new_field.field_id in self._deletes: +if isinstance(existing_partition_field.transform, VoidTransform): +self.rename_field(existing_partition_field.name, existing_partition_field.name + "_" + existing_partition_field.field_id) +else: +raise ValueError(f"Cannot add duplicate partition field name: {existing_partition_field.name}") + +self._name_to_added_field[new_field.name] = new_field +self._adds.append(new_field) +return self + +def remove_field(self, name: str) -> UpdateSpec: +added = self._name_to_added_field.get(name) +if added: +raise ValueError(f"Cannot delete newly added field {name}") +renamed = self._renames.get(name) +if renamed: +raise ValueError(f"Cannot rename and delete field: {name}") +field = self._name_to_field.get(name) +if not field: +raise ValueError(f"No such partition field: {name}") + +self._deletes.add(field.field_id) +return self + +def rename_field(self, name: str, new_name: str) -> UpdateSpec: +existing_field = self._name_to_field.get(new_name) +if existing_field and isinstance(existing_field.transform, VoidTransform): +return self.rename_field(name, name + "_" + existing_field.field_id) +added = self._name_to_added_field.get(name) +if added: +raise ValueError(f"Cannot rename recently added partitions") +field = self._name_to_field.get(name) +if not field: +raise ValueError(f"Cannot find partition field {name}") +if field.field_id in self._deletes: 
+raise ValueError(f"Cannot delete and rename partition field {name}") +self._renames[name] = new_name +return self + +def commit(self): Review Comment: For the schema, we have a context-manager, which adds a nice touch to the API: ```python def __
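Fokko's comparison to the schema API is cut off above, but the context-manager pattern it refers to can be sketched as follows. This is a hypothetical minimal illustration (not the actual pyiceberg implementation): the update object commits itself when the `with` block exits cleanly, and skips the commit if an exception escaped.

```python
class UpdateSpecSketch:
    """Hypothetical minimal update builder that commits on clean exit."""

    def __init__(self):
        self.committed = False
        self._adds = []

    def add_field(self, name: str) -> "UpdateSpecSketch":
        self._adds.append(name)
        return self

    def commit(self) -> None:
        self.committed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Commit only when the block exited without an exception.
        if exc_type is None:
            self.commit()

update = UpdateSpecSketch()
with update:
    update.add_field("ts_day")
assert update.committed
```

The nice touch Fokko is alluding to is that callers never forget `commit()`, and a failed block leaves the table metadata untouched.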
Re: [PR] Partition Evolution [iceberg-python]
Fokko commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1441724507

## pyiceberg/table/__init__.py:

```diff
@@ -1904,3 +1913,200 @@ def _generate_snapshot_id() -> int:
     snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1
     return snapshot_id
+
+
+class UpdateSpec:
+    _table: Table
+    _schema: Schema
+    _spec: PartitionSpec
+    _name_to_field: Dict[str, PartitionField]
+    _name_to_added_field: Dict[str, PartitionField]
+    _transform_to_field: Dict[Tuple[int, str], PartitionField]
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField]
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _renames = Dict[str, str]
+
+    def __init__(
+        self,
+        table: Table,
+        transaction: Optional[Transaction] = None,
+        case_sensitive = True
+    ) -> None:
+        self._table = table
+        self._schema = table.schema()
+        self._spec = table.spec()
+        self._name_to_field = {field.name: field for field in self._spec.fields}
+        self._name_to_added_field = {}
+        self._transform_to_field = {(field.sourceId, repr(field.transform)): field for field in self._spec.fields}
+        self._transform_to_added_field = {}
+        self._adds = []
+        self._deletes = {}
+        self._last_assigned_partition_id = table.spec().last_assigned_field_id
+        self._renames = {}
+        self._transaction = transaction
+        self._case_sensitive = case_sensitive
+
+    def add_field(self, name: str, transform: Transform) -> UpdateSpec:
+        ref = Reference(name)
+        bound_ref = ref.bind(self._schema, self._case_sensitive)
+        # verify transform can actually bind it
+        output_type = bound_ref.field.field_type
+        if not transform.can_transform(output_type):
+            raise ValueError(f"{transform} cannot transform {output_type} values from {bound_ref.field.name}")
+        transform_key = (bound_ref.field.field_id, transform)
+        existing_partition_field = self._transform_to_field.get(transform)
+        if existing_partition_field and self._is_duplicate_partition(transform_key[1], existing_partition_field):
+            raise ValueError(f"Duplicate partition field for ${ref.name}=${ref}, ${existing_partition_field} already exists")
+        added = self._transform_to_added_field.get(transform_key)
+        if added:
+            raise ValueError(f"Already added partition {added}")
+        new_field = self._partition_field(transform_key, name)
+        if not new_field.name:
+            new_field.name = _visit(self._schema, new_field, _PartitionNameGenerator())
+
+        self._check_redundant_partitions(new_field)
+        self._transform_to_added_field[transform_key] = new_field
+
+        existing_partition_field = self._name_to_field.get(new_field.name)
+        if existing_partition_field and not new_field.field_id in self._deletes:
+            if isinstance(existing_partition_field.transform, VoidTransform):
+                self.rename_field(existing_partition_field.name, existing_partition_field.name + "_" + existing_partition_field.field_id)
+            else:
+                raise ValueError(f"Cannot add duplicate partition field name: {existing_partition_field.name}")
+
+        self._name_to_added_field[new_field.name] = new_field
+        self._adds.append(new_field)
+        return self
+
+    def remove_field(self, name: str) -> UpdateSpec:
+        added = self._name_to_added_field.get(name)
+        if added:
+            raise ValueError(f"Cannot delete newly added field {name}")
+        renamed = self._renames.get(name)
+        if renamed:
+            raise ValueError(f"Cannot rename and delete field: {name}")
+        field = self._name_to_field.get(name)
+        if not field:
+            raise ValueError(f"No such partition field: {name}")
+
+        self._deletes.add(field.field_id)
+        return self
+
+    def rename_field(self, name: str, new_name: str) -> UpdateSpec:
+        existing_field = self._name_to_field.get(new_name)
+        if existing_field and isinstance(existing_field.transform, VoidTransform):
+            return self.rename_field(name, name + "_" + existing_field.field_id)
+        added = self._name_to_added_field.get(name)
+        if added:
+            raise ValueError(f"Cannot rename recently added partitions")
+        field = self._name_to_field.get(name)
+        if not field:
+            raise ValueError(f"Cannot find partition field {name}")
+        if field.field_id in self._deletes:
+            raise ValueError(f"Cannot delete and rename partition field {name}")
+        self._renames[name] = new_name
+        return self
+
+    def commit(self):
+        new_spec = self._apply()
+        updates = []
+        requirements = []
+        if self._table.metadata.default_spec_i
```
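For readers skimming the digest: the hunk under review adds a fluent `UpdateSpec` builder whose main job is conflict bookkeeping across pending adds, deletes, and renames. A rough, self-contained sketch of that bookkeeping (hypothetical simplified names, plain dicts and sets — not the PyIceberg API itself):

```python
class SpecUpdateSketch:
    """Toy model of UpdateSpec's delete/rename conflict checks (illustrative only)."""

    def __init__(self, fields):
        # fields: mapping of partition-field name -> field id
        self._name_to_field = dict(fields)
        self._name_to_added_field = {}
        self._deletes = set()   # field ids marked for removal
        self._renames = {}      # old name -> new name

    def remove_field(self, name):
        if name in self._name_to_added_field:
            raise ValueError(f"Cannot delete newly added field {name}")
        if name in self._renames:
            raise ValueError(f"Cannot rename and delete field: {name}")
        if name not in self._name_to_field:
            raise ValueError(f"No such partition field: {name}")
        self._deletes.add(self._name_to_field[name])
        return self  # fluent, like the builder under review

    def rename_field(self, name, new_name):
        if name in self._name_to_added_field:
            raise ValueError("Cannot rename recently added partitions")
        if name not in self._name_to_field:
            raise ValueError(f"Cannot find partition field {name}")
        if self._name_to_field[name] in self._deletes:
            raise ValueError(f"Cannot delete and rename partition field {name}")
        self._renames[name] = new_name
        return self
```

The point of the cross-checks is that a single commit must not contain contradictory operations on the same field (e.g. rename followed by delete), which is exactly what the quoted `remove_field`/`rename_field` guard against.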
Re: [PR] Partition Evolution [iceberg-python]
Fokko commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1441725509

## pyiceberg/table/__init__.py: (quotes the same `UpdateSpec` hunk, @@ -1904,3 +1913,200 @@, as the previous comment)
Re: [PR] Python: Add support for Python 3.12 [iceberg-python]
MehulBatra commented on PR #35: URL: https://github.com/apache/iceberg-python/pull/35#issuecomment-1877068534 Looking into this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Partition Evolution [iceberg-python]
Fokko commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1441727718

## pyiceberg/table/__init__.py: (quotes the same `UpdateSpec` hunk, @@ -1904,3 +1913,200 @@, as the previous comments)
Re: [PR] Core: Add a util to read and write partition stats [iceberg]
ajantha-bhat commented on code in PR #9170: URL: https://github.com/apache/iceberg/pull/9170#discussion_r1407500365

## core/src/main/java/org/apache/iceberg/Partition.java:

```diff
@@ -0,0 +1,355 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+public class Partition implements IndexedRecord {
+  private PartitionData partitionData;
```

Review Comment: Fields are based on the spec. https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file

## core/src/main/java/org/apache/iceberg/Partition.java: (same @@ -0,0 +1,355 @@ hunk; license header and imports as above, then:)

```diff
+public class Partition implements IndexedRecord {
+  private PartitionData partitionData;
+  private int specId;
+  private long dataRecordCount;
+  private int dataFileCount;
+  private long dataFileSizeInBytes;
+  private long posDeleteRecordCount;
+  private int posDeleteFileCount;
+  private long eqDeleteRecordCount;
+  private int eqDeleteFileCount;
+  // Optional accurate count of records in a partition after applying the delete files if any
+  private long totalRecordCount;
+  // Commit time of snapshot that last updated this partition
+  private long lastUpdatedAt;
+  // ID of snapshot that last updated this partition
+  private long lastUpdatedSnapshotId;
+
+  public enum Column {
+    PARTITION_DATA,
+    SPEC_ID,
+    DATA_RECORD_COUNT,
+    DATA_FILE_COUNT,
+    DATA_FILE_SIZE_IN_BYTES,
+    POSITION_DELETE_RECORD_COUNT,
+    POSITION_DELETE_FILE_COUNT,
+    EQUALITY_DELETE_RECORD_COUNT,
+    EQUALITY_DELETE_FILE_COUNT,
+    TOTAL_RECORD_COUNT,
+    LAST_UPDATED_AT,
+    LAST_UPDATED_SNAPSHOT_ID
+  }
+
+  public Partition() {}
+
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  public PartitionData partitionData() {
+    return partitionData;
+  }
+
+  public int specId() {
+    return specId;
+  }
+
+  public long dataRecordCount() {
+    return dataRecordCount;
+  }
+
+  public int dataFileCount() {
+    return dataFileCount;
+  }
+
+  public long dataFileSizeInBytes() {
+    return dataFileSizeInBytes;
+  }
+
+  public Long posDeleteRecordCount() {
+    return posDeleteRecordCount;
+  }
+
+  public Integer posDeleteFileCount() {
+    return posDeleteFileCount;
+  }
+
+  public Long eqDeleteRecordCount() {
+    return eqDeleteRecordCount;
+  }
+
+  public Integer eqDeleteFileCount() {
+    return eqDeleteFileCount;
+  }
+
+  public Long totalRecordCount() {
+    return totalRecordCount;
+  }
+
+  public Long lastUpdatedAt() {
+    return lastUpdatedAt;
+  }
+
+  public Long lastUpdatedSnapshotId() {
+    return lastUpdatedSnapshotId;
+  }
+
+  @Override
+  public void put(int i, Object v) {
```

Review Comment: Extends `IndexedRecord` to use the existing Parquet/ORC Avro reader and writer from the `iceberg-parquet` and `iceberg-orc` module.

## core/src/main/java/org/apache/iceberg/Partition.java: (a third quote of the same hunk, truncated in the digest)

```diff
+/*
+ * Licensed to the Apache Soft
```
Re: [PR] Core: Add a util to read and write partition stats [iceberg]
ajantha-bhat commented on code in PR #9170: URL: https://github.com/apache/iceberg/pull/9170#discussion_r1407505119

## core/src/main/java/org/apache/iceberg/Partition.java: (same @@ -0,0 +1,355 @@ hunk; the quote repeats the license header, field declarations, `Column` enum, and accessors from the previous comment, then continues:)

```diff
+  @Override
+  public void put(int i, Object v) {
+    switch (i) {
+      case 0:
+        this.partitionData = (PartitionData) v;
+        return;
+      case 1:
+        this.specId = (int) v;
+        return;
+      case 2:
+        this.dataRecordCount = (long) v;
+        return;
+      case 3:
+        this.dataFileCount = (int) v;
+        return;
+      case 4:
+        this.dataFileSizeInBytes = (long) v;
+        return;
+      case 5:
+        this.posDeleteRecordCount = (long) v;
+        return;
+      case 6:
+        this.posDeleteFileCount = (int) v;
+        return;
+      case 7:
+        this.eqDeleteRecordCount = (long) v;
+        return;
+      case 8:
+        this.eqDeleteFileCount = (int) v;
+        return;
+      case 9:
+        this.totalRecordCount = (long) v;
+        return;
+      case 10:
+        this.lastUpdatedAt = (long) v;
+        return;
+      case 11:
+        this.lastUpdatedSnapshotId = (long) v;
+        return;
+      default:
+        throw new UnsupportedOperationException("Unknown field ordinal: " + i);
+    }
+  }
+
+  @Override
+  public Object get(int i) {
+    switch (i) {
+      case 0:
+        return partitionData;
+      case 1:
+        return specId;
+      case 2:
+        return dataRecordCount;
+      case 3:
+        return dataFileCount;
+      case 4:
+        return dataFileSizeInBytes;
+      case 5:
+        return posDeleteRecordCount;
+      case 6:
+        return posDeleteFileCount;
+      case 7:
+        return eqDeleteRecordCount;
+      case 8:
+        return eqDeleteFileCount;
+      case 9:
+        return totalRecordCount;
+      case 10:
+        return lastUpdatedAt;
+      case 11:
+        return lastUp
```
Re: [PR] Core: Add a util to read and write partition stats [iceberg]
ajantha-bhat commented on code in PR #9170: URL: https://github.com/apache/iceberg/pull/9170#discussion_r1441737877

## core/src/main/java/org/apache/iceberg/PartitionEntry.java: @@ -0,0 +1,361 @@ (standard Apache license header as above, then:)

```diff
+package org.apache.iceberg;
+
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+public class PartitionEntry implements IndexedRecord {
+  private PartitionData partitionData;
```

Review Comment: Fields are based on the spec. https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file

## core/src/main/java/org/apache/iceberg/PartitionEntry.java: (same hunk; after the license header the quote continues:)

```diff
+public class PartitionEntry implements IndexedRecord {
+  private PartitionData partitionData;
+  private int specId;
+  private long dataRecordCount;
+  private int dataFileCount;
+  private long dataFileSizeInBytes;
+  private long posDeleteRecordCount;
+  private int posDeleteFileCount;
+  private long eqDeleteRecordCount;
+  private int eqDeleteFileCount;
+  // Optional accurate count of records in a partition after applying the delete files if any
+  private long totalRecordCount;
+  // Commit time of snapshot that last updated this partition
+  private long lastUpdatedAt;
+  // ID of snapshot that last updated this partition
+  private long lastUpdatedSnapshotId;
+
+  public enum Column {
+    PARTITION_DATA,
+    SPEC_ID,
+    DATA_RECORD_COUNT,
+    DATA_FILE_COUNT,
+    DATA_FILE_SIZE_IN_BYTES,
+    POSITION_DELETE_RECORD_COUNT,
+    POSITION_DELETE_FILE_COUNT,
+    EQUALITY_DELETE_RECORD_COUNT,
+    EQUALITY_DELETE_FILE_COUNT,
+    TOTAL_RECORD_COUNT,
+    LAST_UPDATED_AT,
+    LAST_UPDATED_SNAPSHOT_ID
+  }
+
+  private PartitionEntry() {}
+
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  public PartitionData partitionData() {
+    return partitionData;
+  }
+
+  public int specId() {
+    return specId;
+  }
+
+  public long dataRecordCount() {
+    return dataRecordCount;
+  }
+
+  public int dataFileCount() {
+    return dataFileCount;
+  }
+
+  public long dataFileSizeInBytes() {
+    return dataFileSizeInBytes;
+  }
+
+  public long posDeleteRecordCount() {
+    return posDeleteRecordCount;
+  }
+
+  public int posDeleteFileCount() {
+    return posDeleteFileCount;
+  }
+
+  public long eqDeleteRecordCount() {
+    return eqDeleteRecordCount;
+  }
+
+  public int eqDeleteFileCount() {
+    return eqDeleteFileCount;
+  }
+
+  public long totalRecordCount() {
+    return totalRecordCount;
+  }
+
+  public long lastUpdatedAt() {
+    return lastUpdatedAt;
+  }
+
+  public long lastUpdatedSnapshotId() {
+    return lastUpdatedSnapshotId;
+  }
+
+  @Override
+  public void put(int i, Object v) {
+    switch (i) {
+      case 0:
+        this.partitionData = (PartitionData) v;
+        return;
+      case 1:
+        this.specId = (int) v;
+        return;
+      case 2:
+        this.dataRecordCount = (long) v;
+        return;
+      case 3:
+        this.da
```
Re: [PR] Core: Add a util to read and write partition stats [iceberg]
ajantha-bhat commented on code in PR #9170: URL: https://github.com/apache/iceberg/pull/9170#discussion_r1441749823

## core/src/main/java/org/apache/iceberg/PartitionEntry.java: (quotes the same @@ -0,0 +1,361 @@ hunk once more — license header, fields, `Column` enum, accessors, and the `put`/`get` ordinal switches already quoted in the previous comments; the quote is truncated partway through `get`)
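The `put`/`get` methods quoted in these review comments implement Avro's `IndexedRecord` positional contract, which is what lets the existing Avro-based Parquet/ORC writers serialize the stats entries. A hypothetical Python analogue of the same positional-record idea (field order mirroring the `Column` enum; illustrative only, not Iceberg code):

```python
from dataclasses import dataclass, fields
from typing import Any

@dataclass
class PartitionEntrySketch:
    # Field order mirrors the Column enum in the quoted Java class,
    # so ordinal i maps to the i-th declared field.
    partition_data: Any = None
    spec_id: int = 0
    data_record_count: int = 0
    data_file_count: int = 0
    data_file_size_in_bytes: int = 0
    pos_delete_record_count: int = 0
    pos_delete_file_count: int = 0
    eq_delete_record_count: int = 0
    eq_delete_file_count: int = 0
    total_record_count: int = 0
    last_updated_at: int = 0
    last_updated_snapshot_id: int = 0

    def put(self, i: int, v: Any) -> None:
        # Positional write, like IndexedRecord.put(int, Object).
        setattr(self, fields(self)[i].name, v)

    def get(self, i: int) -> Any:
        # Positional read, like IndexedRecord.get(int).
        return getattr(self, fields(self)[i].name)
```

The Java version spells the mapping out as a `switch` on the ordinal; the contract is the same either way — the writer only ever addresses fields by position, in the order declared by the record's schema.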
Re: [PR] Deliver key metadata for encryption of data files [iceberg]
ggershinsky commented on code in PR #9359: URL: https://github.com/apache/iceberg/pull/9359#discussion_r1441763348

## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkAppenderFactory.java:

```diff
@@ -161,7 +162,12 @@ private StructType lazyPosDeleteSparkType() {
   }
 
   @Override
-  public FileAppender newAppender(OutputFile file, FileFormat fileFormat) {
+  public FileAppender newAppender(OutputFile outputFile, FileFormat format) {
+    return newAppender(EncryptionUtil.plainAsEncryptedOutput(outputFile), format);
+  }
+
+  @Override
+  public FileAppender newAppender(EncryptedOutputFile file, FileFormat fileFormat) {
```

Review Comment: Sounds good, I'll move them to 3.5.
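The diff under discussion makes the plain-file overload of `newAppender` delegate to the encrypted-file overload by wrapping the output file. A hedged sketch of that adapter pattern (invented names; only `EncryptionUtil.plainAsEncryptedOutput` comes from the diff, everything below is illustrative):

```python
class EncryptedOutputFileSketch:
    """Pairs an output file with optional encryption key metadata."""

    def __init__(self, output_file, key_metadata=None):
        self.output_file = output_file
        self.key_metadata = key_metadata  # None => plaintext pass-through


def plain_as_encrypted(output_file):
    # Adapter: treat a plain file as an "encrypted" one with no key,
    # so a single code path can serve both cases (the role played by
    # EncryptionUtil.plainAsEncryptedOutput in the diff).
    return EncryptedOutputFileSketch(output_file, key_metadata=None)


def new_appender(file, fmt):
    # Single implementation; the plain-file entry point just wraps and delegates.
    if not isinstance(file, EncryptedOutputFileSketch):
        file = plain_as_encrypted(file)
    mode = "encrypted" if file.key_metadata is not None else "plain"
    return (mode, fmt, file.output_file)
```

The design choice is the usual one for retrofitting encryption into an existing API: keep the old signature working, funnel it through the new one, and let a null-key wrapper mean "no encryption".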
Re: [PR] Flink: Disable classloader check in TestIcebergSourceWithWatermarkExtractor to fix flakiness [iceberg]
pvary merged PR #9408: URL: https://github.com/apache/iceberg/pull/9408
Re: [PR] Flink: Disable classloader check in TestIcebergSourceWithWatermarkExtractor to fix flakiness [iceberg]
pvary commented on PR #9408: URL: https://github.com/apache/iceberg/pull/9408#issuecomment-1877190593 Thanks @manuzhang for the fix. CC: @stevenzwu
Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]
domonkosbalogh-seon commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1877212931 Ran into a similar issue (same as in https://github.com/apache/iceberg/issues/8368) using the Glue Catalog. Is there a workaround for this, or is this PR the only fix?
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
pvary commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1877233309 @stevenzwu: After a quick check, I found this: https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/main/java/org/apache/flink/core/memory/DataOutputSerializer.java#L256-L260

```
if (utflen > 65535) {
    throw new UTFDataFormatException("Encoded string is too long: " + utflen);
} else if (this.position > this.buffer.length - utflen - 2) {
    resize(utflen + 2);
}
```

This means that anything above 64k cannot be serialized by `DataOutputSerializer.writeUTF`, which seems like a fairly arbitrary limit to me. We could use `DataOutputSerializer.writeChars`, which uses `DataOutputSerializer.writeChar`. The downside is that it is less efficient when the string contains simple chars (between `0x0001` and `0x007F`): see https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-core/src/main/java/org/apache/flink/core/memory/DataOutputSerializer.java#L245C1-L254C10

```
for (int i = 0; i < strlen; i++) {
    c = str.charAt(i);
    if ((c >= 0x0001) && (c <= 0x007F)) {
        utflen++;
    } else if (c > 0x07FF) {
        utflen += 3;
    } else {
        utflen += 2;
    }
}
```

The upside is that it should work regardless of the size of the string.
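[Editorial aside] The 64 KB ceiling exists because `writeUTF` stores the encoded byte length in an unsigned 16-bit prefix. A self-contained sketch of the same modified-UTF-8 length arithmetic (a hypothetical helper class, not Flink code) shows where the limit bites:

```java
// Hypothetical helper mirroring the modified-UTF-8 length computation quoted
// above; not Flink's actual DataOutputSerializer class.
public class Utf8LimitCheck {

    // Compute the modified-UTF-8 byte length of a string, as writeUTF does.
    static long modifiedUtf8Length(String str) {
        long utflen = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                utflen += 1;  // ASCII (excluding NUL): 1 byte
            } else if (c > 0x07FF) {
                utflen += 3;  // chars above U+07FF: 3 bytes
            } else {
                utflen += 2;  // U+0000 and U+0080..U+07FF: 2 bytes
            }
        }
        return utflen;
    }

    // writeUTF stores the length in an unsigned 16-bit prefix, hence the cap.
    static boolean fitsInWriteUtf(String str) {
        return modifiedUtf8Length(str) <= 65535;
    }

    public static void main(String[] args) {
        String small = "a".repeat(1000);
        String large = "a".repeat(70000); // e.g. a very large serialized split
        System.out.println(fitsInWriteUtf(small)); // true
        System.out.println(fitsInWriteUtf(large)); // false
    }
}
```

This also makes the trade-off concrete: `writeChars` always spends 2 bytes per char but has no 16-bit length prefix, so it avoids the cap at the cost of doubling the size of ASCII-heavy strings.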
Re: [PR] Spark 3.5: Set log level to WARN for rewrite task failure with partial progress [iceberg]
manuzhang commented on PR #9400: URL: https://github.com/apache/iceberg/pull/9400#issuecomment-1877320865 @ajantha-bhat please help review.
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
lisirrx commented on PR #9217: URL: https://github.com/apache/iceberg/pull/9217#issuecomment-1877336341 @nastra Sorry for continuing this work so late. I got COVID last month and was also working on my graduate school applications. I have added the `TestBase` class and provided two examples using it.
Re: [PR] Hive: Add View support for HIVE catalog [iceberg]
pvary commented on code in PR #8907: URL: https://github.com/apache/iceberg/pull/8907#discussion_r1441980492 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveViewOperations.java: ## @@ -0,0 +1,287 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.hive; + +import java.util.Collections; +import java.util.Locale; +import java.util.Objects; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.TableType; +import org.apache.hadoop.hive.metastore.api.InvalidObjectException; +import org.apache.hadoop.hive.metastore.api.NoSuchObjectException; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.iceberg.BaseMetastoreCatalog; +import org.apache.iceberg.BaseMetastoreTableOperations; +import org.apache.iceberg.ClientPool; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.AlreadyExistsException; +import org.apache.iceberg.exceptions.CommitFailedException; +import org.apache.iceberg.exceptions.CommitStateUnknownException; +import org.apache.iceberg.exceptions.NoSuchIcebergViewException; +import org.apache.iceberg.exceptions.NoSuchViewException; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.util.MetastoreOperationsUtil; +import org.apache.iceberg.view.BaseViewOperations; +import org.apache.iceberg.view.ViewMetadata; +import org.apache.thrift.TException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Hive implementation of Iceberg ViewOperations. 
*/ +final class HiveViewOperations extends BaseViewOperations implements HiveOperationsBase { + private static final Logger LOG = LoggerFactory.getLogger(HiveViewOperations.class); + + private final String fullName; + private final String database; + private final String viewName; + private final FileIO fileIO; + private final ClientPool metaClients; + private final long maxHiveTablePropertySize; + + HiveViewOperations( + Configuration conf, + ClientPool metaClients, + FileIO fileIO, + String catalogName, + TableIdentifier viewIdentifier) { +String dbName = viewIdentifier.namespace().level(0); +this.metaClients = metaClients; +this.fileIO = fileIO; +this.fullName = BaseMetastoreCatalog.fullTableName(catalogName, viewIdentifier); +this.database = dbName; +this.viewName = viewIdentifier.name(); +this.maxHiveTablePropertySize = +conf.getLong(HIVE_TABLE_PROPERTY_MAX_SIZE, HIVE_TABLE_PROPERTY_MAX_SIZE_DEFAULT); + } + + @Override + public void doRefresh() { +String metadataLocation = null; +Table table = null; + +try { + table = metaClients.run(client -> client.getTable(database, viewName)); + HiveOperationsBase.validateTableOrViewIsIceberg(table, fullName); + metadataLocation = + table.getParameters().get(BaseMetastoreTableOperations.METADATA_LOCATION_PROP); + +} catch (NoSuchObjectException e) { + if (currentMetadataLocation() != null) { +throw new NoSuchViewException("View does not exist: %s.%s", database, viewName); + } +} catch (TException e) { + String errMsg = + String.format("Failed to get view info from metastore %s.%s", database, viewName); + throw new RuntimeException(errMsg, e); +} catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new RuntimeException("Interrupted during refresh", e); +} + +if (table != null && !table.getTableType().equalsIgnoreCase(TableType.VIRTUAL_VIEW.name())) { + disableRefresh(); +} else { + refreshFromMetadataLocation(metadataLocation); +} + } + + @SuppressWarnings("checkstyle:CyclomaticComplexity") Review 
Comment: Would it be worth refactoring this method to decrease the `CyclomaticComplexity`? ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -264,6 +257,159 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) { } } + @Override + public boolean dropView(TableIdentifier i
[I] Iceberg Glue Concurrent Update can result in missing metadata_location [iceberg]
shaeqahmed opened a new issue, #9411: URL: https://github.com/apache/iceberg/issues/9411 ### Apache Iceberg version 1.4.2 (latest release) ### Query engine None ### Please describe the bug 🐞 This is similar to an issue that was supposed to be fixed in an older version: https://github.com/apache/iceberg/issues/7151 We have Java Iceberg code that processes messages from a FIFO queue and commits to Iceberg in a single-threaded fashion. I have confirmed that we are not making concurrent commits to any table. However, when doing a few commits back to back, at some point we encountered the following WARN log indicating that Glue detected a concurrent update and was retrying:

```
Retrying task after failure: Cannot commit glue_catalog.matano.cloudflare_http_request because Glue detected concurrent update
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit glue_catalog.matano.cloudflare_http_request because Glue detected concurrent update
    at org.apache.iceberg.aws.glue.GlueTableOperations.handleAWSExceptions(GlueTableOperations.java:355) ~[output.jar:?]
    at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:180)
    ...
```

But immediately after this log, while attempting to refresh the Iceberg metadata, there is an Iceberg NotFoundException because the current metadata location does not exist or no longer exists.
```
INFO BaseMetastoreTableOperations - Refreshing table metadata from new version: s3://redacted-bucket/lake/cloudflare_http_request/metadata/xxx-e3e8a38dbdc4.metadata.json
ERROR IcebergMetadataWriter - org.apache.iceberg.exceptions.NotFoundException: Location does not exist: s3://redacted-bucket/lake/cloudflare_http_request/metadata/xxx-e3e8a38dbdc4.metadata.json
```

**This has resulted in our table becoming corrupt and the availability of our data lake service being affected until we manually fixed the table by referencing the Glue `previous_metadata_location` and overriding the invalid current `metadata_location` with it.** It looks to me that a CommitFailedException (CFE) is retried internally and in any case should not result in a corrupt table, even if all retries fail. Our code looks as follows, as we catch all exceptions:

```
// tableObj is our class, and a thin wrapper containing the Iceberg Java Table class
logger.info("Committing for tables: ${tableObjs.keys}")
start = System.currentTimeMillis()
runBlocking {
    for (tableObj in tableObjs.values) {
        launch(Dispatchers.IO) {
            try {
                if (tableObj.isInitalized()) {
                    tableObj.getAppendFiles().commit()
                }
            } catch (e: Exception) {
                logger.error(e.message)
                e.printStackTrace()
                failures.addAll(tableObj.sqsMessageIds)
            }
        }
    }
}
logger.info("Committed tables in ${System.currentTimeMillis() - start} ms")
```

Is this a bug in the Glue Iceberg code, or how should we protect ourselves from a situation where the Iceberg table is left pointing at an invalid location because of failed commits due to concurrent modifications detected by Glue?
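[Editorial aside] The distinction that matters in reports like this is between a commit that definitively failed (Iceberg's `CommitFailedException`, which is safe to retry) and one whose server-side outcome is unknown (`CommitStateUnknownException`, after which deleting the candidate metadata file is unsafe). A minimal sketch of that handling pattern, using hypothetical stand-in exception types rather than Iceberg's actual classes:

```java
public class CommitHandlingSketch {
    // Hypothetical stand-ins for Iceberg's CommitFailedException /
    // CommitStateUnknownException; not the real org.apache.iceberg classes.
    static class CleanCommitFailure extends RuntimeException {}
    static class UnknownStateCommitFailure extends RuntimeException {}

    interface Commit {
        void run();
    }

    // Retry clean failures a bounded number of times; never swallow a failure
    // whose server-side outcome is unknown.
    static boolean tryCommit(Commit commit, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                commit.run();
                return true;
            } catch (CleanCommitFailure e) {
                // The commit definitively did not happen; safe to refresh and retry.
            } catch (UnknownStateCommitFailure e) {
                // The commit may have succeeded server-side. Re-throw and do NOT
                // clean up the candidate metadata file: deleting it here is what
                // can leave a catalog pointing at a metadata location that no
                // longer exists.
                throw e;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        boolean ok = tryCommit(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new CleanCommitFailure();
            }
        }, 5);
        System.out.println(ok); // true: succeeded on the third attempt
    }
}
```

The key design point is that only the clean-failure path is retryable; treating an unknown-outcome failure as a clean one is the kind of behavior that can corrupt the `metadata_location` pointer described in the report.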
Re: [PR] Spark: Add support for reading Iceberg views [iceberg]
rdblue merged PR #9340: URL: https://github.com/apache/iceberg/pull/9340
Re: [PR] Spark: Add support for reading Iceberg views [iceberg]
rdblue commented on PR #9340: URL: https://github.com/apache/iceberg/pull/9340#issuecomment-1877413431 Merged! Thanks for getting this working @nastra!
Re: [PR] Deliver key metadata for encryption of data files [iceberg]
rdblue merged PR #9359: URL: https://github.com/apache/iceberg/pull/9359
Re: [PR] Deliver key metadata for encryption of data files [iceberg]
rdblue commented on PR #9359: URL: https://github.com/apache/iceberg/pull/9359#issuecomment-1877443032 Merged! Thanks, @ggershinsky, for getting this done!
Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]
pvary commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1877465254 Do we have a way to check whether the new tests ran on CI, with the correct parameters?
Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]
nastra commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1877466639 > Do we have a way to check if the new tests were running on the CI? With the correct parameters? I don't think so, but I verified them all locally.
Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1442126743 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.deletes; + +class EmptyPositionDeleteIndex implements PositionDeleteIndex { + + private static final EmptyPositionDeleteIndex INSTANCE = new EmptyPositionDeleteIndex(); + + private EmptyPositionDeleteIndex() {} + + static EmptyPositionDeleteIndex get() { +return INSTANCE; + } + + @Override + public void delete(long position) { +throw new UnsupportedOperationException("Cannot modify " + getClass().getName()); + } + + @Override + public void delete(long posStart, long posEnd) { +throw new UnsupportedOperationException("Cannot modify " + getClass().getName()); + } + + @Override + public boolean isDeleted(long position) { +return false; + } + + @Override + public boolean isEmpty() { +return true; + } + + @Override + public String toString() { +return "PositionDeleteIndex{}"; Review Comment: Agree, that would be nice to have a reasonable `toString()` implementation in others. 
Re: [PR] Spark: Support renaming views [iceberg]
rdblue commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1442151540 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -53,6 +57,17 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look loadView(catalog, ident) .map(createViewRelation(parts, _)) .getOrElse(u) + +case u@UnresolvedTableOrView(CatalogAndIdentifier(catalog, ident), _, _) => + loadView(catalog, ident) +.map(_ => ResolvedV2View(catalog.asViewCatalog, ident)) +.getOrElse(u) + +case RenameTable(ResolvedV2View(oldCatalog, oldIdent), CatalogAndIdentifier(newCatalog, newIdent), true) => Review Comment: What is `true`? Can we add a name to this so that it is more readable?
Re: [PR] Spark: Support renaming views [iceberg]
rdblue commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1442152596 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -53,6 +57,17 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look loadView(catalog, ident) .map(createViewRelation(parts, _)) .getOrElse(u) + +case u@UnresolvedTableOrView(CatalogAndIdentifier(catalog, ident), _, _) => + loadView(catalog, ident) +.map(_ => ResolvedV2View(catalog.asViewCatalog, ident)) +.getOrElse(u) + +case RenameTable(ResolvedV2View(oldCatalog, oldIdent), CatalogAndIdentifier(newCatalog, newIdent), true) => Review Comment: Is it possible for the rule on line 61 to resolve the multi-part `to` identifier before this rule runs?
Re: [PR] Spark: Support renaming views [iceberg]
rdblue commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1442153222 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -590,10 +591,7 @@ public void fullFunctionIdentifier() { @Test public void fullFunctionIdentifierNotRewrittenLoadFailure() { String viewName = "fullFunctionIdentifierNotRewrittenLoadFailure"; -String sql = -String.format( -"SELECT spark_catalog.system.bucket(100, 'a') AS bucket_result, 'a' AS value", -catalogName); +String sql = "SELECT spark_catalog.system.bucket(100, 'a') AS bucket_result, 'a' AS value"; Review Comment: :sweat_smile:
Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]
jackye1995 merged PR #9385: URL: https://github.com/apache/iceberg/pull/9385
Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]
jackye1995 commented on PR #9385: URL: https://github.com/apache/iceberg/pull/9385#issuecomment-1877659221 It seems this is not getting much traction from other reviewers. Given that it is a pretty straightforward plugin integration, I will go ahead and merge it. We can fix things later if necessary. This also raises the question of whether there are other valuable plugins we should consider integrating; we can do a search afterwards. Thanks for the work!
Re: [PR] Spark 3.5: Set log level to WARN for rewrite task failure with partial progress [iceberg]
amogh-jahagirdar commented on code in PR #9400: URL: https://github.com/apache/iceberg/pull/9400#discussion_r1442227949 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -345,7 +345,7 @@ private Result doExecuteWithPartialProgress( .noRetry() .onFailure( (fileGroup, exception) -> { - LOG.error("Failure during rewrite group {}", fileGroup.info(), exception); + LOG.warn("Failure during rewrite group {}", fileGroup.info(), exception); Review Comment: Hm, I think I get the rationale that during partial progress it could be too noisy if we error-logged every failure. But the issue is, let's say most of the commit tasks are failures and we only see warn logs: we would not really see any indication of what the failed partitions are in the logs (at least I don't think so, based off reading the code). The only case where we error log is when all tasks fail, which seems insufficient logging in case of failure. I actually think error is fine here. Are you seeing a lot of noisiness in this in any systems you may be running? I also noticed that the non-partial-progress case is also warn, but tbh that also doesn't seem right. What do you think @manuzhang, also cc @szehon-ho @RussellSpitzer
Re: [PR] Spark 3.5: Set log level to WARN for rewrite task failure with partial progress [iceberg]
amogh-jahagirdar commented on code in PR #9400: URL: https://github.com/apache/iceberg/pull/9400#discussion_r1442236190 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -345,7 +345,7 @@ private Result doExecuteWithPartialProgress( .noRetry() .onFailure( (fileGroup, exception) -> { - LOG.error("Failure during rewrite group {}", fileGroup.info(), exception); + LOG.warn("Failure during rewrite group {}", fileGroup.info(), exception); Review Comment: Another idea is that we log an error if the commit results size is less than some threshold, rather than only when it is completely empty. This would eliminate the task-level noise from errors, but at the same time indicate to logging systems that the compaction isn't very effective. So

```
if (commitResults.size() == 0) {
  LOG.error(
      "{} is true but no rewrite commits succeeded. Check the logs to determine why the individual "
          + "commits failed. If this is persistent it may help to increase {} which will break the rewrite operation "
          + "into smaller commits.",
      PARTIAL_PROGRESS_ENABLED,
      PARTIAL_PROGRESS_MAX_COMMITS);
}
```

would become

```
if (commitResults.size() / commitAttempts <= SOME_THRESHOLD) {
  LOG.error(
      "{} is true but less than SOME_THRESHOLD rewrite commits succeeded. Check the logs to determine why the individual "
          + "commits failed. If this is persistent it may help to increase {} which will break the rewrite operation "
          + "into smaller commits.",
      PARTIAL_PROGRESS_ENABLED,
      PARTIAL_PROGRESS_MAX_COMMITS);
}
```

but maybe determining commitAttempts is overcomplicated.
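[Editorial aside] The ratio check proposed above can be expressed as a small helper. Note the integer-division pitfall in the snippet as written (`commitResults.size() / commitAttempts` truncates to 0 for any partial success); a cast to `double` is needed. The class, method, and threshold names here are hypothetical placeholders, not Iceberg code:

```java
public class RewriteLogLevelSketch {
    // Hypothetical cutoff: error-log when half or fewer commits succeed.
    static final double SUCCESS_RATIO_THRESHOLD = 0.5;

    // Decide whether the overall rewrite outcome deserves an ERROR-level log.
    // The double cast matters: integer division would make the ratio 0 for
    // every partial success and 1 only for total success.
    static boolean shouldLogError(int succeededCommits, int attemptedCommits) {
        if (attemptedCommits == 0) {
            return false; // nothing attempted, nothing to report
        }
        return (double) succeededCommits / attemptedCommits <= SUCCESS_RATIO_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(shouldLogError(0, 10)); // true: total failure
        System.out.println(shouldLogError(4, 10)); // true: mostly failing
        System.out.println(shouldLogError(9, 10)); // false: mostly succeeding
    }
}
```

This keeps per-task failures at WARN while escalating to ERROR whenever the overall success ratio is poor, which addresses the "failures get masked" concern without the task-level noise.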
Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]
amogh-jahagirdar commented on code in PR #9298: URL: https://github.com/apache/iceberg/pull/9298#discussion_r1442241575 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseFileRewriteCoordinator.java: ## @@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) { public Set fetchSetIds(Table table) { return resultMap.keySet().stream() -.filter(e -> e.first().equals(tableUUID(table))) +.filter(e -> e.first().equals(Spark3Util.tableUUID(table))) Review Comment: Nit: I think it would be a bit cleaner just to import Spark3Util.tableUUID and then just use tableUUID here (the diff on line 73 and other places would essentially go away in favor of just a new import statement) ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java: ## @@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier( return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database); } + static String tableUUID(org.apache.iceberg.Table table) { +if (table instanceof HasTableOperations) { + TableOperations ops = ((HasTableOperations) table).operations(); + return ops.current().uuid(); +} else if (table instanceof BaseMetadataTable) { + return ((BaseMetadataTable) table).table().operations().current().uuid(); +} else { + return null; Review Comment: I think this should probably throw an exception instead of returning null.
Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]
amogh-jahagirdar commented on code in PR #9298: URL: https://github.com/apache/iceberg/pull/9298#discussion_r1442246021 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java: ## @@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier( return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database); } + static String tableUUID(org.apache.iceberg.Table table) { +if (table instanceof HasTableOperations) { + TableOperations ops = ((HasTableOperations) table).operations(); + return ops.current().uuid(); +} else if (table instanceof BaseMetadataTable) { + return ((BaseMetadataTable) table).table().operations().current().uuid(); +} else { + return null; Review Comment: Also, instead of `tableUUID`, maybe `baseTableUUID`, since that's what we're really getting.
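[Editorial aside] The fail-fast change suggested in this review can be sketched as follows; the table interfaces here are simplified stand-ins, not Iceberg's actual `Table`/`HasTableOperations` types:

```java
public class TableUuidSketch {
    // Simplified stand-ins for the Iceberg table types; hypothetical.
    interface Table {}

    interface HasTableOperations extends Table {
        String currentUuid();
    }

    // Resolve the (base) table's UUID, throwing for unsupported table types
    // instead of returning null, as the review suggests.
    static String baseTableUUID(Table table) {
        if (table instanceof HasTableOperations) {
            return ((HasTableOperations) table).currentUuid();
        }
        throw new UnsupportedOperationException(
            "Cannot determine UUID for table: " + table.getClass().getName());
    }

    public static void main(String[] args) {
        HasTableOperations known = () -> "uuid-1234";
        System.out.println(baseTableUUID(known)); // uuid-1234

        Table unknown = new Table() {};
        try {
            baseTableUUID(unknown);
        } catch (UnsupportedOperationException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

Throwing here surfaces the unsupported case at the call site, whereas returning null would silently let `equals(null)`-style comparisons in the rewrite coordinator filter everything out.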
Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]
amogh-jahagirdar commented on code in PR #9298: URL: https://github.com/apache/iceberg/pull/9298#discussion_r1442241575

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseFileRewriteCoordinator.java:

```diff
@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {
   public Set<String> fetchSetIds(Table table) {
     return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.tableUUID(table)))
```

Review Comment: Nit: I think it would be a bit cleaner just to import Spark3Util.tableUUID and then just use tableUUID here (the diff on line 73 and other places would essentially go away in favor of just a new import statement). But that's nbd, if we do the method rename like I suggested we lose this benefit anyways.
Re: [PR] Core: Remove partition statistics files during purge table [iceberg]
amogh-jahagirdar commented on PR #9409: URL: https://github.com/apache/iceberg/pull/9409#issuecomment-1877783387 Thanks @dramaticlly for the review, merging.
Re: [PR] Core: Remove partition statistics files during purge table [iceberg]
amogh-jahagirdar merged PR #9409: URL: https://github.com/apache/iceberg/pull/9409
Re: [PR] Correct schema behavior [iceberg-python]
Fokko commented on code in PR #247: URL: https://github.com/apache/iceberg-python/pull/247#discussion_r1442296384

## pyiceberg/table/__init__.py:

```diff
@@ -942,15 +942,16 @@ def snapshot(self) -> Optional[Snapshot]:
         return self.table.current_snapshot()

     def projection(self) -> Schema:
-        snapshot_schema = self.table.schema()
-        if snapshot := self.snapshot():
-            if snapshot.schema_id is not None:
-                snapshot_schema = self.table.schemas()[snapshot.schema_id]
+        current_schema = self.table.schema()
+        if self.snapshot_id is not None:
+            snapshot = self.table.snapshot_by_id(self.snapshot_id)
```

Review Comment: Oof, that's a good one. I think we should check if the snapshot-id is valid earlier in the process. I've added a check now, but I'll follow up with another PR to make this more strict.
Re: [I] Add a Function variable as an extra partition filter to filter DataFiles [iceberg]
github-actions[bot] closed issue #6871: Add a Function variable as an extra partition filter to filter DataFiles URL: https://github.com/apache/iceberg/issues/6871
Re: [I] Add a Function variable as an extra partition filter to filter DataFiles [iceberg]
github-actions[bot] commented on issue #6871: URL: https://github.com/apache/iceberg/issues/6871#issuecomment-1877934782 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] [Big manifest] If the manifest file is very big, the decode cost time is very long [iceberg]
github-actions[bot] commented on issue #6868: URL: https://github.com/apache/iceberg/issues/6868#issuecomment-1877934801 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Cache delete files when reading v2 format with merge-on-read mode [iceberg]
github-actions[bot] closed issue #6865: Cache delete files when reading v2 format with merge-on-read mode URL: https://github.com/apache/iceberg/issues/6865
Re: [I] Cache delete files when reading v2 format with merge-on-read mode [iceberg]
github-actions[bot] commented on issue #6865: URL: https://github.com/apache/iceberg/issues/6865#issuecomment-1877934826 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] [Big manifest] If the manifest file is very big, the decode cost time is very long [iceberg]
zombee0 closed issue #6868: [Big manifest] If the manifest file is very big, the decode cost time is very long URL: https://github.com/apache/iceberg/issues/6868
Re: [PR] check_github_labeler_v5 [iceberg]
panbingkun closed pull request #9413: check_github_labeler_v5 URL: https://github.com/apache/iceberg/pull/9413
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#issuecomment-1878030924 Hi @Fokko, if you think this PR is too large, I can split it up. For instance, I could add the scripts in separate PRs.
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1878085557 It seems this bug was introduced in version 1.4.0, which is kind of new. I tried fixing it by tweaking the SplitAssignerFactory I pass down to the IcebergSource, but even though I reduce the number of FileScanTasks per split to one, it still exceeds that 65K limit. So I ended up downgrading to 1.3, unfortunately, until it is fixed in 1.4.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442448609

## crates/catalog/hms/DEPENDENCIES.rust.tsv:

Review Comment: What are these tsv files used for?

## website/src/release.md (tracking-issue template):

> ### Build Release
>
> #### GitHub Side
>
> - [ ] Bump version in project
> - [ ] Update docs
> - [ ] Generate dependencies list
> - [ ] Push release candidate tag to GitHub
>
> #### ASF Side
>
> - [ ] Create an ASF Release
> - [ ] Upload artifacts to the SVN dist repo
>
> ### Voting

Review Comment: How about moving this before `ASF Side` as first step?

## website/src/download.md:

Review Comment: `iceberg-rust` mainly serves as a library, I think we already have `install.md` to tell users how to use it, so I think maybe we don't need this? If this is telling users to download source from the Apache website, adding one section in `install.md` would be enough?

## website/src/release.md (tracking-issue template):

> ### Voting
>
> - [ ] Start VOTE at iceberg community
>
> ### Official Release

Review Comment: I think one missing part is update crates.io?

## website/src/release.md: This document mainly introduces how the release manager releases a new versi
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442472242

## crates/catalog/hms/DEPENDENCIES.rust.tsv:

Review Comment: The ASF release requires users to have a complete list of our dependencies to ensure compliance with the Apache 2.0 license.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442472860

## website/src/release.md (tracking-issue template):

> #### ASF Side
>
> - [ ] Create an ASF Release
> - [ ] Upload artifacts to the SVN dist repo

Review Comment: Yes. Only source releases in ASF SVN are considered official; all other releases are provided merely for user convenience. ASF adheres to this approach to ensure that ASF releases are always accessible and not restricted by specific vendors.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442473048

## website/src/release.md (tracking-issue template):

> ### Voting
>
> - [ ] Start VOTE at iceberg community

Review Comment: We will vote on the artifacts that have been uploaded.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442473406

## website/src/release.md:

> ### Create an ASF Release
>
> After GitHub Release has been created, we can start to create ASF Release.
>
> - Checkout to released tag. (e.g. `git checkout v0.2.0-rc.1`, tag is created in the previous step)
> - Use the release script to create a new release: `ICEBERG_VERSION= ICEBERG_VERSION_RC= ./scripts/release.sh` (e.g. `ICEBERG_VERSION=0.2.0 ICEBERG_VERSION_RC=rc.1 ./scripts/release.sh`)
> - This script will do the following things:
>   - Create a new branch named `release-${release_version}` from the tag
>   - Generate the release candidate artifacts under `dist`, including:
>     - `apache-iceberg-rust-${release_version}-src.tar.gz`
>     - `apache-iceberg-rust-${release_version}-src.tar.gz.asc`
>     - `apache-iceberg-rust-${release_version}-src.tar.gz.sha512`
>   - Check the header of the source code. This step needs docker to run.
>   - Push the newly created branch to GitHub
>
> This script will create a new release under `dist`.
>
> For example:
>
> ```shell
> > tree dist
> dist
> ├── apache-iceberg-rust-0.2.0-src.tar.gz
> ├── apache-iceberg-rust-0.2.0-src.tar.gz.asc
> └── apache-iceberg-rust-0.2.0-src.tar.gz.sha512
> ```
>
> ### Upload artifacts to the SVN dist repo
>
> SVN i
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442473747

## website/src/release.md (tracking-issue template):

> ### Voting
>
> - [ ] Start VOTE at iceberg community
>
> ### Official Release

Review Comment: Publishing to crates.io is optional and should be handled by GitHub CI. I'll include a section on this once the crates.io publishing CI setup is complete.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442474047

## website/src/download.md:

Review Comment: > I think maybe we don't need this

ASF release policy requires that users have the ability to download and build from source.
[PR] API, Core: Add withUpdatedColumnDoc API to Schema [iceberg]
amogh-jahagirdar opened a new pull request, #9414: URL: https://github.com/apache/iceberg/pull/9414

When working on Trino Iceberg view support, I realized that updating column comments across engines is the same logic:

1. Get the view schema
2. Find the field to update, update it
3. Return a new Schema (preserving all the other attributes like identifier fields, aliases)
4. Replace the schema on the view

This PR puts 1-3 into a separate API just so engines don't have to duplicate this logic. I don't think this matters for Spark since the way to update column comments is doing a `REPLACE VIEW`, which will need to visit the schema separately for any changes, but Trino + other engines support operations which just update column comments, and I think having an Iceberg API which just does that would be convenient.
Re: [PR] API, Core: Add withUpdatedColumnDoc API to Schema [iceberg]
amogh-jahagirdar commented on code in PR #9414: URL: https://github.com/apache/iceberg/pull/9414#discussion_r1442512815

## core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java:

```java
  @Test
  public void testReplaceViewWithUpdatedColumnDoc() {
    TableIdentifier identifier = TableIdentifier.of("ns", "view");

    if (requiresNamespaceCreate()) {
      catalog().createNamespace(identifier.namespace());
    }

    View view =
        catalog()
            .buildView(identifier)
            .withSchema(SCHEMA)
            .withDefaultNamespace(identifier.namespace())
            .withDefaultCatalog(catalog().name())
            .withQuery("spark", "select * from ns.tbl")
            .create();

    view.replaceVersion()
        .withDefaultCatalog(view.currentVersion().defaultCatalog())
        .withDefaultNamespace(identifier.namespace())
        .withSchema(view.schema().withColumnDoc("data", "some docs for data"))
        .withQuery("spark", view.sqlFor("spark").sql())
        .commit();
  }
```

Review Comment: We can actually take this a step further by adding an API to View, `view.updateColumnDoc`, which returns a `ReplaceViewVersion` already initialized with the namespace/catalog/query details, since those will all be the same for someone who just wants to update the doc.
Re: [PR] API, Core: Add withUpdatedColumnDoc API to Schema [iceberg]
amogh-jahagirdar commented on code in PR #9414: URL: https://github.com/apache/iceberg/pull/9414#discussion_r1442516050

## core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java: (same `testReplaceViewWithUpdatedColumnDoc` snippet as above)

Review Comment: Actually I think that's more than just a nice-to-have, since we need to make sure all dialects are preserved in the replace. It's important to have that in the API implementation itself rather than have differing engine behaviors.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442519521

## website/src/release.md:

This document introduces how the release manager releases a new version in accordance with the Apache requirements.

## Introduction

The `Source Release` is the key artifact that the ASF values, and it is required for an ASF release.

Please remember that publishing software has legal consequences.

This guide complements the foundation-wide policies and guides:

- [Release Policy](https://www.apache.org/legal/release-policy.html)
- [Release Distribution Policy](https://infra.apache.org/release-distribution)
- [Release Creation Process](https://infra.apache.org/release-publishing.html)

## Release terminology

In the context of our release, we use several terms to describe the different stages of the release process:

- `iceberg_version`: the version of Iceberg to be released, like `0.2.0`.
- `release_version`: the version of the release candidate, like `0.2.0-rc.1`.
- `rc_version`: the minor version for the voting round, like `rc.1`.

## Preparation

This section covers the requirements for individuals who are new to the role of release manager.

Refer to [Setup GPG Key](reference/setup_gpg.md) to make sure the GPG key has been set up.

## Start a tracking issue about the next release

Start a tracking issue on GitHub for the upcoming release to track all the tasks that need to be completed.

Title:

```
Tracking issues of Iceberg Rust ${iceberg_version} Release
```

Content:

```markdown
This issue is used to track tasks of the iceberg rust ${iceberg_version} release.

## Tasks

### Blockers

> Blockers are the tasks that must be completed before the release.

### Build Release

#### GitHub Side

- [ ] Bump version in project
- [ ] Update docs
- [ ] Generate dependencies list
- [ ] Push release candidate tag to GitHub

#### ASF Side

- [ ] Create an ASF Release
- [ ] Upload artifacts to the SVN dist repo

### Voting

- [ ] Start VOTE at iceberg community

### Official Release

- [ ] Push the release git tag
- [ ] Publish artifacts to SVN RELEASE branch
- [ ] Change Iceberg Rust Website download link
- [ ] Send the announcement

For details of each step, please refer to: https://rust.iceberg.apache.org/release
```

## GitHub Side

### Bump version in project

Bump every component's version in the project to the new iceberg version.
Note that this is the exact version of the release, not the release candidate version.

- rust core: bump the version in `Cargo.toml`

### Update docs

- Update `CHANGELOG.md`; refer to [Generate Release Note](reference/generate_release_note.md) for more information.

### Generate dependencies list

Download and set up `cargo-deny`; see [cargo-deny](https://embarkstudios.github.io/cargo-deny/cli/index.html).

Run `python3 ./scripts/dependencies.py generate` to update the dependencies list of every package.

### Push release candidate tag

After the version-bump PR is merged, we can create a GitHub release for the release candidate:

- Create a signed tag on the `Bump Version` / `Patch up version` commit of the `main` branch: `git tag -s "v0.2.0-rc.1"`. Make sure to check out the corresponding commit instead of tagging directly on the tip of `main`.
- Push tags to GitHub: `git push --tags`.

## ASF Side

If any step in the ASF release process fails and requires code changes, we abandon that version and prepare for the next one. Our release page will only display ASF releases, not GitHub releases.

### Create an ASF Release

After the GitHub release has been created, we can start to create the ASF release.

- Check out the release tag (e.g. `git checkout v0.2.0-rc.1`; the tag was created in the previous step).
- Use the release script to create a new release: `ICEBERG_VERSION=<iceberg_version> ICEBERG_VERSION_RC=<rc_version> ./scripts/release.sh` (e.g. `ICEBERG_VERSION=0.2.0 ICEBERG_VERSION_RC=rc.1 ./scripts/release.sh`)
- This script will do the following things:
  - Create a new branch named `release-${release_version}` from the tag
  - Generate the release candidate artifacts under `dist`, including:
    - `apache-iceberg-rust-${release_version}-src.tar.gz`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.asc`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.sha512`
  - Check the headers of the source code (this step needs Docker to run)
  - Push the newly created branch to GitHub

This script will create a new release under `dist`. For example:

```shell
> tree dist
dist
├── apache-iceberg-rust-0.2.0-src.tar.gz
├── apache-iceberg-rust-0.2.0-src.tar.gz.asc
└── apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```

### Upload artifacts to the SVN dist repo
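Each `.sha512` file produced by the release script pairs the tarball with its SHA-512 digest, so verifying an artifact reduces to recomputing and comparing the hex digest. A JDK-only sketch of that check (file handling elided; this is an illustration, not the project's release tooling):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Sketch {
    // Hex-encode the SHA-512 digest of the given bytes.
    static String sha512Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every JVM, so this cannot happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In practice, data would be the bytes of the -src.tar.gz artifact, and the
        // expected value would come from the matching .sha512 file.
        System.out.println(sha512Hex("example artifact bytes".getBytes()));
    }
}
```

Voters on the release thread typically perform exactly this comparison (plus a GPG signature check against the `.asc` file) before casting a +1.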
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442520171

## website/src/download.md:

Review Comment:

> ASF release policy requires that users have the ability to download and build from source.

Then how about adding a section in `install.md` titled `Build from source`? I think it is just one way of installing?
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442520271

## website/src/release.md: (same release-guide draft quoted above)

Review Comment: Got it, thanks!
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
amogh-jahagirdar commented on code in PR #9414: URL: https://github.com/apache/iceberg/pull/9414#discussion_r1442535706

## core/src/main/java/org/apache/iceberg/view/BaseView.java:

```java
  @Override
  @SuppressWarnings("checkstyle:HiddenField")
  public ReplaceViewVersion updateColumnDoc(String name, String doc) {
```

Review Comment: Alternatively, we could implement `UpdateSchema` for Views where the only supported operation is `updateColumnDoc` and everything else throws Unsupported. I was -1 on that, since even returning `UpdateSchema` to callers would make it seem like some sort of schema evolution on views is possible, but it's not (and shouldn't be).
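The rejected alternative can be sketched with a hypothetical broad interface whose only view-supported operation is the doc update; every other method throwing is exactly what makes it confusing for callers. Names here are illustrative, not Iceberg's actual `UpdateSchema`:

```java
import java.util.ArrayList;
import java.util.List;

public class ViewUpdateSketch {
    // Hypothetical broad schema-update interface, standing in for UpdateSchema.
    interface SchemaUpdate {
        SchemaUpdate updateColumnDoc(String name, String doc);

        SchemaUpdate addColumn(String name); // not meaningful for views
    }

    // A view-backed implementation can honor only the doc update; exposing
    // the broad interface forces every other method to throw.
    static class ViewSchemaUpdate implements SchemaUpdate {
        final List<String> applied = new ArrayList<>();

        @Override
        public SchemaUpdate updateColumnDoc(String name, String doc) {
            applied.add("doc:" + name + "=" + doc);
            return this;
        }

        @Override
        public SchemaUpdate addColumn(String name) {
            // Reachable by any caller, but can never succeed on a view.
            throw new UnsupportedOperationException("views do not support schema evolution");
        }
    }

    public static void main(String[] args) {
        ViewSchemaUpdate update = new ViewSchemaUpdate();
        update.updateColumnDoc("data", "some docs for data");
        System.out.println(update.applied);
    }
}
```

The narrower `updateColumnDoc` API avoids this trap: the type signature only promises what views can actually do.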
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
amogh-jahagirdar commented on code in PR #9414: URL: https://github.com/apache/iceberg/pull/9414#discussion_r1442541492

## core/src/main/java/org/apache/iceberg/view/BaseView.java:

```java
  @Override
  @SuppressWarnings("checkstyle:HiddenField")
  public ReplaceViewVersion updateColumnDoc(String name, String doc) {
```

Review Comment: Ah, the awkward thing is method chaining...

```java
view.updateColumnDoc("col1", "doc1").updateColumnDoc("col2", "doc2").commit()
```

This will only update `col2`, since we're not doing any change tracking or reusing any previous replaces. Maybe this API could instead accept multiple column/doc pairs, but if we go down this route it's probably best to just follow the existing pattern and do some change tracking.
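The change-tracking fix being discussed amounts to accumulating pending updates instead of rebuilding from the base schema on every call, so chained calls compose. A minimal sketch under hypothetical names (column docs modeled as a plain map, not Iceberg's actual `ReplaceViewVersion`):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChainingSketch {
    private final Map<String, String> baseDocs;
    // Pending updates are tracked here rather than applied immediately,
    // so a second chained call cannot clobber the first.
    private final Map<String, String> pending = new LinkedHashMap<>();

    ChainingSketch(Map<String, String> baseDocs) {
        this.baseDocs = baseDocs;
    }

    // Returns this, so calls chain; each call only records a pending change.
    ChainingSketch updateColumnDoc(String name, String doc) {
        pending.put(name, doc);
        return this;
    }

    // Apply every accumulated change on top of the base docs at commit time.
    Map<String, String> commit() {
        Map<String, String> result = new LinkedHashMap<>(baseDocs);
        result.putAll(pending);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs =
            new ChainingSketch(new LinkedHashMap<>())
                .updateColumnDoc("col1", "doc1")
                .updateColumnDoc("col2", "doc2")
                .commit();
        System.out.println(docs);
    }
}
```

Without the `pending` map, a builder that derived a fresh replacement from the unmodified base schema on each call would exhibit exactly the bug described: only the last `updateColumnDoc` would survive.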
Re: [PR] Nessie: Strip trailing slash for warehouse location [iceberg]
ajantha-bhat commented on code in PR #9415: URL: https://github.com/apache/iceberg/pull/9415#discussion_r1442549368

## nessie/src/main/java/org/apache/iceberg/nessie/NessieCatalog.java:

```diff
@@ -183,7 +184,8 @@ private String validateWarehouseLocation(String name, Map<String, String> catalogOptions) {
           catalogOptions);
       throw new IllegalStateException("Parameter 'warehouse' not set, Nessie can't store data.");
     }
-    return warehouseLocation;
+
+    return LocationUtil.stripTrailingSlash(warehouseLocation);
```

Review Comment: Just making this in line with the other catalogs. We can look up the callers of `LocationUtil.stripTrailingSlash`.
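Conceptually, the normalization is just dropping trailing `/` characters so equivalent warehouse locations compare equal. A minimal sketch of that behavior (an illustration, not Iceberg's actual `LocationUtil` implementation):

```java
public class LocationSketch {
    // Drop any trailing '/' characters so "s3://bucket/wh/" and
    // "s3://bucket/wh" normalize to the same warehouse location.
    static String stripTrailingSlash(String path) {
        String result = path;
        while (result.endsWith("/")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSlash("s3://bucket/warehouse/"));
    }
}
```

Normalizing at validation time means every path later built from the warehouse location avoids accidental double slashes, regardless of how the user configured the `warehouse` property.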
Re: [PR] Nessie: Strip trailing slash for warehouse location [iceberg]
ajantha-bhat commented on PR #9415: URL: https://github.com/apache/iceberg/pull/9415#issuecomment-1878214701 cc: @snazy, @dimas-b, @adutra
Re: [PR] Correct schema behavior [iceberg-python]
Fokko merged PR #247: URL: https://github.com/apache/iceberg-python/pull/247
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
amogh-jahagirdar commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1878270693 Hm, after thinking about this a bit more, I think we can remove the Schema API and add a `withUpdatedColumnDoc` API to `ReplaceViewVersion`. That API can do change tracking and handle all the schema-changing logic internally. The Schema API doesn't really need to be public, since this logic is only needed for views.