[PR] Build: Fix minor compilation warnings [iceberg]
nk1506 opened a new pull request, #8758: URL: https://github.com/apache/iceberg/pull/8758

There were a few warnings with `./gradlew clean build -x test -x integrationTest`. This change makes the build **green**.
Re: [I] flink1.14.4+iceberg0.13.1+hive-metastore3.1.2+minio(S3) error! [iceberg]
pvary commented on issue #4743: URL: https://github.com/apache/iceberg/issues/4743#issuecomment-1754585935

> @pvary I know the error is different from the issue.

Maybe opening another issue would have been better in this case.

> Do we have documentation for Flink on how to configure Flink with Iceberg, Hive, and MinIO? I am more interested in the configuration part. Thanks!

I do not think we have specific documentation for your case. We have general docs: https://iceberg.apache.org/docs/latest/flink/. For the Hive catalog we have https://iceberg.apache.org/docs/latest/flink/#hive-catalog and https://iceberg.apache.org/docs/latest/flink-connector/#table-managed-in-hive-catalog, and for S3 access we have https://iceberg.apache.org/docs/latest/aws/#s3-fileio.
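Pulling those docs together, a minimal sketch of the catalog setup might look like the following PyFlink snippet. The metastore URI, warehouse path, and MinIO endpoint are placeholder assumptions, not values from this thread, and the iceberg-flink-runtime and AWS bundle jars are assumed to be on the classpath:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# A streaming TableEnvironment; batch mode works the same way.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hive-backed Iceberg catalog storing data in MinIO through S3FileIO.
# Credentials are picked up from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
t_env.execute_sql(
    """
    CREATE CATALOG hive_catalog WITH (
      'type' = 'iceberg',
      'catalog-type' = 'hive',
      'uri' = 'thrift://hive-metastore:9083',
      'warehouse' = 's3://warehouse',
      'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
      's3.endpoint' = 'http://minio:9000',
      's3.path-style-access' = 'true'
    )
    """
)
```

`s3.path-style-access` is usually needed for MinIO, which does not serve virtual-hosted bucket URLs by default.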
Re: [I] flink1.14.4+iceberg0.13.1+hive-metastore3.1.2+minio(S3) error! [iceberg]
ramdas-jagtap commented on issue #4743: URL: https://github.com/apache/iceberg/issues/4743#issuecomment-1754588737 Thanks @pvary for sharing the docs.
[PR] Disable merge-commit and enforce linear history [iceberg-python]
Fokko opened a new pull request, #57: URL: https://github.com/apache/iceberg-python/pull/57 This keeps the git history clear
Re: [I] Migrate Files using TestRule in dell package to Junit5 [iceberg]
nastra closed issue #7888: Migrate Files using TestRule in dell package to Junit5 URL: https://github.com/apache/iceberg/issues/7888
Re: [PR] Dell : Migrate Files using TestRule to Junit5. [iceberg]
nastra merged PR #8707: URL: https://github.com/apache/iceberg/pull/8707
Re: [I] Failed to find data source: iceberg. Please find packages at [iceberg]
NhatDuy11 commented on issue #7268: URL: https://github.com/apache/iceberg/issues/7268#issuecomment-1754691644 Can someone tell me which version of Apache Iceberg I should use with Spark 2.4.5 and Scala 2.11.12? Thank you very much, everyone!
Re: [I] java.lang.IllegalStateException: Connection pool shut down when refreshing table metadata on s3 [iceberg]
AkshayWise commented on issue #8601: URL: https://github.com/apache/iceberg/issues/8601#issuecomment-1754706576

@Kontinuation @stevenzwu I believe this fix was released in 1.4.0 last week, but I am still getting this error in Flink (1.15) Iceberg jobs:

```
java.lang.IllegalStateException: Connection pool shut down
    at org.apache.http.util.Asserts.check(Asserts.java:34)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:269)
    at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:75)
    at software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$InstrumentedHttpClientConnectionManager.requestConnection(ClientConnectionManagerFactory.java:57)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
    at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254)
    at software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104)
    at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231)
    at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228)
    at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingSt
```
[PR] Core: Use more permissive check when registering existing table [iceberg]
nastra opened a new pull request, #8759: URL: https://github.com/apache/iceberg/pull/8759 (no comment)
Re: [I] java.lang.IllegalStateException: Connection pool shut down when refreshing table metadata on s3 [iceberg]
nastra commented on issue #8601: URL: https://github.com/apache/iceberg/issues/8601#issuecomment-1754735024 @AkshayWise this fix didn't make it into 1.4.0 unfortunately
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1351918311

## pyiceberg/avro/resolver.py ##

```diff
@@ -233,7 +255,107 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, schema: Schema, expected_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, struct: StructType, provided_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(provided_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {provided_struct}")
+
+        provided_struct_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(provided_struct.fields)}
+
+        results: List[Tuple[Optional[int], Writer]] = []
+        iter(field_writers)
+
+        for pos, write_field in enumerate(struct.fields):
+            if write_field.field_id in provided_struct_positions:
+                results.append((provided_struct_positions[write_field.field_id], field_writers[pos]))
+            else:
+                # There is a default value
+                if isinstance(write_field, NestedField) and write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    default_writer = DefaultWriter(
+                        writer=visit(write_field.field_type, CONSTRUCT_WRITER_VISITOR), value=write_field.write_default
```

Review Comment: @rdblue Just to clarify: the type annotation here is not just a hint, it is enforced by Pydantic. If you pass in something other than what the type allows, it raises a Pydantic `ValidationError`. An assertion would be similar (but then it would be done in Python land instead of Rust 🦀).
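A minimal sketch of the Pydantic behavior being described; the `Example` model below is a simplified stand-in, not the actual pyiceberg `DefaultWriter` definition:

```python
from pydantic import BaseModel, ValidationError


class Example(BaseModel):
    # The annotation is validated at construction time, not just type-checked.
    value: int


Example(value=42)  # accepted

try:
    Example(value="not-an-int")  # type: ignore[arg-type]
except ValidationError as err:
    # Pydantic enforces the annotation at runtime and raises here.
    print(err)
```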
Re: [PR] Add logic for table format-version updates [iceberg-python]
Fokko commented on PR #55: URL: https://github.com/apache/iceberg-python/pull/55#issuecomment-1754783886 @rdblue I agree with you there. I think we can still update the method name since it was just raising an exception.
Re: [PR] Core: Allow missing object in ErrorResponse [iceberg]
amogh-jahagirdar commented on code in PR #8760: URL: https://github.com/apache/iceberg/pull/8760#discussion_r1352021537

## core/src/main/java/org/apache/iceberg/rest/responses/ErrorResponseParser.java ##

```diff
@@ -76,17 +76,20 @@ public static ErrorResponse fromJson(JsonNode jsonNode) {
         jsonNode != null && jsonNode.isObject(),
         "Cannot parse error response from non-object value: %s",
         jsonNode);
-    Preconditions.checkArgument(jsonNode.has(ERROR), "Cannot parse missing field: error");
-    JsonNode error = JsonUtil.get(ERROR, jsonNode);
-    String message = JsonUtil.getStringOrNull(MESSAGE, error);
-    String type = JsonUtil.getStringOrNull(TYPE, error);
-    Integer code = JsonUtil.getIntOrNull(CODE, error);
-    List<String> stack = JsonUtil.getStringListOrNull(STACK, error);
-    return ErrorResponse.builder()
-        .withMessage(message)
-        .withType(type)
-        .responseCode(code)
-        .withStackTrace(stack)
-        .build();
+    if (jsonNode.has(ERROR)) {
+      JsonNode error = JsonUtil.get(ERROR, jsonNode);
+      String message = JsonUtil.getStringOrNull(MESSAGE, error);
+      String type = JsonUtil.getStringOrNull(TYPE, error);
+      Integer code = JsonUtil.getIntOrNull(CODE, error);
+      List<String> stack = JsonUtil.getStringListOrNull(STACK, error);
+      return ErrorResponse.builder()
+          .withMessage(message)
+          .withType(type)
+          .responseCode(code)
+          .withStackTrace(stack)
+          .build();
+    } else {
+      return ErrorResponse.builder().build();
+    }
```

Review Comment: I may be missing something, but this is `ErrorResponseParser`, no? I'd expect whatever JSON gets passed to this to have all of these details (message, type, code, stack). I just assumed the response model isn't marked as required in the REST spec because it depends on whether an error is thrown as part of the call. If it is thrown, I'd expect all these fields to be set. Maybe there's a better way to convey it in the spec.
Re: [PR] Core: Allow missing object in ErrorResponse [iceberg]
amogh-jahagirdar commented on code in PR #8760: URL: https://github.com/apache/iceberg/pull/8760#discussion_r135149

## core/src/main/java/org/apache/iceberg/rest/responses/ErrorResponse.java ##

```diff
@@ -22,18 +22,17 @@
 import java.io.StringWriter;
 import java.util.Arrays;
 import java.util.List;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.rest.RESTResponse;
 
 /** Standard response body for all API errors */
 public class ErrorResponse implements RESTResponse {
 
   private String message;
   private String type;
-  private int code;
+  private Integer code;
```

Review Comment: I think this needs to still be `int`, since `code` is required? See https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml#L1108, which makes sense: we should always have some known status code.

## core/src/main/java/org/apache/iceberg/rest/responses/ErrorResponseParser.java ##

```diff
@@ -76,17 +76,20 @@ public static ErrorResponse fromJson(JsonNode jsonNode) {
         jsonNode != null && jsonNode.isObject(),
         "Cannot parse error response from non-object value: %s",
         jsonNode);
-    Preconditions.checkArgument(jsonNode.has(ERROR), "Cannot parse missing field: error");
-    JsonNode error = JsonUtil.get(ERROR, jsonNode);
-    String message = JsonUtil.getStringOrNull(MESSAGE, error);
-    String type = JsonUtil.getStringOrNull(TYPE, error);
-    Integer code = JsonUtil.getIntOrNull(CODE, error);
-    List<String> stack = JsonUtil.getStringListOrNull(STACK, error);
-    return ErrorResponse.builder()
-        .withMessage(message)
-        .withType(type)
-        .responseCode(code)
-        .withStackTrace(stack)
-        .build();
+    if (jsonNode.has(ERROR)) {
+      JsonNode error = JsonUtil.get(ERROR, jsonNode);
+      String message = JsonUtil.getStringOrNull(MESSAGE, error);
+      String type = JsonUtil.getStringOrNull(TYPE, error);
+      Integer code = JsonUtil.getIntOrNull(CODE, error);
+      List<String> stack = JsonUtil.getStringListOrNull(STACK, error);
+      return ErrorResponse.builder()
+          .withMessage(message)
+          .withType(type)
+          .responseCode(code)
+          .withStackTrace(stack)
+          .build();
+    } else {
+      return ErrorResponse.builder().build();
+    }
```

Review Comment: I may be missing something, but this is `ErrorResponseParser`, no? I'd expect whatever JSON gets passed to this to have the error. I just assumed the response model isn't marked as required in the REST spec because it depends on whether an error is thrown as part of the call. If it is thrown, I'd expect all these fields to be set. Maybe there's a better way to convey it in the spec.
[I] Flaky test/env TestFlinkParquetReader, TestFlinkParquetWriter, TestIcebergSourceBoundedSql [iceberg]
nk1506 opened a new issue, #8761: URL: https://github.com/apache/iceberg/issues/8761

### Apache Iceberg version

1.4.0 (latest release)

### Query engine

Flink

### Please describe the bug 🐞

Flink 1.16

```
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2188)
    at org.apache.flink.table.planner.delegation.DefaultExecutor.executeAsync(DefaultExecutor.java:95)
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentImpl.java:884)
    ... 4 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:292)
    ... 13 more
Caused by: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
    at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:795)
    at org.apache.hadoop.io.Text.encode(Text.java:451)
    at org.apache.hadoop.io.Text.encode(Text.java:431)
    at org.apache.hadoop.io.Text.writeString(Text.java:480)
    at org.apache.hadoop.conf.Configuration.write(Configuration.java:2889)
    at org.apache.iceberg.hadoop.SerializableConfiguration.writeObject(SerializableConfiguration.java:38)
    at sun.reflect.GeneratedMethodAccessor87.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1154)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:632)
    at org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:548)
    at org.apache.flink.streaming.api.graph.StreamConfig.lambda$serializeAllConfigs$1(StreamConfig.java:195)
    at java.util.HashMap.forEach(HashMap.java:1290)
    at org.apache.flink.streaming.api.graph.StreamConfig.serializeAllConfigs(StreamConfig.java:192)
    at org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:169)
    at java.util.concurrent.CompletableFuture.uniAccept
```
Re: [I] Some questions about Iceberg's capabilities in Flink [iceberg]
jonathf commented on issue #8754: URL: https://github.com/apache/iceberg/issues/8754#issuecomment-1754998172

Okay, that explains it. Last two questions:

* Will #8553 support some sort of ordering guarantee?
* Is the streaming feature associated with Iceberg's tagging features? This might be a weird question, but I have heard some people mention it, though I cannot see it written anywhere.
Re: [PR] Add logic for table format-version updates [iceberg-python]
Fokko merged PR #55: URL: https://github.com/apache/iceberg-python/pull/55
Re: [PR] Core: Allow missing object in ErrorResponse [iceberg]
Fokko commented on code in PR #8760: URL: https://github.com/apache/iceberg/pull/8760#discussion_r1352239032

## core/src/main/java/org/apache/iceberg/rest/responses/ErrorResponseParser.java ##

```diff
@@ -76,17 +76,20 @@ public static ErrorResponse fromJson(JsonNode jsonNode) {
         jsonNode != null && jsonNode.isObject(),
         "Cannot parse error response from non-object value: %s",
         jsonNode);
-    Preconditions.checkArgument(jsonNode.has(ERROR), "Cannot parse missing field: error");
-    JsonNode error = JsonUtil.get(ERROR, jsonNode);
-    String message = JsonUtil.getStringOrNull(MESSAGE, error);
-    String type = JsonUtil.getStringOrNull(TYPE, error);
-    Integer code = JsonUtil.getIntOrNull(CODE, error);
-    List<String> stack = JsonUtil.getStringListOrNull(STACK, error);
-    return ErrorResponse.builder()
-        .withMessage(message)
-        .withType(type)
-        .responseCode(code)
-        .withStackTrace(stack)
-        .build();
+    if (jsonNode.has(ERROR)) {
+      JsonNode error = JsonUtil.get(ERROR, jsonNode);
+      String message = JsonUtil.getStringOrNull(MESSAGE, error);
+      String type = JsonUtil.getStringOrNull(TYPE, error);
+      Integer code = JsonUtil.getIntOrNull(CODE, error);
+      List<String> stack = JsonUtil.getStringListOrNull(STACK, error);
+      return ErrorResponse.builder()
+          .withMessage(message)
+          .withType(type)
+          .responseCode(code)
+          .withStackTrace(stack)
+          .build();
+    } else {
+      return ErrorResponse.builder().build();
+    }
```

Review Comment: We can also update the spec, but it looks like not all systems (EMR) send the full message. The question is: do we want to fail at parsing the error message, or just return an empty message (or throw an exception somewhere else)?
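To make the trade-off concrete, here is a minimal Python paraphrase of the lenient branch in the Java diff above. It is illustrative only; the function name and the dict-based return are assumptions, not the actual Iceberg API:

```python
from typing import Any, Dict, Optional


def parse_error_response(payload: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Mirror of the lenient parser: tolerate a missing 'error' object."""
    if "error" in payload:
        error = payload["error"]
        return {
            "message": error.get("message"),
            "type": error.get("type"),
            "code": error.get("code"),
            "stack": error.get("stack"),
        }
    # Lenient path: some servers (e.g. EMR) omit the object entirely,
    # so return an empty response instead of failing at parse time.
    return {"message": None, "type": None, "code": None, "stack": None}
```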
Re: [PR] Core: Allow missing object in ErrorResponse [iceberg]
Fokko commented on code in PR #8760: URL: https://github.com/apache/iceberg/pull/8760#discussion_r1352243492

## core/src/main/java/org/apache/iceberg/rest/responses/ErrorResponse.java ##

```diff
@@ -22,18 +22,17 @@
 import java.io.StringWriter;
 import java.util.Arrays;
 import java.util.List;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.rest.RESTResponse;
 
 /** Standard response body for all API errors */
 public class ErrorResponse implements RESTResponse {
 
   private String message;
   private String type;
-  private int code;
+  private Integer code;
```

Review Comment: Hmm, we only require `code` to be there; we also don't check for `message` and `type`.
[I] is there any way to rewrite onto a specific branch? [iceberg]
zinking opened a new issue, #8762: URL: https://github.com/apache/iceberg/issues/8762

### Query engine

Spark

### Question

I thought this might do it:

```scala
val table = s"iceberg_catalog.${tableIdentifier}.branch_${branch}"
val t = Spark3Util.loadIcebergTable(spark, table)
val start = System.currentTimeMillis()
try {
  SparkActions.get()
    .rewriteDataFiles(t)
    .skipPlanDeletes(skipPlanDeletes)
    .filter(Expressions.equal("ds", 20230923))
    .execute()
```

I was assuming the data would be read from the branch and the rewrite result written back onto the branch, but it is not; the change is still visible on main.
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352295604

## pyiceberg/avro/resolver.py ##

```diff
@@ -233,7 +255,107 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, schema: Schema, expected_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, struct: StructType, provided_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(provided_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {provided_struct}")
+
+        provided_struct_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(provided_struct.fields)}
+
+        results: List[Tuple[Optional[int], Writer]] = []
+        iter(field_writers)
+
+        for pos, write_field in enumerate(struct.fields):
+            if write_field.field_id in provided_struct_positions:
+                results.append((provided_struct_positions[write_field.field_id], field_writers[pos]))
+            else:
+                # There is a default value
+                if isinstance(write_field, NestedField) and write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    default_writer = DefaultWriter(
+                        writer=visit(write_field.field_type, CONSTRUCT_WRITER_VISITOR), value=write_field.write_default
+                    )
+                    results.append((None, default_writer))
+                elif write_field.required:
+                    raise ValueError(f"Field is required, and there is no write default: {write_field}")
+                else:
+                    results.append((pos, NoneWriter()))
+
+        return StructWriter(field_writers=tuple(results))
+
+    def field(self, field: NestedField, expected_field: Optional[IcebergType], field_writer: Writer) -> Writer:
+        return field_writer if field.required else OptionWriter(field_writer)
+
+    def list(self, list_type: ListType, expected_list: Optional[IcebergType], element_reader: Writer) -> Writer:
+        if expected_list and not isinstance(expected_list, ListType):
+            raise ResolveError(f"File/read schema are not aligned for list, got {expected_list}")
```

Review Comment: Created an issue for this: https://github.com/apache/iceberg-python/issues/58
[I] Pass in the correct type for the VisitorWithParent [iceberg-python]
Fokko opened a new issue, #58: URL: https://github.com/apache/iceberg-python/issues/58

### Feature Request / Improvement

So we can avoid the checks, see: https://github.com/apache/iceberg-python/pull/40#discussion_r1349776857
Re: [PR] Disable merge-commit and enforce linear history [iceberg-python]
liurenjie1024 commented on code in PR #57: URL: https://github.com/apache/iceberg-python/pull/57#discussion_r1352301058

## .asf.yaml ##

```diff
@@ -28,6 +28,16 @@ github:
     - apache
     - hacktoberfest
     - pyiceberg
+  enabled_merge_buttons:
+    merge: false
+    squash: true
+    rebase: trueB
```

Review Comment:
```suggestion
    rebase: true
```
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352300965

## pyiceberg/avro/resolver.py ##

```diff
@@ -192,7 +194,28 @@ def visit_binary(self, binary_type: BinaryType) -> Writer:
         return BinaryWriter()
 
 
-def resolve(
+CONSTRUCT_WRITER_VISITOR = ConstructWriter()
+
+
+def resolve_writer(
+    data_schema: Union[Schema, IcebergType],
+    write_schema: Union[Schema, IcebergType],
+) -> Writer:
+    """Resolve the file and read schema to produce a reader.
+
+    Args:
+        data_schema (Schema | IcebergType): The schema of the Avro file.
+        write_schema (Schema | IcebergType): The requested read schema which is equal, subset or superset of the file schema.
+
+    Raises:
+        NotImplementedError: If attempting to resolve an unrecognized object type.
+    """
+    if write_schema == data_schema:
+        return construct_writer(write_schema)
+    return visit_with_partner(write_schema, data_schema, WriteSchemaResolver(), SchemaPartnerAccessor())  # type: ignore
```

Review Comment: Yes, this is because the arguments to the function feel most natural from left to right: you have the data in some kind of schema, and you want to project that onto some write schema.
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352299345

## pyiceberg/avro/resolver.py ##

```diff
@@ -192,7 +195,26 @@ def visit_binary(self, binary_type: BinaryType) -> Writer:
         return BinaryWriter()
 
 
-def resolve(
+CONSTRUCT_WRITER_VISITOR = ConstructWriter()
+
+
+def resolve_writer(
+    struct_schema: Union[Schema, IcebergType],
+    write_schema: Union[Schema, IcebergType],
+) -> Writer:
+    """Resolve the file and read schema to produce a reader.
+
+    Args:
+        struct_schema (Schema | IcebergType): The schema of the Avro file.
+        write_schema (Schema | IcebergType): The requested read schema which is equal, subset or superset of the file schema.
```

Review Comment: Missed this one, thanks! And this is very subjective :D

> I think the names are still confusing here. When I see data_schema I would expect it to be the schema of the data that is being written.

For me, I would assume that the `data_schema` is in memory. `record_schema` and `file_schema` sound the most natural to me.
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352318212

## pyiceberg/avro/resolver.py ##

```diff
@@ -233,7 +256,93 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, write_schema: Schema, data_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, write_schema: StructType, data_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(data_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {data_struct}")
+
+        data_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(data_struct.fields)}
+        results: List[Tuple[Optional[int], Writer]] = []
+
+        for writer, write_field in zip(field_writers, write_schema.fields):
+            if write_field.field_id in data_positions:
+                results.append((data_positions[write_field.field_id], writer))
+            else:
+                # There is a default value
+                if write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    results.append((None, DefaultWriter(writer=writer, value=write_field.write_default)))  # type: ignore
+                elif write_field.required:
+                    raise ValueError(f"Field is required, and there is no write default: {write_field}")
+
+        return StructWriter(field_writers=tuple(results))
+
+    def field(self, write_field: NestedField, data_type: Optional[IcebergType], field_writer: Writer) -> Writer:
+        return field_writer if write_field.required else OptionWriter(field_writer)
+
+    def list(self, write_list_type: ListType, write_list: Optional[IcebergType], element_reader: Writer) -> Writer:
```

Review Comment: Nice, thanks!
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352317115

## pyiceberg/avro/resolver.py ##

```diff
@@ -233,7 +256,93 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, write_schema: Schema, data_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, write_schema: StructType, data_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(data_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {data_struct}")
+
+        data_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(data_struct.fields)}
+        results: List[Tuple[Optional[int], Writer]] = []
+
+        for writer, write_field in zip(field_writers, write_schema.fields):
+            if write_field.field_id in data_positions:
+                results.append((data_positions[write_field.field_id], writer))
+            else:
+                # There is a default value
+                if write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    results.append((None, DefaultWriter(writer=writer, value=write_field.write_default)))  # type: ignore
+                elif write_field.required:
+                    raise ValueError(f"Field is required, and there is no write default: {write_field}")
```

Review Comment: I think this is correct; let me illustrate this with an example (screenshot of the manifest-entry schema omitted).

All three branches:

- `if`: The field is in the `record_schema` and is part of the write schema. It will produce a `(0, IntegerWriter())` for the `0: status` field.
- `elif`: The field is not in the `record_schema`, but has a write default (we use this to write `block_size_in_bytes`, since it is required; screenshot omitted).
- `else`: The else is not there anymore, and this branch is taken for `sequence_number` and `file_sequence_number`, where the field is part of the `record_schema` but not part of the `file_schema`. Therefore we don't need to write any null bytes.

For the read case this is different: we would need a reader, since we need to skip over the data in the file. But for the write case, we can just ignore certain fields because they are not part of the `file_schema`.
[PR] Build: Fix compiler warnings [iceberg]
nk1506 opened a new pull request, #8763: URL: https://github.com/apache/iceberg/pull/8763

Fixing a few `-Xlint`-related warnings.

Before: https://github.com/apache/iceberg/assets/4146188/c3666c0d-f879-4dbc-8fe4-89ab91b93079

After: https://github.com/apache/iceberg/assets/4146188/2e107c9d-796c-4556-b46b-0b49919ecd9b
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on code in PR #40: URL: https://github.com/apache/iceberg-python/pull/40#discussion_r1352346498

## pyiceberg/avro/resolver.py ##

```diff
@@ -233,7 +256,93 @@ def skip(self, decoder: BinaryDecoder) -> None:
         pass
 
 
-class SchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Reader]):
+class WriteSchemaResolver(PrimitiveWithPartnerVisitor[IcebergType, Writer]):
+    def schema(self, write_schema: Schema, data_schema: Optional[IcebergType], result: Writer) -> Writer:
+        return result
+
+    def struct(self, write_schema: StructType, data_struct: Optional[IcebergType], field_writers: List[Writer]) -> Writer:
+        if not isinstance(data_struct, StructType):
+            raise ResolveError(f"File/write schema are not aligned for struct, got {data_struct}")
+
+        data_positions: Dict[int, int] = {field.field_id: pos for pos, field in enumerate(data_struct.fields)}
+        results: List[Tuple[Optional[int], Writer]] = []
+
+        for writer, write_field in zip(field_writers, write_schema.fields):
+            if write_field.field_id in data_positions:
+                results.append((data_positions[write_field.field_id], writer))
+            else:
+                # There is a default value
+                if write_field.write_default is not None:
+                    # The field is not in the record, but there is a write default value
+                    results.append((None, DefaultWriter(writer=writer, value=write_field.write_default)))  # type: ignore
+                elif write_field.required:
+                    raise ValueError(f"Field is required, and there is no write default: {write_field}")
```

Review Comment: Yes, you're right! This would apply to `file_ordinal` and `sort_columns` (screenshot omitted). However, we don't write those. Updated the code and added a test case:

```python
def test_writer_missing_optional_in_read_schema() -> None:
    actual = resolve_writer(
        record_schema=Schema(),
        file_schema=Schema(
            NestedField(field_id=1, name="str", type=StringType(), required=False),
        ),
    )

    expected = StructWriter(field_writers=((None, OptionWriter(option=OptionWriter(option=StringWriter()))),))

    assert actual == expected
```
Re: [PR] Disable merge-commit and enforce linear history [iceberg-python]
Fokko commented on code in PR #57: URL: https://github.com/apache/iceberg-python/pull/57#discussion_r1352350208

## .asf.yaml ##

```diff
@@ -28,6 +28,16 @@ github:
     - apache
     - hacktoberfest
     - pyiceberg
+  enabled_merge_buttons:
+    merge: false
+    squash: true
+    rebase: trueB
```

Review Comment: Oops, nice catch @liurenjie1024! 🙌
Re: [PR] Build: Fix minor compilation warnings [iceberg]
nastra merged PR #8758: URL: https://github.com/apache/iceberg/pull/8758
Re: [I] Optimize metadata tables? [iceberg]
ajantha-bhat commented on issue #8714: URL: https://github.com/apache/iceberg/issues/8714#issuecomment-1755285785 @RussellSpitzer and @aokolnychyi: WDYT?
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352477202

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and
+it must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+`partition-statistics` field of table metadata is an optional list of struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
+
+##### Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in the default data file format of the table (for example, Parquet or ORC).
+These rows must be sorted (in ascending manner with NULL FIRST) by `partition` field to optimize filtering rows while scanning.
+
+The schema of the partition statistics file is as follows:
+
+| v1 | v2 | Field id, name | Type | Description |
+|----|----|----------------|------|-------------|
+| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
+| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
+| _required_ | _required_ | **`3 data_record_count`** | `long` | Count of records in data files |
+| _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data files |
+| _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | Total size of data files in bytes |
+| _optional_ | _optional_ | **`6 position_delete_record_count`** | `long` | Count of records in position delete files |
+| _optional_ | _optional_ | **`7 position_delete_file_count`** | `int` | Count of position delete files |
+| _optional_ | _optional_ | **`8 equality_delete_record_count`** | `long` | Count of records in equality delete files |
+| _optional_ | _optional_ | **`9 equality_delete_file_count`** | `int` | Count of equality delete files |
+| _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate count of records in a partition after applying the delete files if any |

Review Comment: You are right. That is why the schema field is kept optional. The implementation will not populate it by default; this can be controlled by a property or by the way of writing. For example, an async write can compute it, but incremental sync writes cannot.
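To make the proposed schema concrete, here is a hedged sketch of how the partition statistics file schema above could map onto Iceberg's Java type API. The `partitionType` below stands in for the unified partition struct across all of the table's specs, and the example partition field is purely hypothetical; field IDs and required/optional flags mirror the table in the quoted spec text.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

// Hypothetical unified partition type; in a real implementation this would be
// built from all partition specs of the table.
Types.StructType partitionType = Types.StructType.of(
    Types.NestedField.optional(1000, "order_date", Types.DateType.get()));

// Partition statistics file schema as described in the spec table above.
Schema partitionStatsSchema = new Schema(
    Types.NestedField.required(1, "partition", partitionType),
    Types.NestedField.required(2, "spec_id", Types.IntegerType.get()),
    Types.NestedField.required(3, "data_record_count", Types.LongType.get()),
    Types.NestedField.required(4, "data_file_count", Types.IntegerType.get()),
    Types.NestedField.required(5, "total_data_file_size_in_bytes", Types.LongType.get()),
    Types.NestedField.optional(6, "position_delete_record_count", Types.LongType.get()),
    Types.NestedField.optional(7, "position_delete_file_count", Types.IntegerType.get()),
    Types.NestedField.optional(8, "equality_delete_record_count", Types.LongType.get()),
    Types.NestedField.optional(9, "equality_delete_file_count", Types.IntegerType.get()),
    Types.NestedField.optional(10, "total_record_count", Types.LongType.get()));
```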
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352482149

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and
+it must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+`partition-statistics` field of table metadata is an optional list of struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
+
+##### Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in the default data file format of the table (for example, Parquet or ORC).

Review Comment: Russell gave a comment to explicitly mention the format type. I have removed the word "default" and reworded a bit. The implementation can decide whether to use the table's default format or the one specified in a table property.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352483805

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and

Review Comment: Added, with some rewording.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352485101

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).

Review Comment: True; changed to keep capitals only for headers.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352490767

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and
+it must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+`partition-statistics` field of table metadata is an optional list of struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
+
+##### Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in the default data file format of the table (for example, Parquet or ORC).
+These rows must be sorted (in ascending manner with NULL FIRST) by `partition` field to optimize filtering rows while scanning.
+
+The schema of the partition statistics file is as follows:
+
+| v1 | v2 | Field id, name | Type | Description |
+|----|----|----------------|------|-------------|
+| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
+| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |

Review Comment: We can discuss the community's interest in synchronous writes; some users might be interested. I agree that we should first go with the async implementation to make things easier.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352492103

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and
+it must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+`partition-statistics` field of table metadata is an optional list of struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |

Review Comment: OK, updated as suggested.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on PR #7105: URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1755391052 @aokolnychyi: Thanks for the detailed review, and also for going through the POC PRs. I have addressed all the comments. Please have a look again.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352500873

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and
+it must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+`partition-statistics` field of table metadata is an optional list of struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
+
+##### Partition Statistics file

Review Comment: Yes, updated. Also updated the header "Table statistics" -> "Table Statistics".
Re: [I] Optimize metadata tables? [iceberg]
RussellSpitzer commented on issue #8714: URL: https://github.com/apache/iceberg/issues/8714#issuecomment-1755413260 I don't see any particular reason but I also don't see any reason to change an existing public api here.
Re: [I] Optimize metadata tables? [iceberg]
ajantha-bhat commented on issue #8714: URL: https://github.com/apache/iceberg/issues/8714#issuecomment-1755421403

> I don't see any particular reason but I also don't see any reason to change an existing public api here.

🙃🙃🙃
Re: [I] Optimize metadata tables? [iceberg]
ajantha-bhat commented on issue #8714: URL: https://github.com/apache/iceberg/issues/8714#issuecomment-1755423504

> I don't see any particular reason but I also don't see any reason to change an existing public api here.

My case for removing it is to keep things simple, with fewer metadata tables for users to understand and remember.
Re: [PR] Spec: Add partition stats spec [iceberg]
ajantha-bhat commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1352528356

## format/spec.md:
@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file).
+Partition statistics are not required for reading or planning and readers may ignore them.
+Each table snapshot may be associated with at most one partition statistic file.
+A writer can optionally write the partition statistics file during each write operation, and
+it must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+`partition-statistics` field of table metadata is an optional list of struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |
+
+##### Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in the default data file format of the table (for example, Parquet or ORC).
+These rows must be sorted (in ascending manner with NULL FIRST) by `partition` field to optimize filtering rows while scanning.
+
+The schema of the partition statistics file is as follows:
+
+| v1 | v2 | Field id, name | Type | Description |
+|----|----|----------------|------|-------------|
+| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
+| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |

Review Comment: Also, Trino currently writes Puffin in both sync and async modes, and Dremio is also interested in sync stats.
Re: [PR] Add ASF DOAP rdf file [iceberg]
jbonofre commented on code in PR #8586: URL: https://github.com/apache/iceberg/pull/8586#discussion_r1352566918

## doap.rdf:
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<rdf:RDF xmlns="http://usefulinc.com/ns/doap#"
+  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+  xmlns:asfext="http://projects.apache.org/ns/asfext#"
+  xmlns:foaf="http://xmlns.com/foaf/0.1/">
+
+  <Project rdf:about="https://iceberg.apache.org">
+    <created>2023-09-14</created>
+    <license rdf:resource="https://spdx.org/licenses/Apache-2.0" />
+    <name>Apache Iceberg</name>
+    <homepage rdf:resource="https://iceberg.apache.org" />
+    <asfext:pmc rdf:resource="https://iceberg.apache.org" />
+    <shortdesc>Iceberg is a high-performance format for huge analytic tables.</shortdesc>
+    <description>Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.</description>
+    <bug-database rdf:resource="https://github.com/apache/iceberg/issues" />
+    <mailing-list rdf:resource="https://iceberg.apache.org/community/" />
+    <download-page rdf:resource="https://iceberg.apache.org/releases/" />
+    <programming-language>Java</programming-language>
+    <programming-language>Python</programming-language>

Review Comment: The DOAP file only accepts one Git repository, so I put `https://github.com/apache/iceberg`.
Re: [PR] Add ASF DOAP rdf file [iceberg]
jbonofre commented on code in PR #8586: URL: https://github.com/apache/iceberg/pull/8586#discussion_r1352567808

## doap.rdf:
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<rdf:RDF xmlns="http://usefulinc.com/ns/doap#"
+  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+  xmlns:asfext="http://projects.apache.org/ns/asfext#"
+  xmlns:foaf="http://xmlns.com/foaf/0.1/">
+
+  <Project rdf:about="https://iceberg.apache.org">
+    <created>2023-09-14</created>
+    <license rdf:resource="https://spdx.org/licenses/Apache-2.0" />
+    <name>Apache Iceberg</name>
+    <homepage rdf:resource="https://iceberg.apache.org" />
+    <asfext:pmc rdf:resource="https://iceberg.apache.org" />
+    <shortdesc>Iceberg is a high-performance format for huge analytic tables.</shortdesc>
+    <description>Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.</description>
+    <bug-database rdf:resource="https://github.com/apache/iceberg/issues" />
+    <mailing-list rdf:resource="https://iceberg.apache.org/community/" />
+    <download-page rdf:resource="https://iceberg.apache.org/releases/" />
+    <programming-language>Java</programming-language>
+    <programming-language>Python</programming-language>
+    <category rdf:resource="https://projects.apache.org/category/big-data" />
+    <category rdf:resource="https://projects.apache.org/category/database" />
+    <category rdf:resource="https://projects.apache.org/category/data-engineering" />
+    <release>
+      <Version>
+        <revision>1.3.1</revision>

Review Comment: I updated it to 1.4.0 (NB: only one release is accepted in the DOAP, the latest).
Re: [PR] Add ASF DOAP rdf file [iceberg]
jbonofre commented on code in PR #8586: URL: https://github.com/apache/iceberg/pull/8586#discussion_r1352568422

## doap.rdf:
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<rdf:RDF xmlns="http://usefulinc.com/ns/doap#"
+  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+  xmlns:asfext="http://projects.apache.org/ns/asfext#"
+  xmlns:foaf="http://xmlns.com/foaf/0.1/">
+
+  <Project rdf:about="https://iceberg.apache.org">
+    <created>2023-09-14</created>
+    <license rdf:resource="https://spdx.org/licenses/Apache-2.0" />
+    <name>Apache Iceberg</name>
+    <homepage rdf:resource="https://iceberg.apache.org" />
+    <asfext:pmc rdf:resource="https://iceberg.apache.org" />
+    <shortdesc>Iceberg is a high-performance format for huge analytic tables.</shortdesc>
+    <description>Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.</description>
+    <bug-database rdf:resource="https://github.com/apache/iceberg/issues" />
+    <mailing-list rdf:resource="https://iceberg.apache.org/community/" />
+    <download-page rdf:resource="https://iceberg.apache.org/releases/" />
+    <programming-language>Java</programming-language>
+    <programming-language>Python</programming-language>

Review Comment: I added all languages in the DOAP, and only the "main" repository.
Re: [PR] Build: Bump slf4j from 1.7.36 to 2.0.9 [iceberg]
jbonofre commented on PR #8737: URL: https://github.com/apache/iceberg/pull/8737#issuecomment-1755484959 @dependabot rebase
Re: [PR] Build: Bump slf4j from 1.7.36 to 2.0.9 [iceberg]
jbonofre commented on PR #8737: URL: https://github.com/apache/iceberg/pull/8737#issuecomment-1755484513 It should work with most of the engines.
Re: [PR] Build: Bump slf4j from 1.7.36 to 2.0.9 [iceberg]
dependabot[bot] commented on PR #8737: URL: https://github.com/apache/iceberg/pull/8737#issuecomment-1755485022 Sorry, only users with push access can use that command.
Re: [I] Upgrade to Apache Arrow 13.0.0 [iceberg]
Fokko commented on issue #8764: URL: https://github.com/apache/iceberg/issues/8764#issuecomment-1755497790 Curious why this hasn't been picked up by dependabot.
Re: [I] Upgrade to Apache Arrow 13.0.0 [iceberg]
jbonofre commented on issue #8764: URL: https://github.com/apache/iceberg/issues/8764#issuecomment-1755501456 @Fokko I found a few dependencies not detected by dependabot. I'm doing the updates and checking why dependabot didn't find them :)
Re: [I] Investigate why dependabot didn't detect upgrades [iceberg]
ajantha-bhat commented on issue #8764: URL: https://github.com/apache/iceberg/issues/8764#issuecomment-1755583628 @snazy: Hi, have you already analyzed, or do you have any info on, these dependabot + version catalog problems? Do you recommend [renovateBot](https://github.com/renovatebot) to address this?
Re: [I] Investigate why dependabot didn't detect upgrades [iceberg]
Fokko commented on issue #8764: URL: https://github.com/apache/iceberg/issues/8764#issuecomment-1755598241 I think this is because we limit to 5 PRs: https://github.com/apache/iceberg/blob/master/.github/dependabot.yml#L32 It looks like all five are open: https://github.com/apache/iceberg/pulls?q=is%3Apr+is%3Aopen+label%3Adependencies I would be in favor of just removing this limit. I would just set this to
Re: [PR] Core: Allow missing object in ErrorResponse [iceberg]
Fokko commented on PR #8760: URL: https://github.com/apache/iceberg/pull/8760#issuecomment-1755602836 Let's do this the other way around
Re: [PR] Core: Allow missing object in ErrorResponse [iceberg]
Fokko closed pull request #8760: Core: Allow missing object in ErrorResponse URL: https://github.com/apache/iceberg/pull/8760
[PR] Open-API: Make error required [iceberg]
Fokko opened a new pull request, #8765: URL: https://github.com/apache/iceberg/pull/8765 I think we want to make `error` required, otherwise it would just be an empty document `{}`.
Re: [I] Investigate why dependabot didn't detect upgrades [iceberg]
jbonofre commented on issue #8764: URL: https://github.com/apache/iceberg/issues/8764#issuecomment-1755627197 @Fokko yes, I think we used up the 5-PR pool. +1 to raising it. I'm doing it in a PR attached to this issue.
Re: [PR] Docs: Fix missing semicolons in SQL snippets. [iceberg]
Fokko commented on code in PR #8748: URL: https://github.com/apache/iceberg/pull/8748#discussion_r1352755607

## docs/spark-getting-started.md:
@@ -69,7 +69,7 @@ To create your first Iceberg table in Spark, use the `spark-sql` shell or `spark

 ```sql
 -- local is the path-based catalog defined above
-CREATE TABLE local.db.table (id bigint, data string) USING iceberg
+CREATE TABLE local.db.table (id bigint, data string) USING iceberg;
 ```

Review Comment: It would be even better to have some kind of linter for this.
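As a sketch of what such a check could look like, here is a tiny standalone tool (illustrative only, not an existing Iceberg build task) that scans markdown files for `sql` fences whose last statement line lacks a trailing semicolon:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Toy linter: reports `sql` code fences whose last non-comment line
// does not end with ';'. File paths are passed as program arguments.
public class SqlSnippetLinter {
  public static void main(String[] args) throws IOException {
    for (String arg : args) {
      List<String> lines = Files.readAllLines(Path.of(arg));
      boolean inSql = false;
      String lastCode = null;
      int fenceLine = 0;
      for (int i = 0; i < lines.size(); i++) {
        String line = lines.get(i).trim();
        if (!inSql && line.toLowerCase().startsWith("```sql")) {
          inSql = true;           // entering a SQL fence
          fenceLine = i + 1;
          lastCode = null;
        } else if (inSql && line.startsWith("```")) {
          inSql = false;          // leaving the fence: check the last statement line
          if (lastCode != null && !lastCode.endsWith(";")) {
            System.out.printf("%s:%d: SQL block missing trailing ';'%n", arg, fenceLine);
          }
        } else if (inSql && !line.isEmpty() && !line.startsWith("--")) {
          lastCode = line;        // remember the last non-comment line in the block
        }
      }
    }
  }
}
```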
Re: [I] Unable to write to iceberg table using spark [iceberg]
RussellSpitzer commented on issue #8419: URL: https://github.com/apache/iceberg/issues/8419#issuecomment-1755633933 PySpark, I think, has some issues with setting "packages" in the Spark conf, since the py4j execution means that the Spark context has to be started a bit weirdly. I would try using `--packages` on the CLI instead of configuring it within the context, to see what happens.
Re: [PR] Docs: Fix missing semicolons in SQL snippets. [iceberg]
Fokko commented on PR #8748: URL: https://github.com/apache/iceberg/pull/8748#issuecomment-1755637911 Great work @Priyansh121096! If we find more, we can create a new PR. (I also noticed that some code blocks start with an uppercase ```SQL fence; for consistency it would be nice to have everything lowercase, but I think that works as well.)
Re: [PR] Docs: Fix missing semicolons in SQL snippets. [iceberg]
Fokko merged PR #8748: URL: https://github.com/apache/iceberg/pull/8748
Re: [PR] Core: Use more permissive check when registering existing table [iceberg]
Fokko merged PR #8759: URL: https://github.com/apache/iceberg/pull/8759
Re: [PR] Build: Bump slf4j from 1.7.36 to 2.0.9 [iceberg]
nastra commented on PR #8737: URL: https://github.com/apache/iceberg/pull/8737#issuecomment-1755681588 @jbonofre there's an issue with Spark that needs some investigation
[I] Parquet.write to S3 with GlueCatalog requires commit [iceberg]
djchapm opened a new issue, #8767: URL: https://github.com/apache/iceberg/issues/8767

### Feature Request / Improvement

Hi, I'm writing this in an effort to improve documentation. I spent a crazy amount of time writing parquet-avro files to S3 with the Glue catalog and Iceberg, but could never query the data using Athena. I thought it had to do with all the missing metadata on the Glue tables, but this was a red herring. The problem was that writing files does not automatically update metadata. According to the API, if you use Table.io(), this made me think that using an OutputFile via Table.io() would update metadata. My usage:

```
OutputFile outputFile = table.io().newOutputFile(location);
appenderLocation.put(messageType, location);
FileAppender<GenericRecord> appender =
    Parquet.write(outputFile)
        .forTable(table)
        .setAll(propsBuilder)
        .createWriterFunc(ParquetAvroWriter::buildWriter)
        .build();
```

On closing the appender, the file is written but there are no updates to metadata. My table is from GlueCatalog.loadTable(). I'm new, but I could not find documented anywhere that you then have to look up the file again as an InputFile, create a transaction on the table, and commit it:

```
log.info("Closing appender for message type {}", key);
value.close(); // appender from above

// one attempt, does nothing:
// tables.get(key).rewriteManifests();

log.info("Committing {} file {}", key, appenderLocation.get(key));
InputFile inputFile = tables.get(key).io().newInputFile(appenderLocation.get(key));
DataFile dataFile =
    DataFiles.builder(tables.get(key).spec())
        .withInputFile(inputFile)
        .withMetrics(value.metrics())
        .withFormat(FileFormat.PARQUET)
        .build();

Transaction t = tables.get(key).newTransaction();
t.newAppend().appendFile(dataFile).commit();
// commit all changes to the table
t.commitTransaction();
```

So I would like improvements with respect to documentation and AWS integration for writing Parquet data using GlueCatalog, or at least a test or example people could follow for writing files and updating the corresponding catalog metadata using public APIs (the JUnit tests do all kinds of metadata updates, but through protected APIs we cannot access). Let me know your thoughts.

### Query engine

Athena
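For readers landing on this issue, here is a minimal hedged sketch of the end-to-end flow described above, assuming an unpartitioned table already loaded from a catalog (for example via GlueCatalog) and an `Iterable<GenericRecord> records` matching the table schema; the file name below is illustrative:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.io.OutputFile;
import org.apache.iceberg.parquet.Parquet;
import org.apache.iceberg.parquet.ParquetAvroWriter;

// Pick a data file location under the table's location provider.
String location = table.locationProvider()
    .newDataLocation(java.util.UUID.randomUUID() + ".parquet"); // illustrative name
OutputFile outputFile = table.io().newOutputFile(location);

// Write the Parquet file; this alone does NOT change the table.
FileAppender<GenericRecord> appender = Parquet.write(outputFile)
    .forTable(table)
    .createWriterFunc(ParquetAvroWriter::buildWriter)
    .build();
try {
  appender.addAll(records);
} finally {
  appender.close(); // metrics become available after close
}

// Register the file with a commit so engines like Athena can see it.
DataFile dataFile = DataFiles.builder(table.spec())
    .withInputFile(table.io().newInputFile(location))
    .withMetrics(appender.metrics())
    .withFormat(FileFormat.PARQUET)
    .build();

table.newAppend().appendFile(dataFile).commit(); // publishes a new snapshot
```

The key point is the final `commit()`: Iceberg only publishes a new snapshot, and hence new metadata visible to query engines, when the data file is committed, never as a side effect of writing the file itself.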
Re: [PR] Build: Fix compiler warnings [iceberg]
ajantha-bhat commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352796153

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: Oh wait, https://github.com/projectnessie/nessie/pull/6438 says that we may still need the expected content for API v1. But the test cases are passing. @snazy, @dimas-b: WDYT? Can it be removed?
Re: [PR] Build: Fix compiler warnings [iceberg]
ajantha-bhat commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352798678

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: Since that API usage is deprecated, the author of the PR fixed it with another API, I guess.
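For context, a hedged sketch of what a commit with the non-deprecated two-argument `Put` looks like against the Nessie Java client API; `api` (a `NessieApiV1`) and `newTable` (an `IcebergTable` content value carrying a content id) are assumed to exist here:

```java
import org.projectnessie.model.Branch;
import org.projectnessie.model.CommitMeta;
import org.projectnessie.model.ContentKey;
import org.projectnessie.model.Operation;

// Commit an updated table value without passing the expected content;
// per the linked Nessie PR, servers resolve it from the content id.
Branch head = api.getDefaultBranch();
Branch committed = api.commitMultipleOperations()
    .branch(head)
    .commitMeta(CommitMeta.fromMessage("commit iceberg table update"))
    .operation(Operation.Put.of(ContentKey.of("db", "tbl"), newTable))
    .commit();
```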
Re: [PR] Build: Bump slf4j from 1.7.36 to 2.0.9 [iceberg]
jbonofre commented on PR #8737: URL: https://github.com/apache/iceberg/pull/8737#issuecomment-1755700578 @nastra on a specific Spark version, or any? I can take a look if you want :)
Re: [PR] Build: Add note about running tests/itests on MacOS [iceberg]
ajantha-bhat commented on PR #8766: URL: https://github.com/apache/iceberg/pull/8766#issuecomment-1755702793

> LGTM, but would be great if somebody with OSX could confirm this

I can confirm; I use a Mac. Not just for the Iceberg project: any project that uses `Testcontainers` on macOS will fail to run tests with a "Could not find a valid Docker environment" error. I used to google it and use the command from the answer below, which matches the doc update. https://stackoverflow.com/questions/61108655/test-container-test-cases-are-failing-due-to-could-not-find-a-valid-docker-envi
Re: [PR] Build: Add note about running tests/itests on MacOS [iceberg]
jbonofre commented on PR #8766: URL: https://github.com/apache/iceberg/pull/8766#issuecomment-1755703566 @nastra actually it's the workaround I have to do on my Mac :) I'm using macOS (tested on both 13 & 14 with Docker Desktop) on M1.
Re: [PR] Build: increase open-pull-requests-limit to 50 [iceberg]
Fokko commented on code in PR #8768: URL: https://github.com/apache/iceberg/pull/8768#discussion_r1352807655

## .github/dependabot.yml:
@@ -28,6 +28,6 @@ updates:
     directory: "/"
     schedule:
       interval: "weekly"
-      day: "sunday"
-    open-pull-requests-limit: 5
+      day: "wednesday"

Review Comment: We went for Sunday initially so as not to queue the CI during workdays. I think once the PR gets merged, it will retrigger anyway.
Re: [PR] Build: increase open-pull-requests-limit to 50 [iceberg]
jbonofre commented on code in PR #8768: URL: https://github.com/apache/iceberg/pull/8768#discussion_r1352811489

## .github/dependabot.yml:
@@ -28,6 +28,6 @@ updates:
     directory: "/"
     schedule:
       interval: "weekly"
-      day: "sunday"
-    open-pull-requests-limit: 5
+      day: "wednesday"

Review Comment: OK, let me move it back to Sunday.
Re: [PR] Build: Bump slf4j from 1.7.36 to 2.0.9 [iceberg]
nastra commented on PR #8737: URL: https://github.com/apache/iceberg/pull/8737#issuecomment-1755713705 https://github.com/apache/iceberg/actions/runs/6455472887/job/17523030796?pr=8737 contains a CI run with failures
Re: [PR] Build: Upgrade to gradle 8.4 [iceberg]
jbonofre commented on PR #8486: URL: https://github.com/apache/iceberg/pull/8486#issuecomment-1755722580 Unfortunately `gradle-revapi-plugin` doesn't seem super active (https://github.com/palantir/gradle-revapi). I think it's important to stay up to date with regard to Gradle. I will propose a fix to `gradle-revapi`. If the change is not merged and/or it's hard to get a new release, I will propose an alternative plan.
Re: [PR] Build: Fix compiler warnings [iceberg]
dimas-b commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352823020

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))

Review Comment: This change LGTM, but it's a non-trivial change in the Nessie Catalog, certainly not a simple "compiler warning" kind of change... Would you mind moving it into a separate PR for the sake of clarity?
Re: [PR] Build: Fix compiler warnings [iceberg]
dimas-b commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352824568

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: This change LGTM, but it's a non-trivial change in the Nessie Catalog, certainly not a simple "compiler warning" kind of change... @nk1506: Would you mind moving it into a separate PR for the sake of clarity?
Re: [PR] Build: Fix compiler warnings [iceberg]
ajantha-bhat commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352827194

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: +1 for a separate PR. I think we can even refactor the method to not pass `expectedContent` but just the content ID, as the whole content is unused now.
Re: [PR] Build: Fix compiler warnings [iceberg]
ajantha-bhat commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352827194

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: +1 for a separate PR. I think we can even refactor the `commitTable` method to not accept `expectedContent` but just the content ID, as the whole content is unused now.
Re: [PR] Build: Fix compiler warnings [iceberg]
dimas-b commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352831841

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: Please reference https://github.com/projectnessie/nessie/pull/6438 as the rationale for removing the third parameter here.
Re: [PR] Build: Fix compiler warnings [iceberg]
dimas-b commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352831841

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: Please reference https://github.com/projectnessie/nessie/pull/6438 as the rationale for removing the third parameter in the new PR.
Re: [PR] Build: Upgrade to gradle 8.4 [iceberg]
ajantha-bhat commented on PR #8486: URL: https://github.com/apache/iceberg/pull/8486#issuecomment-1755743335

> Unfortunately `gradle-revapi-plugin` doesn't seem super active (https://github.com/palantir/gradle-revapi). I think it's important to stay up to date with regard to Gradle. I will propose a fix to `gradle-revapi`. If the change is not merged and/or it's hard to get a new release, I will propose an alternative plan.

Totally agree; worst case we can drop revapi and find alternatives. But we can't keep Gradle out of date. cc: @nastra, @Fokko, @danielcweeks, @rdblue
Re: [PR] Build: Fix compiler warnings [iceberg]
dimas-b commented on code in PR #8763: URL: https://github.com/apache/iceberg/pull/8763#discussion_r1352835064

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
@@ -477,7 +477,7 @@ public void commitTable(
     Branch branch =
         getApi()
             .commitMultipleOperations()
-            .operation(Operation.Put.of(key, newTable, expectedContent))
+            .operation(Operation.Put.of(key, newTable))

Review Comment: I believe Nessie API v1 only needs to support older clients that serialize the `expectedContent` parameter in JSON. The actual values are not used by Nessie servers 0.54.0 and later.
Re: [I] Upsert support for keyless Apache Flink tables [iceberg]
Ge commented on issue #8719: URL: https://github.com/apache/iceberg/issues/8719#issuecomment-1755752149 `SELECT word, COUNT(*) FROM word_table GROUP BY word;` is a retract stream:
```
Flink SQL> SELECT word, COUNT(*) FROM word_table GROUP BY word;
+----+------+--------+
| op | word | EXPR$1 |
+----+------+--------+
| +I |    6 |      1 |
| +I |    8 |      1 |
| +I |    f |      1 |
| +I |    c |      1 |
| +I |    b |      1 |
| -U |    8 |      1 |
| +U |    8 |      2 |
| +I |    1 |      1 |
| +I |    a |      1 |
| -U |    8 |      2 |
| +U |    8 |      3 |
| -U |    6 |      1 |
| +U |    6 |      2 |
| +I |    9 |      1 |
| +I |    e |      1 |
```
Can you please elaborate on what is missing, @pvary?
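For completeness, a small hedged sketch (assuming the Flink 1.14 Java bridge API and an already-registered `word_table`) of consuming the same query as a retract stream programmatically:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// A GROUP BY over an unbounded stream emits updates, not only inserts.
Table counts = tEnv.sqlQuery("SELECT word, COUNT(*) FROM word_table GROUP BY word");

// Each element is a (Boolean, Row) pair: true for add (+I/+U), false for retract (-U).
tEnv.toRetractStream(counts, Row.class).print();

env.execute("retract-stream-demo");
```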
Re: [PR] Build: increase open-pull-requests-limit to 50 [iceberg]
jbonofre commented on PR #8768: URL: https://github.com/apache/iceberg/pull/8768#issuecomment-1755758543 @ajantha-bhat maybe we have 5 pending PRs not closed/merged, blocking any new PRs.
Re: [PR] Build: increase open-pull-requests-limit to 50 [iceberg]
ajantha-bhat commented on PR #8768: URL: https://github.com/apache/iceberg/pull/8768#issuecomment-1755762122

> @ajantha-bhat maybe we have 5 pending PRs not closed/merged, blocking any new PRs.

Yeah, anyway this change will definitely give clarity on whether that was the problem. So, a huge +1 for this.
Re: [PR] Docs: Document all metadata tables. [iceberg]
nastra merged PR #8709: URL: https://github.com/apache/iceberg/pull/8709
Re: [I] Document all metadata tables [iceberg]
nastra closed issue #757: Document all metadata tables URL: https://github.com/apache/iceberg/issues/757
Re: [PR] Construct a writer tree [iceberg-python]
Fokko commented on PR #40: URL: https://github.com/apache/iceberg-python/pull/40#issuecomment-1755767706 Forgot to push, just pushed the latest changes
Re: [PR] Disable merge-commit and enforce linear history [iceberg-python]
rdblue merged PR #57: URL: https://github.com/apache/iceberg-python/pull/57
Re: [I] Unable to write to iceberg table using spark [iceberg]
di2mot commented on issue #8419: URL: https://github.com/apache/iceberg/issues/8419#issuecomment-1755796639 In general, this works for me:

```
("spark.jars.packages", "org.apache.iceberg:iceberg-spark3:0.11.0"),
("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog"),
("spark.sql.catalog.iceberg.type", "hadoop"),
("spark.sql.catalog.iceberg.warehouse", self.path)
```

But that is when I run locally, not in Docker/Kubernetes. On the server, under Airflow, we use the same settings without `spark.jars.packages`:

```
...
("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"),
("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog"),
("spark.sql.catalog.iceberg.type", "hadoop"),
("spark.sql.catalog.iceberg.warehouse", self.path)
...
```

because we add the packages in a .yaml file:

```
packages:
  - org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1
  - org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.0
```
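For reference, the same catalog settings expressed through the `SparkSession` builder; this is a sketch, with the catalog name `iceberg` and the warehouse path as placeholders:

```
import org.apache.spark.sql.SparkSession;

// Configures a Hadoop-type Iceberg catalog named "iceberg".
SparkSession spark = SparkSession.builder()
    .appName("iceberg-example")
    .config("spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "/path/to/warehouse")
    .getOrCreate();
```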
Re: [PR] push down min/max/count to iceberg [iceberg]
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1755857764 @huaxingao I was executing a max/count query on an Iceberg table (version 1.3.0, Spark 3.3.1) but am unable to see aggregate pushdown, i.e. a `LocalTableScan`. Cc: @RussellSpitzer

```
spark.sql(f"""
  select max(page_view_dtm) from schema.table1
  where page_view_dtm between '2020-01-01 00:00:00' and '2021-12-31 23:59:59'
""").explain()
```

and the explain plan generated is

```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[max(page_view_dtm#139)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=62]
      +- HashAggregate(keys=[], functions=[partial_max(page_view_dtm#139)])
         +- Filter ((page_view_dtm#139 >= 2020-01-01 00:00:00) AND (page_view_dtm#139 <= 2021-12-31 23:59:59))
            +- BatchScan[page_view_dtm#139] spark_catalog.schema.table1 (branch=null) [filters=page_view_dtm IS NOT NULL, page_view_dtm >= 15778548, page_view_dtm <= 164101319900, groupedBy=] RuntimeFilters: []
```
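One thing worth checking, sketched in Java for brevity: aggregate pushdown is controlled by an Iceberg Spark session property (assumed here to be spelled `spark.sql.iceberg.aggregate-push-down.enabled`), and it is typically skipped when a residual `Filter` remains above the scan, as in the plan above:

```
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.active();
// Hypothetical check: make sure pushdown is not disabled in this session.
spark.conf().set("spark.sql.iceberg.aggregate-push-down.enabled", "true");

// If the timestamp predicate cannot be completely pushed into the scan
// (note the residual Filter node above), the aggregate stays in Spark.
// An unfiltered aggregate is a simpler test case for the pushdown path:
spark.sql("SELECT max(page_view_dtm) FROM schema.table1").explain();
```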
Re: [PR] Spec: Add partition stats spec [iceberg]
aokolnychyi commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1353203145

## format/spec.md:

@@ -702,6 +703,58 @@ Blob metadata is a struct with the following fields:

| _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |

#### Partition statistics

Partition statistics files are based on the [Partition Statistics file spec](#partition-statistics-file). Partition statistics are not required for reading or planning and readers may ignore them. Each table snapshot may be associated with at most one partition statistics file. A writer can optionally write the partition statistics file during each write operation, and it must be registered in the table metadata file to be considered a valid statistics file for the reader.

The `partition-statistics` field of table metadata is an optional list of structs with the following fields:

| v1 | v2 | Field name | Type | Description |
|----|----|------------|------|-------------|
| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the partition statistics file. |

#### Partition Statistics file

Statistics information for each unique partition tuple is stored as a row in the default data file format of the table (for example, Parquet or ORC). These rows must be sorted (in ascending manner with NULL FIRST) by the `partition` field to optimize filtering rows while scanning.

The schema of the partition statistics file is as follows:

| v1 | v2 | Field id, name | Type | Description |
|----|----|----------------|------|-------------|
| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
| _required_ | _required_ | **`3 data_record_count`** | `long` | Count of records in data files |
| _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data files |
| _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | Total size of data files in bytes |
| _optional_ | _optional_ | **`6 position_delete_record_count`** | `long` | Count of records in position delete files |
| _optional_ | _optional_ | **`7 position_delete_file_count`** | `int` | Count of position delete files |
| _optional_ | _optional_ | **`8 equality_delete_record_count`** | `long` | Count of records in equality delete files |
| _optional_ | _optional_ | **`9 equality_delete_file_count`** | `int` | Count of equality delete files |
| _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate count of records in a partition after applying the delete files if any |

Review Comment: This makes sense to me.
Re: [PR] Spec: Add partition stats spec [iceberg]
aokolnychyi commented on PR #7105: URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1756079392 I added this PR to our community sync. I am not sure I will be there this week but I'll sync with Russell and Yufei afterwards.
[PR] Fix column rename doc example to reflect correct API [iceberg-python]
cabhishek opened a new pull request, #59: URL: https://github.com/apache/iceberg-python/pull/59

* The rename-column example in [this](https://py.iceberg.apache.org/api/#rename-column) doc is incorrect.
* This PR updates the example to use `update.rename_column(...)` instead of `update.rename(...)`.

After PR:

```
with table.update_schema() as update:
    update.rename_column("retries", "num_retries")
    # This will rename `confirmed_by` to `exchange`
    update.rename_column("properties.confirmed_by", "exchange")
```
Re: [PR] Avro: Add Avro-assisted name mapping [iceberg]
wmoustafa commented on code in PR #7392: URL: https://github.com/apache/iceberg/pull/7392#discussion_r1353271608

## core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:

```
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {
+        // In the case of a complex union, the corresponding "type" is a struct. Non-null type i in
+        // the union maps to struct field i + 1 because the first struct field is the "tag".
+        options.add(
+            visit(visitor.fieldNameAndType(type, i + 1).second(), nonNullTypes.get(i), visitor));
```

Review Comment: Visited null but did not add it to the returned options, since it does not correspond to a struct field.
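To make the mapping concrete, a small illustration of how a complex (non-option) Avro union lines up with the tagged struct partner described above; the branch types are arbitrary examples and the struct shape is shown as a comment:

```
import org.apache.avro.Schema;

// A complex union: more than one non-null branch.
Schema union = Schema.createUnion(
    Schema.create(Schema.Type.NULL),
    Schema.create(Schema.Type.INT),
    Schema.create(Schema.Type.STRING));

// Per the visitor above, this union pairs with a struct partner shaped like
//   struct<tag: int, field0: int, field1: string>
// where non-null branch i of the union maps to struct field i + 1,
// and field 0 is the "tag" recording which branch is populated.
```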
Re: [PR] Avro: Add Avro-assisted name mapping [iceberg]
wmoustafa commented on PR #7392: URL: https://github.com/apache/iceberg/pull/7392#issuecomment-1756153921 > I think this is ready. Just a few minor updates needed; mostly https://github.com/apache/iceberg/pull/7392/files#r1224853756.

Addressed.
Re: [PR] Data: Support reading default values from generic Avro readers [iceberg]
wmoustafa commented on code in PR #6004: URL: https://github.com/apache/iceberg/pull/6004#discussion_r1353272532

## .palantir/revapi.yml:

```
@@ -451,6 +451,15 @@ acceptedBreaks:
     - code: "java.field.removedWithConstant"
       old: "field org.apache.iceberg.TableProperties.HMS_TABLE_OWNER"
       justification: "Removing deprecations for 1.3.0"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method void org.apache.iceberg.avro.ValueReaders.StructReader<S>::<init>(java.util.List<org.apache.iceberg.avro.ValueReader<?>>,\
+        \ org.apache.iceberg.types.Types.StructType, java.util.Map<java.lang.Integer, ?>)"
+      new: "method void org.apache.iceberg.avro.ValueReaders.StructReader<S>::<init>(java.util.List<org.apache.iceberg.avro.ValueReader<?>>,\
```

Review Comment: Yes. Done.
Re: [PR] Data: Support reading default values from generic Avro readers [iceberg]
wmoustafa commented on PR #6004: URL: https://github.com/apache/iceberg/pull/6004#issuecomment-1756154580 > @wmoustafa, looks like there are test failures. Can you take a look?

Fixed.