Re: [PR] Enable more error-prone checks [iceberg]

2024-09-04 Thread via GitHub
findepi commented on PR #11078: URL: https://github.com/apache/iceberg/pull/11078#issuecomment-2330691491 as the saying goes -- two heads are better than one 🙁 thanks again for review! -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Enable more error-prone checks [iceberg]

2024-09-04 Thread via GitHub
findepi commented on PR #11078: URL: https://github.com/apache/iceberg/pull/11078#issuecomment-2330676932 thanks @sfc-gh-ygu . also cc @flyrain for potential approve -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Materialized View Spec [iceberg]

2024-09-04 Thread via GitHub
JanKaul commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1744845733 ## format/view-spec.md: ## @@ -158,6 +173,59 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when t

Re: [PR] Materialized View Spec [iceberg]

2024-09-04 Thread via GitHub
JanKaul commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1744844818 ## format/view-spec.md: ## @@ -81,9 +93,12 @@ Each version in `versions` is a struct with the following fields: | _required_ | `representations` | A list of [re

Re: [PR] Materialized View Spec [iceberg]

2024-09-04 Thread via GitHub
bennychow commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1744837644 ## format/view-spec.md: ## @@ -81,9 +93,12 @@ Each version in `versions` is a struct with the following fields: | _required_ | `representations` | A list of [

Re: [PR] Iceberg/Comet integration POC [iceberg]

2024-09-04 Thread via GitHub
PaulLiang1 commented on PR #9841: URL: https://github.com/apache/iceberg/pull/9841#issuecomment-2330600257 > @PaulLiang1 Thanks! I'll check with my colleague tomorrow to find out where we are in the binary release process. got it, thanks for letting me know. please feel free to let us

Re: [PR] Updating SparkScan to only read Apache DataSketches [iceberg]

2024-09-04 Thread via GitHub
jeesou commented on PR #11035: URL: https://github.com/apache/iceberg/pull/11035#issuecomment-2330597057 Hi @karuppayya , @aokolnychyi , @huaxingao kindly review this PR once. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Spark 3.5: Mandate identifier fields when create_changelog_view for table contain unsortable columns [iceberg]

2024-09-04 Thread via GitHub
karuppayya commented on code in PR #11045: URL: https://github.com/apache/iceberg/pull/11045#discussion_r1744645623 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/CreateChangelogViewProcedure.java: ## @@ -146,10 +147,16 @@ public InternalRow[] call(Interna

Re: [PR] Build: Fix BrotliCodec class not found failure when using brotli as compression codec [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7531: URL: https://github.com/apache/iceberg/pull/7531#issuecomment-2330357771 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spark 3.4: Time range rewrite data files [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7460: URL: https://github.com/apache/iceberg/pull/7460#issuecomment-2330357754 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] AWS: Add SQS MetricsReporter [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7444: URL: https://github.com/apache/iceberg/pull/7444#issuecomment-2330357732 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Core, Spark: Incremental scan return empty when start timestamp equals the end [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7435: URL: https://github.com/apache/iceberg/pull/7435#issuecomment-2330357706 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Flink 1.19: Run without Hadoop [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7369: URL: https://github.com/apache/iceberg/pull/7369#issuecomment-2330357652 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Flink 1.19: Run without Hadoop [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7369: Flink 1.19: Run without Hadoop URL: https://github.com/apache/iceberg/pull/7369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Allow sparksql to override target split size with session property [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7430: Allow sparksql to override target split size with session property URL: https://github.com/apache/iceberg/pull/7430 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] Allow sparksql to override target split size with session property [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7430: URL: https://github.com/apache/iceberg/pull/7430#issuecomment-2330357682 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Flink 1.17: Supports batch queries using time ranges [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7362: URL: https://github.com/apache/iceberg/pull/7362#issuecomment-2330357632 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Fix IcebergGenerics::read to read metadata tables [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7352: URL: https://github.com/apache/iceberg/pull/7352#issuecomment-2330357613 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] [AWS] S3AsyncFileIO Client integration [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7318: [AWS] S3AsyncFileIO Client integration URL: https://github.com/apache/iceberg/pull/7318 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] Core: Fanout equality/position delete writer [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7313: URL: https://github.com/apache/iceberg/pull/7313#issuecomment-2330357516 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Support case insensitive id assignment for applyNameMapping when reading parquet [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7299: URL: https://github.com/apache/iceberg/pull/7299#issuecomment-2330357467 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spark: Show Create Round trip tests [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7300: URL: https://github.com/apache/iceberg/pull/7300#issuecomment-2330357497 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Support case insensitive id assignment for applyNameMapping when reading parquet [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7299: Support case insensitive id assignment for applyNameMapping when reading parquet URL: https://github.com/apache/iceberg/pull/7299 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Flink: support table comment [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7236: Flink: support table comment URL: https://github.com/apache/iceberg/pull/7236 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Spark 3.3: SQL Extensions for CREATE BRANCH AS OF TAG [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7294: URL: https://github.com/apache/iceberg/pull/7294#issuecomment-2330357416 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Support timestamp type in partition string when importing files [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7291: URL: https://github.com/apache/iceberg/pull/7291#issuecomment-2330357395 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] [Parquet] Eagerly fetch row groups when reading parquet [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7279: [Parquet] Eagerly fetch row groups when reading parquet URL: https://github.com/apache/iceberg/pull/7279 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] Spark 3.3: drop_namespace with CASCADE support [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7275: URL: https://github.com/apache/iceberg/pull/7275#issuecomment-2330357345 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Flink: support table comment [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7236: URL: https://github.com/apache/iceberg/pull/7236#issuecomment-2330357291 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] [Draft][HiveCatalog] Skip updating column schema when filed schema string is larger than maxHiveTablePropertySize [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7222: [Draft][HiveCatalog] Skip updating column schema when filed schema string is larger than maxHiveTablePropertySize URL: https://github.com/apache/iceberg/pull/7222 -- This is an automated message from the Apache Git Service. To respond to the mes

Re: [PR] Fix for Drop SQL issue when attempting to drop an Iceberg table [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7228: Fix for Drop SQL issue when attempting to drop an Iceberg table URL: https://github.com/apache/iceberg/pull/7228 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Updated python-integration.yml [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7210: Updated python-integration.yml URL: https://github.com/apache/iceberg/pull/7210 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Updated python-integration.yml [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7210: URL: https://github.com/apache/iceberg/pull/7210#issuecomment-2330357183 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Build: Remove services files introduced by third-party jars [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7209: Build: Remove services files introduced by third-party jars URL: https://github.com/apache/iceberg/pull/7209 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Refactoring metadata location and adding API to get data and metadata location #7187 [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7188: URL: https://github.com/apache/iceberg/pull/7188#issuecomment-2330357154 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Refactoring metadata location and adding API to get data and metadata location #7187 [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7188: Refactoring metadata location and adding API to get data and metadata location #7187 URL: https://github.com/apache/iceberg/pull/7188 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Core: Add metrics reporter for serializable table [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7144: URL: https://github.com/apache/iceberg/pull/7144#issuecomment-2330357123 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Core, Spark: Fix delete with filter on nested columns [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7132: URL: https://github.com/apache/iceberg/pull/7132#issuecomment-2330357106 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spec: metadata file (-.metadata.json) naming convention [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7107: Spec: metadata file (-.metadata.json) naming convention URL: https://github.com/apache/iceberg/pull/7107 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] Core, Spark: Fix delete with filter on nested columns [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] closed pull request #7132: Core, Spark: Fix delete with filter on nested columns URL: https://github.com/apache/iceberg/pull/7132 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Spec: metadata file (-.metadata.json) naming convention [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7107: URL: https://github.com/apache/iceberg/pull/7107#issuecomment-2330357076 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Spark: Close auto broadcast join in delete orphan action [iceberg]

2024-09-04 Thread via GitHub
github-actions[bot] commented on PR #7096: URL: https://github.com/apache/iceberg/pull/7096#issuecomment-2330357061 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Schema Evolution RecordBatch processor [iceberg-rust]

2024-09-04 Thread via GitHub
sdd commented on code in PR #602: URL: https://github.com/apache/iceberg-rust/pull/602#discussion_r1744602811 ## crates/iceberg/src/arrow/record_batch_evolution_processor.rs: ## @@ -0,0 +1,408 @@ +use std::sync::Arc; + +use arrow::compute::cast; +use arrow_array::{ +Array as

[PR] Schema Evolution RecordBatch processor [iceberg-rust]

2024-09-04 Thread via GitHub
sdd opened a new pull request, #602: URL: https://github.com/apache/iceberg-rust/pull/602 Addresses parts 2 and 3 of https://github.com/apache/iceberg-rust/issues/405 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [I] Inconsistent row count across versions [iceberg-python]

2024-09-04 Thread via GitHub
dev-goyal commented on issue #1132: URL: https://github.com/apache/iceberg-python/issues/1132#issuecomment-2329832230 No position deletes, it has equality deletes! Exact delete query: ``` res = spark.sql( f"DELETE FROM ml_recommendations.users_v2 AS users WHERE EXISTS ("

Re: [I] Inconsistent row count across versions [iceberg-python]

2024-09-04 Thread via GitHub
sungwy commented on issue #1132: URL: https://github.com/apache/iceberg-python/issues/1132#issuecomment-2329826804 Hi @dev-goyal thank you for raising this issue, that looks like a critical issue we want to resolve asap. Based on what you mentioned here: > Can confirm that the .plan_

Re: [PR] Doc: Add warning for create_changelog_view when columns are unorderable [iceberg]

2024-09-04 Thread via GitHub
karuppayya commented on PR #11045: URL: https://github.com/apache/iceberg/pull/11045#issuecomment-2329789217 @dramaticlly Should we also validate the schema for un-orderable in the procedure and throw a `org.apache.iceberg.exceptions.ValidationException`? -- This is an automated message f

Re: [I] Inconsistent row count across versions [iceberg-python]

2024-09-04 Thread via GitHub
dev-goyal commented on issue #1132: URL: https://github.com/apache/iceberg-python/issues/1132#issuecomment-2329781521 Nothing too exotic in the `row_filter` btw: `'row_filter': And(left=GreaterThanOrEqual(term=Reference(name='last_session'), literal=literal('2024-06-06T00:00:00+00:0

Re: [PR] Doc: Add warning for create_changelog_view when columns are unorderable [iceberg]

2024-09-04 Thread via GitHub
dramaticlly commented on PR #11045: URL: https://github.com/apache/iceberg/pull/11045#issuecomment-2329762019 Had some offline discussion with @flyrain and decided to update documentation instead of the original code change, given the [row lineage proposal in v3](https://docs.google.com/do

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-09-04 Thread via GitHub
jacobmarble commented on PR #9008: URL: https://github.com/apache/iceberg/pull/9008#issuecomment-2329506863 Thank you for helping us get across the finish line @rdblue! Thank you for all the effort reviewing @nastra @Fokko @amogh-jahagirdar! -- This is an automated message from the Apac

Re: [PR] fix: fixing tests to work with s3Express [iceberg]

2024-09-04 Thread via GitHub
fuatbasik commented on code in PR #11021: URL: https://github.com/apache/iceberg/pull/11021#discussion_r1743477834 ## aws/src/integration/java/org/apache/iceberg/aws/AwsIntegTestUtil.java: ## @@ -106,7 +109,7 @@ public static String testMultiRegionAccessPointAlias() { retur

Re: [PR] feat: support create partition table for non REST catalog [iceberg-rust]

2024-09-04 Thread via GitHub
liurenjie1024 commented on code in PR #577: URL: https://github.com/apache/iceberg-rust/pull/577#discussion_r1743392719 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -301,12 +302,7 @@ impl TableMetadataBuilder { } = table_creation; let partition_specs =

Re: [PR] feat: support create partition table for non REST catalog [iceberg-rust]

2024-09-04 Thread via GitHub
liurenjie1024 commented on PR #577: URL: https://github.com/apache/iceberg-rust/pull/577#issuecomment-2328362065 > > Thanks @FANNG1 for this pr. However I think there is some preparation work before we can actually finish this pr. If we can narrow down the goal of this pr to change type f

Re: [PR] feat: support projection pushdown for datafusion iceberg [iceberg-rust]

2024-09-04 Thread via GitHub
liurenjie1024 commented on code in PR #594: URL: https://github.com/apache/iceberg-rust/pull/594#discussion_r1743337943 ## crates/integrations/datafusion/src/physical_plan/scan.rs: ## @@ -138,3 +156,18 @@ async fn get_batch_stream( Ok(Box::pin(stream)) } + +fn get_column

Re: [I] Regression in 0.7.0 due to type coercion from "string" to "large_string" [iceberg-python]

2024-09-04 Thread via GitHub
maxfirman commented on issue #1128: URL: https://github.com/apache/iceberg-python/issues/1128#issuecomment-2328309497 Thanks @kevinjqliu. I can confirm that the workaround resolves the problem when using latest main branch but not v0.7.0 or v0.7.1. Setting `PYARROW_USE_LARGE_TYPES_ON

[I] AWS Glue Apache Iceberg Data Recovery [iceberg]

2024-09-04 Thread via GitHub
SamRaza356 opened a new issue, #11077: URL: https://github.com/apache/iceberg/issues/11077 ### Query engine AWS ATHENA ### Question Done full migration iceberg table into another isolated table. Issue: Deletion done in (first) iceberg table doesn't reflect on iceberg

Re: [PR] Spark: Deprecate SparkAppenderFactory [iceberg]

2024-09-04 Thread via GitHub
ajantha-bhat commented on code in PR #11076: URL: https://github.com/apache/iceberg/pull/11076#discussion_r1743221318 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkAppenderFactory.java: ## @@ -48,6 +48,10 @@ import org.apache.spark.sql.types.StructType;

Re: [I] CLI list not working [iceberg-python]

2024-09-04 Thread via GitHub
TiansuYu commented on issue #1122: URL: https://github.com/apache/iceberg-python/issues/1122#issuecomment-2328085721 I would say `load_catalog` lazily until actually needed, in the main command. -- This is an automated message from the Apache Git Service. To respond to the message, plea