Re: [I] Calling `rewrite_position_delete_files` fails on tables with more than 1k columns [iceberg]

2024-03-21 Thread via GitHub
xiaoxuandev commented on issue #9923: URL: https://github.com/apache/iceberg/issues/9923#issuecomment-2011383283 From the stack trace, essentially the error is caused by duplicated keys put into the ImmutableMap.Builder. But looking at the implementation, ``` public Map byId() {

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1533362669 ## core/src/test/java/org/apache/iceberg/hadoop/HadoopStreamsTest.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1533366925 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopStreams.java: ## @@ -185,8 +185,21 @@ public void flush() throws IOException { @Override public void cl
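The PR title describes an s3a upload interrupt that leaves table metadata pointing at a file that does not exist. The excerpt stops at the start of the `close()` override, so the actual fix is not visible here. The hypothetical `InterruptAwareOutputStream` below illustrates the general idea only: `close()` fails whenever the upload was interrupted. It assumes the underlying stream signals the interrupt in one of two ways, either by throwing `InterruptedIOException` or by returning normally with the thread's interrupt flag set.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

// Hypothetical wrapper for illustration only; not the code from PR #9998.
// Goal: an interrupted upload must make close() fail, so the caller never
// goes on to record a file that was never fully written.
final class InterruptAwareOutputStream extends FilterOutputStream {
  InterruptAwareOutputStream(OutputStream delegate) {
    super(delegate);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Forward bulk writes directly instead of FilterOutputStream's byte-by-byte loop.
    out.write(b, off, len);
  }

  @Override
  public void close() throws IOException {
    try {
      super.close(); // flushes, then closes the delegate
    } catch (InterruptedIOException e) {
      // Keep the interrupt visible to callers further up the stack.
      Thread.currentThread().interrupt();
      throw new IOException("Upload was interrupted; the file may not exist", e);
    }
    // Assumption: a delegate might also return normally while the thread is
    // interrupted (e.g. after aborting the upload); treat that as a failure too.
    if (Thread.currentThread().isInterrupted()) {
      throw new InterruptedIOException("Interrupted while closing; the file may not exist");
    }
  }
}
```

The design choice is to fail loudly at `close()`. A commit that depends on the file then aborts instead of publishing metadata that references a missing object.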

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533369953 ## core/src/main/java/org/apache/iceberg/BaseMetastoreOperations.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533370990 ## core/src/main/java/org/apache/iceberg/BaseMetastoreOperations.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533372773 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -291,6 +284,8 @@ public long newSnapshotId() { }; } + /** @deprecated Use

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533376150 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -309,65 +304,39 @@ protected enum CommitStatus { * @return Commit Status of Succe

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533376457 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -309,65 +304,39 @@ protected enum CommitStatus { * @return Commit Status of Succe

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533378394 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -309,65 +304,39 @@ protected enum CommitStatus { * @return Commit Status of Succe
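The hunk under review touches the `CommitStatus` enum and the javadoc of a method that returns a commit status. The sketch below is simplified and hypothetical. It shows one common way to resolve such a status after an ambiguous commit error:

- Re-read the catalog.
- Report SUCCESS if the attempted metadata location is now current or in the history.
- Report FAILURE if the location is absent.
- Report UNKNOWN if the catalog cannot be read.

The `CommitStatusCheck` class and its `Supplier`-based signature are invented for illustration and are not the code in PR #10001.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified sketch; not the implementation in PR #10001.
final class CommitStatusCheck {
  enum CommitStatus { FAILURE, SUCCESS, UNKNOWN }

  /**
   * @param newMetadataLocation location the failed-looking commit tried to install
   * @param catalogLocations assumption: freshly loads the current metadata location
   *     followed by previous locations from the catalog
   */
  static CommitStatus check(String newMetadataLocation, Supplier<List<String>> catalogLocations) {
    List<String> locations;
    try {
      locations = catalogLocations.get();
    } catch (RuntimeException e) {
      // The catalog could not be read, so we cannot tell whether the commit landed.
      return CommitStatus.UNKNOWN;
    }
    return locations.contains(newMetadataLocation) ? CommitStatus.SUCCESS : CommitStatus.FAILURE;
  }
}
```

A production version would likely retry the lookup a few times before settling on UNKNOWN, because returning FAILURE for a commit that actually succeeded could lead the caller to clean up files the table now references.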

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533378904 ## core/src/main/java/org/apache/iceberg/CatalogUtil.java: ## @@ -136,6 +138,18 @@ public static void dropTableData(FileIO io, TableMetadata metadata) { deleteFi

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533411887 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -273,6 +282,21 @@ public void renameTable(TableIdentifier from, TableIdentifier origin

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533412107 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -222,9 +211,29 @@ public boolean dropTable(TableIdentifier identifier, boolean purge)

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533414139 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -222,9 +211,29 @@ public boolean dropTable(TableIdentifier identifier, boolean purge)

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533415487 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -250,7 +259,7 @@ public void renameTable(TableIdentifier from, TableIdentifier origina

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533419675 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveOperationsBase.java: ## @@ -139,22 +212,40 @@ static StorageDescriptor storageDescriptor(TableMetadata met

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533422080 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveOperationsBase.java: ## @@ -139,22 +212,40 @@ static StorageDescriptor storageDescriptor(TableMetadata met

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533430710 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveOperationsBase.java: ## @@ -181,4 +272,203 @@ default Table newHmsTable(String hmsTableOwner) { retu

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533432188 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -166,168 +159,36 @@ protected void doRefresh() { refreshFromMetadataLocati

Re: [PR] Aws: Add Iceberg version to UserAgent in S3 requests [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #9963: URL: https://github.com/apache/iceberg/pull/9963#discussion_r1533455052 ## aws/src/main/java/org/apache/iceberg/aws/s3/DefaultS3FileIOAwsClientFactory.java: ## @@ -54,6 +56,11 @@ public S3Client s3() { awsClientPropertie

Re: [PR] Aws: Add Iceberg version to UserAgent in S3 requests [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #9963: URL: https://github.com/apache/iceberg/pull/9963#discussion_r1533456221 ## aws/src/main/java/org/apache/iceberg/aws/s3/DefaultS3FileIOAwsClientFactory.java: ## @@ -54,6 +56,11 @@ public S3Client s3() { awsClientPropertie

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-03-21 Thread via GitHub
ajantha-bhat closed pull request #9437: Spark 3.5: Spark action to compute the partition stats URL: https://github.com/apache/iceberg/pull/9437

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-03-21 Thread via GitHub
ajantha-bhat commented on PR #9437: URL: https://github.com/apache/iceberg/pull/9437#issuecomment-2011672902 Retriggering the build due to a flaky test.

Re: [I] `system.add_files` utility does not support updated Partition Spec [iceberg]

2024-03-21 Thread via GitHub
nastra commented on issue #10008: URL: https://github.com/apache/iceberg/issues/10008#issuecomment-2011678115 @sfc-gh-asudhakar I believe you need to update the schema of the Iceberg table yourself. The [docs](https://iceberg.apache.org/docs/latest/spark-procedures/#add_files) of `add_file

Re: [I] how to run several spark ddl with transaction [iceberg]

2024-03-21 Thread via GitHub
nastra closed issue #10012: how to run several spark ddl with transaction URL: https://github.com/apache/iceberg/issues/10012

Re: [I] how to run several spark ddl with transaction [iceberg]

2024-03-21 Thread via GitHub
nastra commented on issue #10012: URL: https://github.com/apache/iceberg/issues/10012#issuecomment-2011686927 @madeirak Spark itself doesn't provide any transactional mechanism that can be expressed via SQL to execute multiple DDL statements within a single transaction.

Re: [I] How to use pyiceberg to operate partition field and rename table [iceberg]

2024-03-21 Thread via GitHub
nastra closed issue #10013: How to use pyiceberg to operate partition field and rename table URL: https://github.com/apache/iceberg/issues/10013

Re: [I] How to use pyiceberg to operate partition field and rename table [iceberg]

2024-03-21 Thread via GitHub
nastra commented on issue #10013: URL: https://github.com/apache/iceberg/issues/10013#issuecomment-2011689280 Probably best if this question is moved to https://github.com/apache/iceberg-python

Re: [PR] Spark 3.2: Support arbitrary scans in SparkBatchQueryScan [iceberg]

2024-03-21 Thread via GitHub
nastra commented on PR #10011: URL: https://github.com/apache/iceberg/pull/10011#issuecomment-2011692406 @puchengy what's the reason for backporting this to the 1.3 line? I don't think we're going to do a patch release for 1.3.x unless this is a critical fix.

Re: [PR] Spark 3.2: Add RewritePositionDeleteFilesSparkAction [iceberg]

2024-03-21 Thread via GitHub
nastra commented on PR #10009: URL: https://github.com/apache/iceberg/pull/10009#issuecomment-2011694517 @puchengy what's the reason for backporting this to the 1.3 line? I don't think we're going to do a patch release for 1.3.x unless this is a critical fix.

Re: [PR] feat: add transform_literal [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 commented on code in PR #287: URL: https://github.com/apache/iceberg-rust/pull/287#discussion_r1533295727 ## crates/iceberg/src/transform/mod.rs: ## @@ -31,6 +34,8 @@ pub trait TransformFunction: Send { /// The implementation of this function will need to chec

Re: [I] Implement covnersion from `ArrowSchema` to iceberg `Schema`. [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 closed issue #252: Implement covnersion from `ArrowSchema` to iceberg `Schema`. URL: https://github.com/apache/iceberg-rust/issues/252

Re: [PR] feat: Implement the conversion from Arrow Schema to Iceberg Schema [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 merged PR #258: URL: https://github.com/apache/iceberg-rust/pull/258

Re: [PR] feat: init iceberg writer [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 commented on code in PR #275: URL: https://github.com/apache/iceberg-rust/pull/275#discussion_r1533547101 ## crates/iceberg/src/writer/base_writer/data_file_writer.rs: ## @@ -0,0 +1,310 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] Calling `rewrite_position_delete_files` fails on tables with more than 1k columns [iceberg]

2024-03-21 Thread via GitHub
bk-mz commented on issue #9923: URL: https://github.com/apache/iceberg/issues/9923#issuecomment-2011826944 @xiaoxuandev it's because nameToId is inverted to the result: ```nameToId.forEach((key, value) -> builder.put(value, key));``` you take key, value and remap it to value, ke
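bk-mz's point is that `byId()` builds an id→name map by swapping the key and value of every `nameToId` entry. That fails whenever two distinct names resolve to the same field id. The thread does not say which names collide on tables with more than 1k columns. The sketch below only reproduces the mechanism with plain `java.util`. It uses `Collectors.toMap`, which, like Guava's `ImmutableMap.Builder.build()`, rejects duplicate keys. It also adds a hypothetical `findSharedIds` helper that lists the colliding names instead of failing opaquely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration of the failure mode only; not Iceberg's byId() implementation.
final class InvertIndex {
  // Like ImmutableMap.Builder.build(), Collectors.toMap throws when two entries
  // produce the same key, i.e. when two names share one field id.
  static Map<Integer, String> invertStrict(Map<String, Integer> nameToId) {
    return nameToId.entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getValue, Map.Entry::getKey));
  }

  // Hypothetical diagnostic: group names by id and keep only ids with 2+ names.
  static Map<Integer, List<String>> findSharedIds(Map<String, Integer> nameToId) {
    Map<Integer, List<String>> byId = new LinkedHashMap<>();
    nameToId.forEach((name, id) -> byId.computeIfAbsent(id, k -> new ArrayList<>()).add(name));
    byId.values().removeIf(names -> names.size() < 2);
    return byId;
  }
}
```

Running `findSharedIds` on the failing schema's name index would show which names map to the duplicated id, which is the information needed to decide whether the bug is in the index or in how ids were assigned.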

Re: [PR] feat: implement prune column for schema [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 commented on code in PR #261: URL: https://github.com/apache/iceberg-rust/pull/261#discussion_r1531709362 ## crates/iceberg/src/spec/schema.rs: ## @@ -1338,4 +1533,430 @@ table { ); } } +#[test] +fn test_schema_prune_columns_strin

[PR] [WIP] Migrate Scan, Schema and remaining Partition files in Core to JUnit5 [iceberg]

2024-03-21 Thread via GitHub
tomtongue opened a new pull request, #10014: URL: https://github.com/apache/iceberg/pull/10014 Migrate the following test classes in iceberg-core to JUnit 5 and AssertJ style for https://github.com/apache/iceberg/issues/9085. ## Current Progress Scan - [x] `TestScanDataFile

Re: [PR] Core: Prevent duplicate data files [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10007: URL: https://github.com/apache/iceberg/pull/10007#discussion_r1533633016 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -43,6 +44,7 @@ class FastAppend extends SnapshotProducer implements AppendFiles { private final Par

Re: [PR] Core: Prevent duplicate data files [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10007: URL: https://github.com/apache/iceberg/pull/10007#discussion_r1533634773 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -80,6 +80,8 @@ abstract class MergingSnapshotProducer extends SnapshotProducer { //

Re: [PR] Core: Prevent duplicate data files [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10007: URL: https://github.com/apache/iceberg/pull/10007#discussion_r1533635845 ## core/src/test/java/org/apache/iceberg/TestBaseIncrementalAppendScan.java: ## @@ -67,13 +67,13 @@ public void fromSnapshotInclusiveWithTag() { table.manageSnaps

Re: [I] iceberg reports an error after upgrading to 1.4.2 [iceberg]

2024-03-21 Thread via GitHub
nastra commented on issue #9018: URL: https://github.com/apache/iceberg/issues/9018#issuecomment-2011941518 might be related to https://issues.apache.org/jira/browse/SPARK-46847. When switching the Iceberg version, did you also switch the Spark version? Because that Spark issue started to h

Re: [I] [Flink] CTAS data isn't returned in Flink query [iceberg]

2024-03-21 Thread via GitHub
rmoff closed issue #9947: [Flink] CTAS data isn't returned in Flink query URL: https://github.com/apache/iceberg/issues/9947

Re: [I] [Flink] CTAS data isn't returned in Flink query [iceberg]

2024-03-21 Thread via GitHub
rmoff commented on issue #9947: URL: https://github.com/apache/iceberg/issues/9947#issuecomment-2011958487 Thanks @pvary, this was 💯 the cause :)

Re: [PR] Core: Prevent duplicate data files [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10007: URL: https://github.com/apache/iceberg/pull/10007#discussion_r1533675374 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -83,9 +85,13 @@ protected Map summary() { @Override public FastAppend appendFile(DataFile file)
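The PR changes `FastAppend.appendFile(DataFile file)` so that the same data file is not added twice. The data structure the PR actually uses is not visible in the excerpt. The sketch below is a minimal, hypothetical stand-in that identifies a file by its path string only, which is an assumption: real `DataFile` objects carry more fields. An insertion-ordered set makes a repeated `appendFile` call a no-op.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; keyed on file path only, unlike real DataFile objects.
final class DedupingAppend {
  // Insertion-ordered so files are committed in the order they were first appended.
  private final Set<String> newFilePaths = new LinkedHashSet<>();

  DedupingAppend appendFile(String path) {
    newFilePaths.add(path); // adding a path that is already present is a no-op
    return this;
  }

  List<String> newFiles() {
    return List.copyOf(newFilePaths);
  }
}
```

Deduplicating at append time keeps the resulting manifest from listing the same file twice, which would otherwise double-count its rows in scans.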

Re: [PR] feat: implement prune column for schema [iceberg-rust]

2024-03-21 Thread via GitHub
Dysprosium0626 commented on code in PR #261: URL: https://github.com/apache/iceberg-rust/pull/261#discussion_r1533685816 ## crates/iceberg/src/spec/schema.rs: ## @@ -1338,4 +1533,430 @@ table { ); } } +#[test] +fn test_schema_prune_columns_stri

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533702231 ## core/src/main/java/org/apache/iceberg/CatalogUtil.java: ## @@ -136,6 +138,18 @@ public static void dropTableData(FileIO io, TableMetadata metadata) { deleteFi

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533712833 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -309,65 +304,39 @@ protected enum CommitStatus { * @return Commit Status of Succe

Re: [PR] feat: implement prune column for schema [iceberg-rust]

2024-03-21 Thread via GitHub
Dysprosium0626 commented on code in PR #261: URL: https://github.com/apache/iceberg-rust/pull/261#discussion_r1533718813 ## crates/iceberg/src/spec/schema.rs: ## @@ -642,6 +644,199 @@ impl SchemaVisitor for IndexByName { } } +struct PruneColumn { +selected: HashSet,
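The PR adds a `PruneColumn` schema visitor holding a `selected` set of field ids. The sketch below, written in Java for consistency with the rest of this digest, shows the general shape of column pruning over a nested schema. The rules are assumptions, and the Rust PR's exact semantics may differ:

- A selected field keeps its whole subtree.
- An unselected struct is kept only as a container for selected descendants.

`PruneSketch` and its `Field` record are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical Java illustration; not the iceberg-rust PruneColumn visitor.
final class PruneSketch {
  record Field(int id, String name, List<Field> children) {
    static Field leaf(int id, String name) {
      return new Field(id, name, List.of());
    }
  }

  static Optional<Field> prune(Field field, Set<Integer> selected) {
    if (selected.contains(field.id())) {
      return Optional.of(field); // assumption: selecting a field keeps its whole subtree
    }
    List<Field> kept = new ArrayList<>();
    for (Field child : field.children()) {
      prune(child, selected).ifPresent(kept::add);
    }
    // Keep an unselected parent only if some descendant survived.
    return kept.isEmpty()
        ? Optional.empty()
        : Optional.of(new Field(field.id(), field.name(), kept));
  }
}
```

For a schema `table{id, data, point{x, y}}`, selecting ids `{id, x}` yields `table{id, point{x}}`.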

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533719987 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -222,9 +211,29 @@ public boolean dropTable(TableIdentifier identifier, boolean purge)

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533723838 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -309,65 +304,39 @@ protected enum CommitStatus { * @return Commit Status of Succe

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533738195 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -240,9 +239,9 @@ protected void doCommit(TableMetadata base, TableMetadata met

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533741453 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -304,30 +305,16 @@ protected void doCommit(TableMetadata base, TableMetadata m

Re: [I] Convert a StringLiteral into a DecimalLiteral [iceberg-rust]

2024-03-21 Thread via GitHub
Fokko closed issue #288: Convert a StringLiteral into a DecimalLiteral URL: https://github.com/apache/iceberg-rust/issues/288

Re: [I] Convert a StringLiteral into a DecimalLiteral [iceberg-rust]

2024-03-21 Thread via GitHub
Fokko commented on issue #288: URL: https://github.com/apache/iceberg-rust/issues/288#issuecomment-2012074380 Wrong repo, sorry!

Re: [I] Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions [iceberg]

2024-03-21 Thread via GitHub
greg-roberts-bbc commented on issue #7226: URL: https://github.com/apache/iceberg/issues/7226#issuecomment-2012098543 We've found a workaround in our use case. (Iceberg 1.4.3, Spark 3.3.0 on Glue 4.0). Our previous flow was: ``` # set up readStream read_stream = spark.rea

Re: [I] Convert a StringLiteral into a DecimalLiteral [iceberg-python]

2024-03-21 Thread via GitHub
Dysprosium0626 commented on issue #538: URL: https://github.com/apache/iceberg-python/issues/538#issuecomment-2012144995 Hi @Fokko, I'd like to give it a try, but I don't know where to put this code. It seems that we already have https://github.com/apache/iceberg-python/blob/bbc7e7c8d095b4afea
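For anyone picking up the StringLiteral → DecimalLiteral conversion, here is a minimal Java sketch of the logic. It assumes the conversion should succeed only when the parsed scale equals the target type's scale and the value fits the precision. The exact-scale rule is my understanding of Iceberg's Java `Literals`, and the precision check is an added assumption; both should be verified against the Java implementation before porting to pyiceberg. `StringToDecimal` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical sketch of converting a string literal to decimal(precision, scale).
final class StringToDecimal {
  static Optional<BigDecimal> convert(String value, int precision, int scale) {
    BigDecimal decimal;
    try {
      decimal = new BigDecimal(value);
    } catch (NumberFormatException e) {
      return Optional.empty(); // not a number at all
    }
    // Assumption: no implicit rescaling, and the value must fit the precision.
    if (decimal.scale() != scale || decimal.precision() > precision) {
      return Optional.empty();
    }
    return Optional.of(decimal);
  }
}
```

Refusing to rescale (e.g. `"12.3"` for `decimal(9, 2)`) avoids silently changing the literal a user wrote in a filter.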

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533818010 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -309,65 +304,39 @@ protected enum CommitStatus { * @return Commit Status of Succe

Re: [PR] Hive: Arrange common part of the code for Iceberg View. [iceberg]

2024-03-21 Thread via GitHub
nk1506 commented on code in PR #10001: URL: https://github.com/apache/iceberg/pull/10001#discussion_r1533819440 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -282,7 +281,12 @@ protected void doCommit(TableMetadata base, TableMetadata me

Re: [I] Integrate with datafusion [iceberg-rust]

2024-03-21 Thread via GitHub
marvinlanhenke commented on issue #242: URL: https://github.com/apache/iceberg-rust/issues/242#issuecomment-2012273359 @ZENOTME I'm interested in your approach, perhaps you can outline what you are going to do (high-level). I'm just curious and want to understand / research where those

[PR] Add Strict projection [iceberg-python]

2024-03-21 Thread via GitHub
Fokko opened a new pull request, #539: URL: https://github.com/apache/iceberg-python/pull/539 (no comment)

Re: [PR] feat: implement prune column for schema [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 commented on code in PR #261: URL: https://github.com/apache/iceberg-rust/pull/261#discussion_r1533942476 ## crates/iceberg/src/spec/schema.rs: ## @@ -1338,4 +1533,430 @@ table { ); } } +#[test] +fn test_schema_prune_columns_strin

Re: [PR] feat: implement prune column for schema [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 commented on code in PR #261: URL: https://github.com/apache/iceberg-rust/pull/261#discussion_r1533943697 ## crates/iceberg/src/spec/schema.rs: ## @@ -642,6 +644,199 @@ impl SchemaVisitor for IndexByName { } } +struct PruneColumn { +selected: HashSet, +

Re: [PR] Spark 3.2: Add RewritePositionDeleteFilesSparkAction [iceberg]

2024-03-21 Thread via GitHub
puchengy commented on PR #10009: URL: https://github.com/apache/iceberg/pull/10009#issuecomment-2012358990 @nastra we still maintain Spark 3.2 internally, so we want to port this internally. Having it available upstream first gets more pairs of eyes on the change to make sure it is right, and

Re: [I] iceberg reports an error after upgrading to 1.4.2 [iceberg]

2024-03-21 Thread via GitHub
zachdisc commented on issue #9018: URL: https://github.com/apache/iceberg/issues/9018#issuecomment-2012391871 That appears to be it. I didn't switch spark versions knowingly - I observed this when upgrading from EMR 6.14 (Spark 3.4.1, Iceberg 1.3.1-amzn-0) to EMR 6.15+ (Spark 3.4.1, Iceberg

Re: [PR] Spark 3.2: Add RewritePositionDeleteFilesSparkAction [iceberg]

2024-03-21 Thread via GitHub
nastra commented on PR #10009: URL: https://github.com/apache/iceberg/pull/10009#issuecomment-2012416324 If there won't be a patch release for that particular version, then I don't think it makes sense to port this to Iceberg's 1.3.x branch

Re: [PR] feat: add builder to TableMetadata interface [iceberg-rust]

2024-03-21 Thread via GitHub
liurenjie1024 commented on PR #62: URL: https://github.com/apache/iceberg-rust/pull/62#issuecomment-2012426148 cc @y0psolo Should we close this now? I think it's resolved by #262

Re: [PR] feat: init iceberg writer [iceberg-rust]

2024-03-21 Thread via GitHub
ZENOTME commented on code in PR #275: URL: https://github.com/apache/iceberg-rust/pull/275#discussion_r1534012662 ## crates/iceberg/src/writer/base_writer/data_file_writer.rs: ## @@ -0,0 +1,310 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [I] Integrate with datafusion [iceberg-rust]

2024-03-21 Thread via GitHub
ZENOTME commented on issue #242: URL: https://github.com/apache/iceberg-rust/issues/242#issuecomment-2012508866 Thanks for raising this discussion @marvinlanhenke! The basic idea for the integration is to provide wrapper structs around types in iceberg-rust so that users can use them to conne

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-03-21 Thread via GitHub
ajantha-bhat commented on PR #9437: URL: https://github.com/apache/iceberg/pull/9437#issuecomment-2012543899 ping @aokolnychyi

Re: [PR] Core: Prevent duplicate data/delete files [iceberg]

2024-03-21 Thread via GitHub
danielcweeks commented on code in PR #10007: URL: https://github.com/apache/iceberg/pull/10007#discussion_r1534110571 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -43,6 +44,7 @@ class FastAppend extends SnapshotProducer implements AppendFiles { private fin

[PR] Hive: Use base table metadata to create HiveLock [iceberg]

2024-03-21 Thread via GitHub
lirui-apache opened a new pull request, #10016: URL: https://github.com/apache/iceberg/pull/10016 Use base (instead of new) table metadata to create the lock object, so that concurrent commits use the same lock mechanism. Fixes #10006
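The reasoning in this PR is that two concurrent commits start from the same base metadata but may carry different new metadata. If the lock type is derived from the new metadata, the two commits can pick different lock mechanisms and fail to exclude each other. The hypothetical sketch below derives the choice from base properties only. The key mirrors Iceberg's `engine.hive.lock-enabled` table property, but the default value and resolution logic are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical sketch of the idea in PR #10016, not its code: choose the lock
// from the base metadata both writers share, never from a writer's new metadata.
final class LockChooser {
  enum LockKind { METASTORE_LOCK, NO_LOCK }

  // Assumption: key mirrors a real table property; the default here is illustrative.
  static final String LOCK_ENABLED = "engine.hive.lock-enabled";

  static LockKind choose(Map<String, String> baseProperties) {
    String value = baseProperties.getOrDefault(LOCK_ENABLED, "true");
    return Boolean.parseBoolean(value) ? LockKind.METASTORE_LOCK : LockKind.NO_LOCK;
  }
}
```

With this rule, a commit whose new metadata turns locking off still takes the metastore lock for that commit, because its base had locking on. It therefore serializes correctly against a concurrent writer that started from the same base.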

Re: [I] How to insert overwrite with a single commit [iceberg]

2024-03-21 Thread via GitHub
difin commented on issue #9720: URL: https://github.com/apache/iceberg/issues/9720#issuecomment-2012606418 CC: @gaborkaszab

Re: [I] Convert row filter to arrow filter [iceberg-rust]

2024-03-21 Thread via GitHub
a-agmon commented on issue #265: URL: https://github.com/apache/iceberg-rust/issues/265#issuecomment-2012751222 Hi @viirya Perhaps a bit off-topic but wondering what you think. I have been testing this a bit, and while I have always seen performance improvements in using `ParquetReco

Re: [PR] Spark 3.2: Add RewritePositionDeleteFilesSparkAction [iceberg]

2024-03-21 Thread via GitHub
nastra closed pull request #10009: Spark 3.2: Add RewritePositionDeleteFilesSparkAction URL: https://github.com/apache/iceberg/pull/10009

Re: [PR] Spark 3.2: Support arbitrary scans in SparkBatchQueryScan [iceberg]

2024-03-21 Thread via GitHub
nastra closed pull request #10011: Spark 3.2: Support arbitrary scans in SparkBatchQueryScan URL: https://github.com/apache/iceberg/pull/10011

[I] Implement transforms projection [iceberg-rust]

2024-03-21 Thread via GitHub
Fokko opened a new issue, #289: URL: https://github.com/apache/iceberg-rust/issues/289 To evaluate filters on hidden partitions, we need column projections. For example, this will translate `dt >= 2024-02-01 and dt < 2024-03-01` to the partition filter `month(dt) = 2024-02`.
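The month example can be checked by hand. Iceberg's `month` transform maps a date to the number of months since 1970-01. An inclusive projection turns `dt >= lower` into `month(dt) >= month(lower)`, and turns `dt < upper` into `month(dt) <= month(upper - 1 day)`. For the range in the issue, both bounds come out to 649 (2024-02), so the two predicates collapse to `month(dt) = 2024-02`. The Java sketch below covers only this date/month case. `MonthProjection` is a hypothetical name, not the iceberg-rust API.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of inclusive projection onto the month transform (dates only).
final class MonthProjection {
  private static final YearMonth EPOCH = YearMonth.of(1970, 1);

  // month transform: whole months between 1970-01 and the date's month
  static int month(LocalDate date) {
    return (int) ChronoUnit.MONTHS.between(EPOCH, YearMonth.from(date));
  }

  // dt >= lower  =>  month(dt) >= month(lower)
  static int projectLowerInclusive(LocalDate lower) {
    return month(lower);
  }

  // dt < upper  =>  dt <= upper - 1 day  =>  month(dt) <= month(upper - 1 day)
  static int projectUpperExclusive(LocalDate upper) {
    return month(upper.minusDays(1));
  }
}
```

Subtracting a day before projecting the exclusive upper bound matters. Projecting `2024-03-01` directly would yield 650 and widen the partition filter to include March.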

Re: [PR] OpenAPI: Express server capabilities via /config endpoint [iceberg]

2024-03-21 Thread via GitHub
snazy commented on code in PR #9940: URL: https://github.com/apache/iceberg/pull/9940#discussion_r1534219425 ## open-api/rest-catalog-open-api.yaml: ## @@ -1559,6 +1578,22 @@ components: type: string description: Properties that should be use

Re: [PR] feat: Implement the conversion from Arrow Schema to Iceberg Schema [iceberg-rust]

2024-03-21 Thread via GitHub
viirya commented on PR #258: URL: https://github.com/apache/iceberg-rust/pull/258#issuecomment-2012837449 Thanks @liurenjie1024 @ZENOTME @waynexia @Fokko

Re: [PR] Migrate Scan, Schema and remaining Partition files in Core to JUnit5 [iceberg]

2024-03-21 Thread via GitHub
tomtongue commented on PR #10014: URL: https://github.com/apache/iceberg/pull/10014#issuecomment-2012876402 @nastra Could you review this PR when you have time? (One more PR will be needed to complete the migration of core files in `org/apache/iceberg` to JUnit5.)

Re: [PR] Migrate Scan, Schema and remaining Partition files in Core to JUnit5 [iceberg]

2024-03-21 Thread via GitHub
nastra commented on code in PR #10014: URL: https://github.com/apache/iceberg/pull/10014#discussion_r1534263933 ## core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java: ## @@ -1733,22 +1706,19 @@ public void testRemoveIdentifierFields() { .setIdentifierFields

Re: [PR] Migrate Scan, Schema and remaining Partition files in Core to JUnit5 [iceberg]

2024-03-21 Thread via GitHub
tomtongue commented on code in PR #10014: URL: https://github.com/apache/iceberg/pull/10014#discussion_r1534269319 ## core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java: ## @@ -1733,22 +1706,19 @@ public void testRemoveIdentifierFields() { .setIdentifierFie

Re: [PR] Kafka Connect: Record converters [iceberg]

2024-03-21 Thread via GitHub
bryanck commented on PR #9641: URL: https://github.com/apache/iceberg/pull/9641#issuecomment-2012955949 I was planning on merging this, unless someone wants to give more feedback, cc @fqaiser94 @danielcweeks

Re: [PR] Migrate Scan, Schema and remaining Partition files in Core to JUnit5 [iceberg]

2024-03-21 Thread via GitHub
nastra merged PR #10014: URL: https://github.com/apache/iceberg/pull/10014

Re: [PR] Migrate Scan, Schema and remaining Partition files in Core to JUnit5 [iceberg]

2024-03-21 Thread via GitHub
tomtongue commented on code in PR #10014: URL: https://github.com/apache/iceberg/pull/10014#discussion_r1534284283 ## core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java: ## @@ -1733,22 +1706,19 @@ public void testRemoveIdentifierFields() { .setIdentifierFie

Re: [PR] Add local nightly build to test current docs changes [iceberg]

2024-03-21 Thread via GitHub
rdblue commented on code in PR #9943: URL: https://github.com/apache/iceberg/pull/9943#discussion_r1534304249 ## site/nav.yml: ## @@ -21,6 +21,7 @@ nav: - Spark: spark-quickstart.md - Hive: hive-quickstart.md - Docs: +- nightly: '!include docs/docs/nightly/mkdo

Re: [I] `system.add_files` utility does not support updated Partition Spec [iceberg]

2024-03-21 Thread via GitHub
sfc-gh-asudhakar commented on issue #10008: URL: https://github.com/apache/iceberg/issues/10008#issuecomment-2013045412 > @sfc-gh-asudhakar I believe you need to update the schema of the Iceberg table yourself. The [docs](https://iceberg.apache.org/docs/latest/spark-procedures/#add_files) o

Re: [I] `system.add_files` utility does not support updated Partition Spec [iceberg]

2024-03-21 Thread via GitHub
nastra commented on issue #10008: URL: https://github.com/apache/iceberg/issues/10008#issuecomment-2013093976 Sorry I must have missed step 2 when reading the description. I'll take a closer look and will update the issue once I know more.

Re: [PR] Add Snapshots table metadata [iceberg-python]

2024-03-21 Thread via GitHub
Gowthami03B commented on PR #524: URL: https://github.com/apache/iceberg-python/pull/524#issuecomment-2013119342 @Fokko Can we merge this? I am almost done with "Files" table, so I can rebase my code before creating a PR.

Re: [I] Integrate with datafusion [iceberg-rust]

2024-03-21 Thread via GitHub
marvinlanhenke commented on issue #242: URL: https://github.com/apache/iceberg-rust/issues/242#issuecomment-2013174861 > The datafusion provides the following trait to manage the table: > > * CatalogProviderList > * CatalogProvider > * SchemaProvider > * TableProvider T

Re: [I] Implement transforms projection [iceberg-rust]

2024-03-21 Thread via GitHub
marvinlanhenke commented on issue #289: URL: https://github.com/apache/iceberg-rust/issues/289#issuecomment-2013198395 #264 as ref for implementing `project()`

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
abmo-x commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1534516571 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopStreams.java: ## @@ -185,8 +185,21 @@ public void flush() throws IOException { @Override public void cl

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
abmo-x commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1534517549 ## core/src/test/java/org/apache/iceberg/hadoop/HadoopStreamsTest.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
abmo-x commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1534518373 ## core/src/test/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
abmo-x commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1534520860 ## core/src/test/java/org/apache/iceberg/hadoop/HadoopStreamsTest.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[I] Cannot create table if location is s3 with "secure" Minio server [iceberg-python]

2024-03-21 Thread via GitHub
thinkORo opened a new issue, #540: URL: https://github.com/apache/iceberg-python/issues/540 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 I've created a .pyiceberg.yaml file with the following content: ``` catalog: default:

Re: [PR] [core] fix #9997 - Handle s3a file upload interrupt which results in table metadata pointing to files that doesn't exist [iceberg]

2024-03-21 Thread via GitHub
abmo-x commented on code in PR #9998: URL: https://github.com/apache/iceberg/pull/9998#discussion_r1534534023 ## core/src/test/java/org/apache/iceberg/hadoop/HadoopStreamsTest.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [I] Delete/Update fails for tables with more than 1000 columns [iceberg]

2024-03-21 Thread via GitHub
xiaoxuandev commented on issue #6368: URL: https://github.com/apache/iceberg/issues/6368#issuecomment-2013502300 Getting a similar error for UPDATE in the Iceberg 1.4.3 release, stack trace below: ``` java.lang.AssertionError: Expecting code not to raise a throwable but caught "ja

Re: [I] select distinct on table scan [iceberg-python]

2024-03-21 Thread via GitHub
Fokko closed issue #403: select distinct on table scan URL: https://github.com/apache/iceberg-python/issues/403

Re: [I] Convert row filter to arrow filter [iceberg-rust]

2024-03-21 Thread via GitHub
viirya commented on issue #265: URL: https://github.com/apache/iceberg-rust/issues/265#issuecomment-2013554756 Hmm, I wonder if the filtering takes too much time on so-called common values? Is the predicate filter very complicated? Normally I think filtering on scan can boost performan

[PR] Modify `Bind` calls so that they don't consume `self` and instead return a new struct, leaving the original unmoved [iceberg-rust]

2024-03-21 Thread via GitHub
sdd opened a new pull request, #290: URL: https://github.com/apache/iceberg-rust/pull/290 This is a prerequisite to https://github.com/apache/iceberg-rust/pull/241 and was a part of that PR but has been pulled into its own PR after discussions with @liurenjie1024. The existing Pred

Re: [PR] Add Snapshots table metadata [iceberg-python]

2024-03-21 Thread via GitHub
Fokko commented on PR #524: URL: https://github.com/apache/iceberg-python/pull/524#issuecomment-2013620848 > Just have one question: I was thinking if later we need those metadata table classes, StaticTableScan, and StaticDataTask like what Java did. These may become useful when other engin
