Re: [PR] create_table with PyArrow Schema [iceberg-python]

2024-01-26 Thread via GitHub
HonahX commented on code in PR #305: URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1467303335 ## pyiceberg/io/pyarrow.py: ## @@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None: self._field_names.pop() +class _ConvertToIce

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-01-26 Thread via GitHub
tustvold commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1911697303 Thank you both for the responses. > In iceberg's design, all file ios are hidden under the FileIO interface, and the backends, i.e. OpenDAL or object_store are not directly

Re: [PR] Flink: Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]

2024-01-26 Thread via GitHub
pvary commented on code in PR #9547: URL: https://github.com/apache/iceberg/pull/9547#discussion_r1467392787 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/StreamingMonitorFunction.java: ## @@ -195,7 +192,10 @@ void monitorAndForwardSplits() { // Refresh

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467405697 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -886,6 +886,202 @@ private String viewName(String viewName) {

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467293803 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -886,6 +886,202 @@ private String viewName(String viewName) {

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467415508 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]

2024-01-26 Thread via GitHub
jbonofre commented on PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#issuecomment-1911787495 I agree, +1 to merge this PR 😄 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-01-26 Thread via GitHub
alamb commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1911845190 Thank you all -- this is a great conversation. > I entirely agree, I guess I was more suggesting that the IO abstraction mirror object_store as this is what both the upstream

Re: [I] Slowness when loading table from S3 [iceberg-python]

2024-01-26 Thread via GitHub
anechii commented on issue #220: URL: https://github.com/apache/iceberg-python/issues/220#issuecomment-1911858579 @itaise currently facing the same issue, but on a larger scale (bigger table, bigger metadata). Did you find a way to speed this up? Thanks.

Re: [I] Reading large data through Glue Catalog is SLOW [iceberg]

2024-01-26 Thread via GitHub
anechii commented on issue #9559: URL: https://github.com/apache/iceberg/issues/9559#issuecomment-1911871427 Linking associated issue: https://github.com/apache/iceberg-python/issues/220

Re: [I] Reading large data through Glue Catalog is SLOW [iceberg]

2024-01-26 Thread via GitHub
anechii closed issue #9559: Reading large data through Glue Catalog is SLOW URL: https://github.com/apache/iceberg/issues/9559

Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]

2024-01-26 Thread via GitHub
Fokko commented on PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#issuecomment-1911896187 Fully Agree! Thanks @Xuanwo for working on this! And @jbonofre and @liurenjie1024 for the reviews 🙏

Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]

2024-01-26 Thread via GitHub
Fokko merged PR #147: URL: https://github.com/apache/iceberg-rust/pull/147

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467525182 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala: ## @@ -40,6 +46,19 @@ case class RewriteViewCommands(spar

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-01-26 Thread via GitHub
liurenjie1024 commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1911945629 Thanks everyone for this very nice discussion. > I'd be happy to help out with this, if you're open to contributions, both myself and my employer are very interested i

Re: [I] about report [iceberg]

2024-01-26 Thread via GitHub
nastra commented on issue #9560: URL: https://github.com/apache/iceberg/issues/9560#issuecomment-1911962143 @lpy148145 are you referring to https://iceberg.apache.org/docs/latest/metrics-reporting/#commitreport?

Re: [I] about query snapshotId [iceberg]

2024-01-26 Thread via GitHub
nastra commented on issue #9558: URL: https://github.com/apache/iceberg/issues/9558#issuecomment-1911963307 @lpy148145 can you please be a little bit more specific in what you'd like to achieve exactly?

Re: [I] Apache Iceberg - Branch cannot be merged using the fast_forward procedure [iceberg]

2024-01-26 Thread via GitHub
nastra commented on issue #9553: URL: https://github.com/apache/iceberg/issues/9553#issuecomment-1911967675 I'm surprised that the procedure can't be found, given that you have defined `org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions`. Do any of the other procedures work i

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-01-26 Thread via GitHub
tustvold commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1911993011 > I think if users are judicious and provide sufficient hints, and buffer the reads, the performance difference will be negligible. If primarily performing sequential IO I

Re: [PR] fix: Ignore negative statistics value [iceberg-rust]

2024-01-26 Thread via GitHub
Fokko merged PR #173: URL: https://github.com/apache/iceberg-rust/pull/173

Re: [PR] fix: Ignore negative statistics value [iceberg-rust]

2024-01-26 Thread via GitHub
Fokko commented on PR #173: URL: https://github.com/apache/iceberg-rust/pull/173#issuecomment-1912042306 Thanks for the fix @liurenjie1024, thanks for the review @Xuanwo, and thanks for reporting @Samrose-Ahmed

Re: [I] two bugs [iceberg-rust]

2024-01-26 Thread via GitHub
Fokko closed issue #165: two bugs URL: https://github.com/apache/iceberg-rust/issues/165

Re: [PR] Add UnionByName functionality [iceberg-python]

2024-01-26 Thread via GitHub
Fokko commented on code in PR #296: URL: https://github.com/apache/iceberg-python/pull/296#discussion_r1467632051 ## pyiceberg/table/__init__.py: ## @@ -1995,6 +2020,159 @@ def primitive(self, primitive: PrimitiveType) -> Optional[IcebergType]: return primitive +cl

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467647161 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -886,6 +886,202 @@ private String viewName(String viewName) {

Re: [I] Add unionByName visitor to update schema [iceberg-python]

2024-01-26 Thread via GitHub
Fokko closed issue #284: Add unionByName visitor to update schema URL: https://github.com/apache/iceberg-python/issues/284

Re: [PR] Add UnionByName functionality [iceberg-python]

2024-01-26 Thread via GitHub
Fokko merged PR #296: URL: https://github.com/apache/iceberg-python/pull/296

Re: [PR] Add UnionByName functionality [iceberg-python]

2024-01-26 Thread via GitHub
Fokko commented on PR #296: URL: https://github.com/apache/iceberg-python/pull/296#issuecomment-1912081292 Thanks for the review @HonahX

Re: [PR] Arrow: Don't copy the list/map when not needed [iceberg-python]

2024-01-26 Thread via GitHub
Fokko commented on PR #252: URL: https://github.com/apache/iceberg-python/pull/252#issuecomment-1912095834 @HonahX I agree, thanks for merging. This PR is already an improvement over the previous situation, so I think it is good to have it in.

Re: [I] Hive Catalog: Support upgrading table version [iceberg-python]

2024-01-26 Thread via GitHub
Fokko closed issue #274: Hive Catalog: Support upgrading table version URL: https://github.com/apache/iceberg-python/issues/274

Re: [I] Hive Catalog: Support upgrading table version [iceberg-python]

2024-01-26 Thread via GitHub
Fokko commented on issue #274: URL: https://github.com/apache/iceberg-python/issues/274#issuecomment-1912102523 This has been merged in #294

Re: [I] Allow Table version upgrades through Hive [iceberg-python]

2024-01-26 Thread via GitHub
Fokko commented on issue #205: URL: https://github.com/apache/iceberg-python/issues/205#issuecomment-1912105149 Fixed in #274

Re: [I] Allow Table version upgrades through Hive [iceberg-python]

2024-01-26 Thread via GitHub
Fokko closed issue #205: Allow Table version upgrades through Hive URL: https://github.com/apache/iceberg-python/issues/205

Re: [I] When will the 0.6.0 version be released? [iceberg-python]

2024-01-26 Thread via GitHub
Fokko commented on issue #192: URL: https://github.com/apache/iceberg-python/issues/192#issuecomment-1912106219 I've sent out a `[DISCUSS]` thread: https://lists.apache.org/thread/6hsqmdlv6q3f56syopfjfoprf9por6rx

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-01-26 Thread via GitHub
Fokko commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1912130317 Thanks @tustvold for raising this and please don't hesitate to open an issue or PR. > For example Spark has had a very hard time getting a performant S3 integration, with pr

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-01-26 Thread via GitHub
tustvold commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-1912160916 > It looks to me that object_store and FileIO aim to solve the same problem That's awesome, thank you for the link. That is exactly what object_store is, an opinionated abs

Re: [PR] create_table with PyArrow Schema [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on code in PR #305: URL: https://github.com/apache/iceberg-python/pull/305#discussion_r1467757335 ## pyiceberg/io/pyarrow.py: ## @@ -906,6 +986,76 @@ def after_map_value(self, element: pa.Field) -> None: self._field_names.pop() +class _ConvertToIce

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467807645 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -886,6 +886,202 @@ private String viewName(String viewName) {

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467808305 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1467810466 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala: ## @@ -107,6 +108,21 @@ case class Extend

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
rdblue merged PR #9423: URL: https://github.com/apache/iceberg/pull/9423

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-26 Thread via GitHub
rdblue commented on PR #9423: URL: https://github.com/apache/iceberg/pull/9423#issuecomment-1912276296 Merged! Thanks for all the work on this, @nastra! It's great to have this done.

Re: [PR] Spark: Support altering views [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9510: URL: https://github.com/apache/iceberg/pull/9510#discussion_r1467911499 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala: ## @@ -36,6 +38,9 @@ object CheckViews extends (LogicalPlan => Uni

Re: [PR] Spark: Support altering views [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9510: URL: https://github.com/apache/iceberg/pull/9510#discussion_r1467911970 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -87,15 +86,6 @@ case class ResolveViews(spark: SparkSessio

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912416306 @Fokko Thank you! These 2 points of supporting hidden partitioning and extracting metrics efficiently during writing are very insightful! For using pyarrow.dataset.write_da

Re: [PR] Spark: Support altering views [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9510: URL: https://github.com/apache/iceberg/pull/9510#discussion_r1467925434 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1149,10 +1148,343 @@ public void createViewWithSubqueryExpressio

Re: [PR] Spark: Support altering views [iceberg]

2024-01-26 Thread via GitHub
nastra commented on code in PR #9510: URL: https://github.com/apache/iceberg/pull/9510#discussion_r1467933159 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1149,10 +1148,343 @@ public void createViewWithSubqueryExpressio

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912495464 Right, as @jqin61 mentioned, if we only had to support **Transformed Partitions**, we could have employed some hack to add partition column to the dataset, which gets consumed by

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468013700 ## catalog/catalog.go: ## @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [I] iceberg v2 table cannot expire delete files after rewrite datafile action [iceberg]

2024-01-26 Thread via GitHub
pnain commented on issue #5058: URL: https://github.com/apache/iceberg/issues/5058#issuecomment-1912518683 I can still see this issue is not resolved 100%. While querying manifest$files, I see fewer files than the number of files on S3.

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468016854 ## catalog/catalog.go: ## @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

[PR] Spark: Cleanup the code branch for merge distribution mode conf which is no longer needed [iceberg]

2024-01-26 Thread via GitHub
allod opened a new pull request, #9561: URL: https://github.com/apache/iceberg/pull/9561 Fix for `SparkWriteConf` where Iceberg table has `write.distribution-mode=range` set in the table properties but MERGE operation incorrectly resolves distribution mode as `HASH` for partitioned tables i

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468038569 ## catalog/catalog.go: ## @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [PR] Spark: Cleanup the code branch for merge distribution mode conf which is no longer needed [iceberg]

2024-01-26 Thread via GitHub
RussellSpitzer commented on code in PR #9561: URL: https://github.com/apache/iceberg/pull/9561#discussion_r1468042548 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkDistributionAndOrderingUtil.java: ## @@ -1708,14 +1708,12 @@ public void testDefaultCopyOnWr

[I] gc.enabled property is set to false by default for Apache Iceberg table created in Nessie Catalog [iceberg]

2024-01-26 Thread via GitHub
Ashwin07 opened a new issue, #9562: URL: https://github.com/apache/iceberg/issues/9562 I have created the below Apache Iceberg table in Nessie. Nessie version: 0.58.1 Spark version: 3.3 Apache Iceberg version: 1.3.0 Table Name: gold_layer.CORP_DRM_PKG.PHYS_GEO_HIER Loca

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912560771 Maybe another approach we could take if we want to use existing PyArrow functions is: 1. table.sort_by (all partitions) 2. figure out the row index for each permutation of p

Re: [I] Apache Iceberg - Branch cannot be merged using the fast_forward procedure [iceberg]

2024-01-26 Thread via GitHub
Ashwin07 commented on issue #9553: URL: https://github.com/apache/iceberg/issues/9553#issuecomment-1912566095 I have tried Expire snapshot, but it seems to throw a different error; at least I did not get a blanket statement like the procedure cannot be found. [https://github.com/apache/icebe

Re: [PR] Spark 3.4: Cleanup the code branch for merge distribution mode conf which is no longer needed [iceberg]

2024-01-26 Thread via GitHub
allod commented on code in PR #9561: URL: https://github.com/apache/iceberg/pull/9561#discussion_r1468060129 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkDistributionAndOrderingUtil.java: ## @@ -1708,14 +1708,12 @@ public void testDefaultCopyOnWriteMergeP

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912636932 @jqin61 I have also seen this behavior with pyarrow.dataset.write_dataset(): it removes the partition columns from the written-out Parquet files. @syun64 above approac

Re: [I] Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2024-01-26 Thread via GitHub
Fokko closed issue #8902: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger URL: https://github.com/apache/iceberg/issues/8902

Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2024-01-26 Thread via GitHub
Fokko merged PR #8980: URL: https://github.com/apache/iceberg/pull/8980

Re: [PR] Partition Evolution [iceberg-python]

2024-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1465805383 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2317,244 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Spark 3.4: Cleanup the code branch for merge distribution mode conf which is no longer needed [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9561: URL: https://github.com/apache/iceberg/pull/9561#discussion_r1468127155 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java: ## @@ -354,10 +354,6 @@ private DistributionMode copyOnWriteMergeDistributionMode()

Re: [I] Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions [iceberg]

2024-01-26 Thread via GitHub
kaijiezhang0319 commented on issue #7226: URL: https://github.com/apache/iceberg/issues/7226#issuecomment-1912673470 Hello @aokolnychyi. We are hitting a similar issue with Spark 3.3.1 and Iceberg 1.1.0. We previously used Spark 3.2.1 with Iceberg 1.1.0 and it worked fine. And we find the issue w

Re: [PR] Spark 3.4: Cleanup the code branch for merge distribution mode conf which is no longer needed [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #9561: URL: https://github.com/apache/iceberg/pull/9561#issuecomment-1912673526 I am not sure I agree with this change. @allod @RussellSpitzer, could you provide a bit more context?

Re: [PR] Partition Evolution [iceberg-python]

2024-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1468130973 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2325,240 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Spark 3.4: Cleanup the code branch for merge distribution mode conf which is no longer needed [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9561: URL: https://github.com/apache/iceberg/pull/9561#discussion_r1468131135 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkDistributionAndOrderingUtil.java: ## @@ -1708,14 +1708,12 @@ public void testDefaultCopyOnWrite

Re: [PR] Partition Evolution [iceberg-python]

2024-01-26 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1468131844 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2317,244 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1912692894 This change seems reasonable to me. @advancedxy, could you also post to the dev list that this was merged to get any input from folks who did not review before we release 1.5? I feel

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912738795 @Fokko @syun64 another option I can think of is to use polars to do it; a simple example is below, with hashing, partitioning, and sorting within each partition. Where all the partition is

Re: [I] Support partitioned writes [iceberg-python]

2024-01-26 Thread via GitHub
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1912742015 @jqin61 and I discussed this a great deal offline, and we just wanted to follow up on step (2). If we wanted to use existing PyArrow functions, I think we could use a 2 pass algo

[PR] Build: Bump coverage from 7.4.0 to 7.4.1 [iceberg-python]

2024-01-26 Thread via GitHub
dependabot[bot] opened a new pull request, #307: URL: https://github.com/apache/iceberg-python/pull/307 Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.4.0 to 7.4.1. Changelog Sourced from https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst coverage's chang

[PR] Build: Bump boto3 from 1.34.22 to 1.34.27 [iceberg-python]

2024-01-26 Thread via GitHub
dependabot[bot] opened a new pull request, #308: URL: https://github.com/apache/iceberg-python/pull/308 Bumps [boto3](https://github.com/boto/boto3) from 1.34.22 to 1.34.27. Changelog Sourced from [boto3's changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst).

Re: [PR] Spark 3.5: Support specifying filter in RewriteManifestsProcedure [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9447: URL: https://github.com/apache/iceberg/pull/9447#discussion_r1468199919 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteManifestsProcedure.java: ## @@ -118,4 +126,15 @@ private InternalRow[] toOutputRows(Rew

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1468201922 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(Snapsho

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1468204713 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -131,12 +133,12 @@ public SparkTable(Table icebergTable, boolean refreshEa

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #9455: URL: https://github.com/apache/iceberg/pull/9455#issuecomment-1912779549 I think I got the problem, let me also take another look next week. I'll need fresh eyes.

Re: [PR] Spark 3.5: Fix testDeleteFileThenMetadataDelete failure due to table not refreshed [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9551: URL: https://github.com/apache/iceberg/pull/9551#discussion_r1468207547 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java: ## @@ -166,6 +166,28 @@ public static Object[

Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #9556: URL: https://github.com/apache/iceberg/pull/9556#issuecomment-1912790467 Let me see.

Re: [I] Flaky test: TestSparkExecutorCache > testMergeOnReadDelete() [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on issue #9511: URL: https://github.com/apache/iceberg/issues/9511#issuecomment-1912791701 I'll look into it, probably related to some cache invalidation.

[PR] Build: Bump pypa/cibuildwheel from 2.16.2 to 2.16.3 [iceberg-python]

2024-01-26 Thread via GitHub
dependabot[bot] opened a new pull request, #309: URL: https://github.com/apache/iceberg-python/pull/309 Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.16.2 to 2.16.3. Release notes Sourced from https://github.com/pypa/cibuildwheel/releases pypa/cibuildwhee

Re: [PR] Core: Fix setting updated parquet compression property [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #9503: URL: https://github.com/apache/iceberg/pull/9503#issuecomment-1912799517 Is there a problem in persisting these properties even if the underlying file format is not Parquet?

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468226714 ## catalog/catalog.go: ## @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

[PR] Open-API: Add table updates for statistics [iceberg]

2024-01-26 Thread via GitHub
mrcnc opened a new pull request, #9564: URL: https://github.com/apache/iceberg/pull/9564 Table statistics were added in https://github.com/apache/iceberg/pull/5450 and it would be helpful to have them in the OpenAPI spec for REST catalog implementations that don't use the Java parsers from

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468227475 ## catalog/glue.go: ## @@ -0,0 +1,186 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE fil

Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9556: URL: https://github.com/apache/iceberg/pull/9556#discussion_r1468232951 ## spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala: ## @@ -214,6 +214,8 @@ object RewriteMergeIntoTabl

Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on code in PR #9556: URL: https://github.com/apache/iceberg/pull/9556#discussion_r1468233102 ## spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java: ## @@ -523,6 +523,78 @@ public void testMergeWithOnlyUpdateClauseAn

Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

2024-01-26 Thread via GitHub
aokolnychyi commented on PR #9556: URL: https://github.com/apache/iceberg/pull/9556#issuecomment-1912823558 I think you identified the problem correctly, @amogh-jahagirdar. We have to take into account the read attributes when determining the nullability of `MergeRows` output. This matches

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468239204 ## catalog/glue.go: ## @@ -0,0 +1,186 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE fil

[I] Consolidate FileIO [iceberg-python]

2024-01-26 Thread via GitHub
kevinjqliu opened a new issue, #310: URL: https://github.com/apache/iceberg-python/issues/310 ### Feature Request / Improvement **Can we consolidate and standardize FileIO to the PyArrow implementation?** There are currently two different FileIO implementations, `ARROW_FILE_IO`

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468250828 ## catalog/catalog.go: ## @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [I] Add retry framework to Hadoop table load [iceberg]

2024-01-26 Thread via GitHub
github-actions[bot] commented on issue #758: URL: https://github.com/apache/iceberg/issues/758#issuecomment-1912854704 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [PR] Core: rewrite should drop delete files by data sequence number partition wise [iceberg]

2024-01-26 Thread via GitHub
szehon-ho commented on code in PR #9454: URL: https://github.com/apache/iceberg/pull/9454#discussion_r1468254899 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -289,13 +321,38 @@ private void invalidateFilteredCache() { cleanUncommitted(SnapshotP

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468258774 ## catalog/catalog.go: ## @@ -0,0 +1,65 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-26 Thread via GitHub
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1468259033 ## catalog/glue.go: ## @@ -0,0 +1,186 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE fil

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-26 Thread via GitHub
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1912873339 > This change seems reasonable to me. @advancedxy, could you also post to the dev list that this was merged to get any input from folks who did not review before we release 1.5? I feel

Re: [I] [Proposal] Iceberg Materialized View Spec [iceberg]

2024-01-26 Thread via GitHub
szehon-ho commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1912878826 Hi @JanKaul . Thanks for putting this together. I went through the detailed discussion, and see the general consensus to the "Open Questions" in the design docs are: 1. T
