Re: [PR] [FEAT]register table using iceberg metadata file via pyiceberg [iceberg-python]

2024-05-22 Thread via GitHub
MehulBatra commented on PR #711: URL: https://github.com/apache/iceberg-python/pull/711#issuecomment-2126351033 > @MehulBatra Thanks for updating the test! Thanks @kevinjqliu for reviewing! Thank you @kevinjqliu and @HonahX for the great collaboration! -- This is an automated mess

Re: [PR] Add Files metadata table [iceberg-python]

2024-05-22 Thread via GitHub
geruh commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1611039641 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,106 @@ def update_partitions_map( schema=table_schema, ) +def files(self, snapshot_id

Re: [PR] modify doc(backward compatibility) typo [iceberg-python]

2024-05-22 Thread via GitHub
HonahX merged PR #757: URL: https://github.com/apache/iceberg-python/pull/757 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-05-22 Thread via GitHub
pvary commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1610991960 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/MonitorSource.java: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] #9073 Junit 4 tests switched to JUnit 5 [iceberg]

2024-05-22 Thread via GitHub
igoradulian commented on PR #9793: URL: https://github.com/apache/iceberg/pull/9793#issuecomment-2126211802 @nastra, please review the latest changes.

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-05-22 Thread via GitHub
zinking commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1610904400 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -146,13 +148,33 @@ public RewriteDataFilesSparkAction filter(

Re: [I] remove orphan file question [iceberg]

2024-05-22 Thread via GitHub
manuzhang commented on issue #10363: URL: https://github.com/apache/iceberg/issues/10363#issuecomment-2126086018 All maintenance procedures are one-time operations, so you need to set up periodic jobs with a scheduler such as Airflow.

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-05-22 Thread via GitHub
szehon-ho commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1610874491 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java: ## @@ -85,6 +87,66 @@ public void testRewriteData

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-05-22 Thread via GitHub
szehon-ho commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1610849033 ## api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java: ## @@ -171,6 +171,17 @@ default RewriteDataFiles zOrder(String... columns) { */ RewriteD

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-05-22 Thread via GitHub
szehon-ho commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1610846416 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java: ## @@ -254,6 +258,15 @@ private static void apply(UpdateSchema pendingUpdate, TableChange

Re: [I] remove orphan file question [iceberg]

2024-05-22 Thread via GitHub
JunseoChoJJ commented on issue #10363: URL: https://github.com/apache/iceberg/issues/10363#issuecomment-2126025783 @manuzhang Thanks a lot. Can I ask you one more question?

Re: [PR] Make AzureProperties w/ shared-key creds serializable [iceberg]

2024-05-22 Thread via GitHub
simonykq commented on PR #10045: URL: https://github.com/apache/iceberg/pull/10045#issuecomment-2126015016 Any ETA on when this will be merged?

Re: [PR] Views, Spark: Add support for Materialized Views; Integrate with Spark SQL [iceberg]

2024-05-22 Thread via GitHub
wmoustafa commented on PR #9830: URL: https://github.com/apache/iceberg/pull/9830#issuecomment-2125945879 > @wmoustafa, Read this today, was wondering if there is something we can utilize from CDC (considering iceberg has support for that) perspective ? how expensive the refreshes of a PB s

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-05-22 Thread via GitHub
stevenzwu commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1605607730 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -0,0 +1,780 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Flink: Maintenance - MonitorSource [iceberg]

2024-05-22 Thread via GitHub
stevenzwu commented on code in PR #10308: URL: https://github.com/apache/iceberg/pull/10308#discussion_r1608720543 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/maintenance/operator/SingleThreadedIteratorSource.java: ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apa

[PR] Bump mkdocstrings-python from 1.10.2 to 1.10.3 [iceberg-python]

2024-05-22 Thread via GitHub
dependabot[bot] opened a new pull request, #762: URL: https://github.com/apache/iceberg-python/pull/762 Bumps [mkdocstrings-python](https://github.com/mkdocstrings/python) from 1.10.2 to 1.10.3. Release notes Sourced from https://github.com/mkdocstrings/python/releases mkdocstrin

Re: [PR] Fix aggregate pushdown when optional DataFile stats are null [iceberg]

2024-05-22 Thread via GitHub
szehon-ho commented on code in PR #10273: URL: https://github.com/apache/iceberg/pull/10273#discussion_r1610731511 ## api/src/test/java/org/apache/iceberg/expressions/TestAggregateEvaluator.java: ## @@ -95,6 +95,24 @@ public class TestAggregateEvaluator { FILE, MISSING_SOME

Re: [PR] [FEAT]register table using iceberg metadata file via pyiceberg [iceberg-python]

2024-05-22 Thread via GitHub
HonahX merged PR #711: URL: https://github.com/apache/iceberg-python/pull/711

Re: [PR] Add Files metadata table [iceberg-python]

2024-05-22 Thread via GitHub
Gowthami03B commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1610653224 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,106 @@ def update_partitions_map( schema=table_schema, ) +def files(self, snaps

Re: [PR] Add Files metadata table [iceberg-python]

2024-05-22 Thread via GitHub
Gowthami03B commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1610651660 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,106 @@ def update_partitions_map( schema=table_schema, ) +def files(self, snaps

Re: [PR] Spark: Add SparkSQLProperty to control split-size [iceberg]

2024-05-22 Thread via GitHub
szehon-ho commented on PR #10336: URL: https://github.com/apache/iceberg/pull/10336#issuecomment-2125733798 Yeah, it's really something that would be great to fix in Spark. I hacked together another attempt, https://github.com/apache/spark/pull/46707, based on the last comment in https://github

Re: [PR] REST: convert RESTException to CommitStateUnknownException to avoid incorrect cleanup of metadata files due to network error [iceberg]

2024-05-22 Thread via GitHub
stevenzwu closed pull request #10366: REST: convert RESTException to CommitStateUnknownException to avoid incorrect cleanup of metadata files due to network error URL: https://github.com/apache/iceberg/pull/10366

Re: [PR] REST: convert RESTException to CommitStateUnknownException to avoid incorrect cleanup of metadata files due to network error [iceberg]

2024-05-22 Thread via GitHub
stevenzwu commented on PR #10366: URL: https://github.com/apache/iceberg/pull/10366#issuecomment-2125658429 Figured out why we can't reproduce the issue with the unit test: it is already fixed by @amogh-jahagirdar in PR #8397. Cleanup is protected by an exception type check. ``` if

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.25.50 to 2.25.55 [iceberg]

2024-05-22 Thread via GitHub
dependabot[bot] commented on PR #10355: URL: https://github.com/apache/iceberg/pull/10355#issuecomment-2125600130 Superseded by #10367.

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.25.50 to 2.25.55 [iceberg]

2024-05-22 Thread via GitHub
dependabot[bot] closed pull request #10355: Build: Bump software.amazon.awssdk:bom from 2.25.50 to 2.25.55 URL: https://github.com/apache/iceberg/pull/10355

[PR] Build: Bump software.amazon.awssdk:bom from 2.25.50 to 2.25.57 [iceberg]

2024-05-22 Thread via GitHub
dependabot[bot] opened a new pull request, #10367: URL: https://github.com/apache/iceberg/pull/10367 Bumps software.amazon.awssdk:bom from 2.25.50 to 2.25.57.

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.25.50 to 2.25.55 [iceberg]

2024-05-22 Thread via GitHub
Fokko commented on PR #10355: URL: https://github.com/apache/iceberg/pull/10355#issuecomment-2125598841 @dependabot rebase

Re: [PR] REST: convert RESTException to CommitStateUnknownException to avoid incorrect cleanup of metadata files due to network error [iceberg]

2024-05-22 Thread via GitHub
stevenzwu commented on PR #10366: URL: https://github.com/apache/iceberg/pull/10366#issuecomment-2125463620 Actually, this may need a little more investigation. When I changed the code back to RESTException, metadata files weren't cleaned up, so this may not be the real problem.

Re: [PR] View: add property to describe advisory read mode [iceberg]

2024-05-22 Thread via GitHub
jackye1995 commented on PR #10362: URL: https://github.com/apache/iceberg/pull/10362#issuecomment-2125460867 > It feels like this is less an attribute of the view definition and more of a decision based on policy Yes, that is correct. This is a type of access decision, and it is mentio

[I] Flink: Make Hadoop an optional dependency [iceberg]

2024-05-22 Thread via GitHub
Fokko opened a new issue, #7332: URL: https://github.com/apache/iceberg/issues/7332 ### Feature Request / Improvement Playing around with `pyflink` and noticed that the Hadoop dependency is required when using the REST catalog: ```python ➜ ~ python3.9

[PR] REST: convert RESTException to CommitStateUnknownException to avoid incorrect cleanup of metadata files due to network error [iceberg]

2024-05-22 Thread via GitHub
stevenzwu opened a new pull request, #10366: URL: https://github.com/apache/iceberg/pull/10366 Otherwise, a network I/O exception can lead to incorrect cleanup of metadata files (like snapshots) when the commit on the REST server side completed successfully.

Re: [PR] View: add property to describe advisory read mode [iceberg]

2024-05-22 Thread via GitHub
danielcweeks commented on PR #10362: URL: https://github.com/apache/iceberg/pull/10362#issuecomment-2125327982 @jackye1995 it seems like the mode would be based on the policy applied to the subject performing the query. For example, if someone has full access to the data, all optimizations

Re: [I] Is there any way on Flink to read newly appended data only (NOT in current Iceberg table snapshot)? [iceberg]

2024-05-22 Thread via GitHub
vmaksimenko commented on issue #9955: URL: https://github.com/apache/iceberg/issues/9955#issuecomment-2125315703 I think the keyword for you is a checkpoint of your Flink application. If you are reading in streaming mode and restarting your application then restart it from the latest checkp

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-05-22 Thread via GitHub
marvinlanhenke commented on code in PR #363: URL: https://github.com/apache/iceberg-rust/pull/363#discussion_r1610308227 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -0,0 +1,819 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-05-22 Thread via GitHub
marvinlanhenke commented on code in PR #363: URL: https://github.com/apache/iceberg-rust/pull/363#discussion_r1610294455 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -0,0 +1,819 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-05-22 Thread via GitHub
marvinlanhenke commented on code in PR #363: URL: https://github.com/apache/iceberg-rust/pull/363#discussion_r1610284297 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -0,0 +1,819 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-05-22 Thread via GitHub
marvinlanhenke commented on code in PR #363: URL: https://github.com/apache/iceberg-rust/pull/363#discussion_r1610241256 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -0,0 +1,819 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] PyIceberg Near-Term Roadmap [iceberg-python]

2024-05-22 Thread via GitHub
tusharchou commented on issue #736: URL: https://github.com/apache/iceberg-python/issues/736#issuecomment-2125031765 @kevinjqliu @Fokko where would something like https://github.com/apache/iceberg-python/issues/402 go?

Re: [PR] docs: installation of the new `iceberg_catalog_rest` added to the docs [iceberg-rust]

2024-05-22 Thread via GitHub
liurenjie1024 commented on PR #355: URL: https://github.com/apache/iceberg-rust/pull/355#issuecomment-2124977027 cc @nishant-sachdeva Are you still working on this?

Re: [I] feat: Implement data file metrics evaluator to prune data files using filter. [iceberg-rust]

2024-05-22 Thread via GitHub
liurenjie1024 closed issue #152: feat: Implement data file metrics evaluator to prune data files using filter. URL: https://github.com/apache/iceberg-rust/issues/152

Re: [I] feat: Implement data file metrics evaluator to prune data files using filter. [iceberg-rust]

2024-05-22 Thread via GitHub
liurenjie1024 commented on issue #152: URL: https://github.com/apache/iceberg-rust/issues/152#issuecomment-2124972687 > @liurenjie1024 this is now done, since #347 got merged. Thanks for @sdd's effort.

Re: [PR] feat: add `ExpressionEvaluator` [iceberg-rust]

2024-05-22 Thread via GitHub
liurenjie1024 commented on code in PR #363: URL: https://github.com/apache/iceberg-rust/pull/363#discussion_r1610110676 ## crates/iceberg/src/expr/visitors/expression_evaluator.rs: ## @@ -0,0 +1,819 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] Improve CLI Text by Adding Verbose Text for Commands [iceberg-go]

2024-05-22 Thread via GitHub
zeroshade commented on code in PR #68: URL: https://github.com/apache/iceberg-go/pull/68#discussion_r1610113609 ## cmd/iceberg/main.go: ## @@ -34,16 +34,21 @@ import ( const usage = `iceberg. Usage: - iceberg list [options] [PARENT] - iceberg describe [options] [namespace

Re: [PR] View: add property to describe advisory read mode [iceberg]

2024-05-22 Thread via GitHub
jackye1995 commented on PR #10362: URL: https://github.com/apache/iceberg/pull/10362#issuecomment-2124943243 Here is another doc in BigQuery explaining possible attacks: https://cloud.google.com/bigquery/docs/best-practices-row-level-security#limit-side-channel-attacks And to enforce

Re: [PR] View: add property to describe advisory read mode [iceberg]

2024-05-22 Thread via GitHub
jackye1995 commented on PR #10362: URL: https://github.com/apache/iceberg/pull/10362#issuecomment-2124810239 Pushdown might not be a good word to use here; if there is better wording, let me know. The Snowflake doc provides a good case: https://docs.snowflake.com/en/user-guide/view

Re: [PR] Support creating tags by adding `set_ref_snapshot` API [iceberg-python]

2024-05-22 Thread via GitHub
chinmay-bhat commented on PR #728: URL: https://github.com/apache/iceberg-python/pull/728#issuecomment-2124649841 Hi @HonahX , I've updated the PR based on your suggestions. I've made `set_ref_snapshot()` protected and created an inner class that takes care of snapshot management operations

Re: [PR] support python 3.12 [iceberg-python]

2024-05-22 Thread via GitHub
MehulBatra commented on PR #254: URL: https://github.com/apache/iceberg-python/pull/254#issuecomment-2124481627 PR being worked on for Ray 3.12: https://github.com/ray-project/ray/issues/45477

Re: [I] Support Snapshot Management Operations [iceberg-python]

2024-05-22 Thread via GitHub
chinmay-bhat commented on issue #737: URL: https://github.com/apache/iceberg-python/issues/737#issuecomment-2124351316 @Honah thank you for your response. I agree that we should hide the `set_ref_snapshot` from the public API. I also like the idea of creating a `ManageSnapshots` inner cl

Re: [PR] feat: Add equality delete writer [iceberg-rust]

2024-05-22 Thread via GitHub
Dysprosium0626 commented on code in PR #372: URL: https://github.com/apache/iceberg-rust/pull/372#discussion_r1609490106 ## crates/iceberg/src/writer/base_writer/equality_delete_writer.rs: ## @@ -0,0 +1,438 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [I] Error reading version hint file [iceberg]

2024-05-22 Thread via GitHub
BhavanaRK17 commented on issue #7537: URL: https://github.com/apache/iceberg/issues/7537#issuecomment-2124095454 Did you find a solution to this issue?

Re: [PR] Implement BoundPredicateVisitor trait for ManifestFilterVisitor [iceberg-rust]

2024-05-22 Thread via GitHub
sdd commented on code in PR #367: URL: https://github.com/apache/iceberg-rust/pull/367#discussion_r1609404309 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -103,98 +106,245 @@ impl BoundPredicateVisitor for ManifestFilterVisitor<'_> { reference: &Bou