Re: [PR] [Spec] Add Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
wmoustafa commented on PR #10280: URL: https://github.com/apache/iceberg/pull/10280#issuecomment-2121885795 > @wmoustafa how is this linked to #10043? This addresses the spec aspect of #10043 using the property model. More context and discussion in [this dev list thread](https://lis

Re: [I] [Proposal] Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
JanKaul closed issue #6420: [Proposal] Iceberg Materialized View Spec URL: https://github.com/apache/iceberg/issues/6420 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[PR] Create rollback and set snapshot APIs [iceberg-python]

2024-05-20 Thread via GitHub
chinmay-bhat opened a new pull request, #758: URL: https://github.com/apache/iceberg-python/pull/758 Creates APIs supported in Spark for snapshot management operations Relevant issue - #737 PR depends on #728 and #748. Ready to review once they are merged. - [x] create APIs

Re: [PR] [Spec] Add Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
manuzhang commented on PR #10280: URL: https://github.com/apache/iceberg/pull/10280#issuecomment-2121832322 @wmoustafa how is this linked to #10043? Also, the doc [3] doesn't exist for me.

[PR] modify doc(backward compatibility) typo [iceberg-python]

2024-05-20 Thread via GitHub
SeungyeopShin opened a new pull request, #757: URL: https://github.com/apache/iceberg-python/pull/757 The environment variable for backward compatibility has to be `PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID`, not `LEGACY_CURRENT_SNAPSHOT_ID`.

Re: [I] check-ordering enablement for flink config [iceberg]

2024-05-20 Thread via GitHub
lei-xian0 commented on issue #10360: URL: https://github.com/apache/iceberg/issues/10360#issuecomment-2121768545 I see, so the Flink sink just doesn't support any schema change. Is this something on the current roadmap that we can expect in the foreseeable future?

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-05-20 Thread via GitHub
pvary commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1607625268 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -0,0 +1,780 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [I] check-ordering enablement for flink config [iceberg]

2024-05-20 Thread via GitHub
pvary commented on issue #10360: URL: https://github.com/apache/iceberg/issues/10360#issuecomment-2121731857 You could create an operator to check for schema changes and fail the job if needed.

[PR] View: add property to describe advisory read mode [iceberg]

2024-05-20 Thread via GitHub
jackye1995 opened a new pull request, #10362: URL: https://github.com/apache/iceberg/pull/10362 This is basically the concept of secure view, but separated into different levels based on our experience of implementing it in EMR Spark. I also considered making this a spec change, but using a

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-05-20 Thread via GitHub
zinking commented on PR #8797: URL: https://github.com/apache/iceberg/pull/8797#issuecomment-2121627469 ping @nastra for another review.

Re: [PR] feat: Add equality delete writer [iceberg-rust]

2024-05-20 Thread via GitHub
ZENOTME commented on code in PR #372: URL: https://github.com/apache/iceberg-rust/pull/372#discussion_r1607504466 ## crates/iceberg/src/writer/base_writer/equality_delete_writer.rs: ## @@ -0,0 +1,438 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] Spark rewrite Files Action OOM [iceberg]

2024-05-20 Thread via GitHub
pdames commented on issue #10054: URL: https://github.com/apache/iceberg/issues/10054#issuecomment-2121567721 Any updates here @Zhanxiao-Ma? Would love to take a look at what you've implemented if you've got a pending PR to link back to this issue, and see if there's an opportunity to work

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-05-20 Thread via GitHub
jtchen-study commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1607483056 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -0,0 +1,780 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[PR] Build: Bump pypa/cibuildwheel from 2.18.0 to 2.18.1 [iceberg-python]

2024-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #756: URL: https://github.com/apache/iceberg-python/pull/756 Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.18.0 to 2.18.1. Release notes sourced from https://github.com/pypa/cibuildwheel/releases: pypa/cibuildwhee

[PR] Build: Bump requests from 2.31.0 to 2.32.1 [iceberg-python]

2024-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #755: URL: https://github.com/apache/iceberg-python/pull/755 Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.1. Release notes sourced from requests's releases (https://github.com/psf/requests/releases). v2.32.0

Re: [PR] Build: Bump mkdocs-material from 9.5.22 to 9.5.23 [iceberg-python]

2024-05-20 Thread via GitHub
dependabot[bot] closed pull request #747: Build: Bump mkdocs-material from 9.5.22 to 9.5.23 URL: https://github.com/apache/iceberg-python/pull/747

Re: [PR] Build: Bump mkdocs-material from 9.5.22 to 9.5.23 [iceberg-python]

2024-05-20 Thread via GitHub
dependabot[bot] commented on PR #747: URL: https://github.com/apache/iceberg-python/pull/747#issuecomment-2121320382 Superseded by #754.

[PR] Build: Bump mkdocs-material from 9.5.22 to 9.5.24 [iceberg-python]

2024-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #754: URL: https://github.com/apache/iceberg-python/pull/754 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.22 to 9.5.24. Release notes sourced from https://github.com/squidfunk/mkdocs-material/releases: mk

Re: [PR] Add ManifestFile Stats in snapshot summary. [iceberg]

2024-05-20 Thread via GitHub
findepi commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1607253850 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -156,6 +156,8 @@ public List apply(TableMetadata base, Snapshot snapshot) { manifests.addAll(s

Re: [I] check-ordering enablement for flink config [iceberg]

2024-05-20 Thread via GitHub
pvary commented on issue #10360: URL: https://github.com/apache/iceberg/issues/10360#issuecomment-2121152930 Currently Flink Sink is not able to handle the schema updates. You need to restart the job to handle schema changes. One hacky solution is to throw a `SuppressRestartsException` ex

[I] check-ordering enablement for flink config [iceberg]

2024-05-20 Thread via GitHub
lei-xian0 opened a new issue, #10360: URL: https://github.com/apache/iceberg/issues/10360 ### Feature Request / Improvement Hi team, can we get the `check-ordering` config enabled for Flink writers as well? Currently the input does not tolerate schema order changes compared with table sch

Re: [PR] Spec: Add context query parameter for all REST APIs [iceberg]

2024-05-20 Thread via GitHub
danielcweeks commented on PR #10359: URL: https://github.com/apache/iceberg/pull/10359#issuecomment-2121093565 I think my main concern with this is that it just feels like a way to work around the spec. Even some of the examples are things that should be well defined (e.g. engine). I

[PR] Spec: Add context query parameter for all REST APIs [iceberg]

2024-05-20 Thread via GitHub
jackye1995 opened a new pull request, #10359: URL: https://github.com/apache/iceberg/pull/10359 This PR proposes adding a `context` query parameter for all requests. This was briefly described as something desirable in https://docs.google.com/document/d/14nmuxxfzQsYo59o0Fbpb-pxOlzS6bV

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-05-20 Thread via GitHub
jacobmarble commented on PR #9008: URL: https://github.com/apache/iceberg/pull/9008#issuecomment-2121078774 @nastra is this a good week for you to review?

Re: [PR] Encryption integration and test [iceberg]

2024-05-20 Thread via GitHub
RussellSpitzer commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1607047529 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -137,17 +162,88 @@ protected String tableName() { @Override publi

Re: [PR] [Spec] Add Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
szehon-ho commented on code in PR #10280: URL: https://github.com/apache/iceberg/pull/10280#discussion_r1607015368 ## format/materialized-view-spec.md: ## @@ -0,0 +1,132 @@ + + +# Iceberg Materialized View Spec + +## Background and Motivation +Iceberg views are a powerful tool t

Re: [PR] [Spec] Add Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
szehon-ho commented on code in PR #10280: URL: https://github.com/apache/iceberg/pull/10280#discussion_r1607033800 ## format/materialized-view-spec.md: ## @@ -0,0 +1,132 @@ + + +# Iceberg Materialized View Spec + +## Background and Motivation +Iceberg views are a powerful tool t

Re: [PR] Core: Check compatibility of all partition specs when updating schema [iceberg]

2024-05-20 Thread via GitHub
manuzhang closed pull request #10261: Core: Check compatibility of all partition specs when updating schema URL: https://github.com/apache/iceberg/pull/10261

Re: [PR] Spark Action to Analyze table [iceberg]

2024-05-20 Thread via GitHub
karuppayya commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1606968313 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchGenerator.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] HADOOP-18679. Add API for bulk/paged object deletion: Iceberg PoC [iceberg]

2024-05-20 Thread via GitHub
steveloughran commented on code in PR #10233: URL: https://github.com/apache/iceberg/pull/10233#discussion_r1606964288 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java: ## @@ -166,23 +178,106 @@ public void deletePrefix(String prefix) { @Override public vo

Re: [PR] Spark Action to Analyze table [iceberg]

2024-05-20 Thread via GitHub
karuppayya commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1606962975 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/AnalyzeTableSparkAction.java: ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Add ManifestFile Stats in snapshot summary. [iceberg]

2024-05-20 Thread via GitHub
ajantha-bhat commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1594277743 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -180,8 +181,10 @@ public void testBinPackUnpartitionedT

Re: [PR] AWS: add retry logic to S3InputStream [iceberg]

2024-05-20 Thread via GitHub
puchengy commented on PR #4912: URL: https://github.com/apache/iceberg/pull/4912#issuecomment-2120650709 @jackye1995 @xiaoxuandev @danielcweeks @rdblue @amogh-jahagirdar @nastra appreciate your feedback on this, thank you.

Re: [PR] [Spec] Add Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
jackye1995 commented on code in PR #10280: URL: https://github.com/apache/iceberg/pull/10280#discussion_r1606896688 ## format/materialized-view-spec.md: ## @@ -0,0 +1,132 @@ + + +# Iceberg Materialized View Spec + +## Background and Motivation +Iceberg views are a powerful tool

Re: [PR] [Spec] Add Iceberg Materialized View Spec [iceberg]

2024-05-20 Thread via GitHub
jackye1995 commented on PR #10280: URL: https://github.com/apache/iceberg/pull/10280#issuecomment-2120624728 > We have a separate properties page for Spark configurations: https://iceberg.apache.org/docs/1.5.0/configuration/. I see, if we conclude that we want to go with the propertie

[PR] HA and kerberos HMS support [iceberg-python]

2024-05-20 Thread via GitHub
awdavidson opened a new pull request, #752: URL: https://github.com/apache/iceberg-python/pull/752 (no comment)

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-05-20 Thread via GitHub
amit-cloudinary commented on PR #8797: URL: https://github.com/apache/iceberg/pull/8797#issuecomment-2120206876 Any updates on this? Is this something that's going to be merged soon?

Re: [PR] AWS: Support S3 DSSE-KMS encryption [iceberg]

2024-05-20 Thread via GitHub
aajisaka commented on PR #8370: URL: https://github.com/apache/iceberg/pull/8370#issuecomment-2119935002 Rebased for the latest main branch.

[I] The decimal data type is transformed after the data is inserted. [iceberg-python]

2024-05-20 Thread via GitHub
as10128 opened a new issue, #751: URL: https://github.com/apache/iceberg-python/issues/751 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 Version: Pyiceberg 0.6.1 I create a table, there are multiple columns of type decimal, decim

Re: [I] partiallyClusteredDistribution returns duplicate rows from GROUP BY [iceberg]

2024-05-20 Thread via GitHub
kbjorklu closed issue #10357: partiallyClusteredDistribution returns duplicate rows from GROUP BY URL: https://github.com/apache/iceberg/issues/10357

Re: [I] partiallyClusteredDistribution returns duplicate rows from GROUP BY [iceberg]

2024-05-20 Thread via GitHub
kbjorklu commented on issue #10357: URL: https://github.com/apache/iceberg/issues/10357#issuecomment-2119885482 Looks like this is a Spark issue fixed in 3.4.2: https://issues.apache.org/jira/browse/SPARK-44641 -> closing.

[I] partiallyClusteredDistribution returns duplicate rows from GROUP BY [iceberg]

2024-05-20 Thread via GitHub
kbjorklu opened a new issue, #10357: URL: https://github.com/apache/iceberg/issues/10357 ### Apache Iceberg version 1.5.2 (latest release) ### Query engine Spark ### Please describe the bug 🐞 The following code gives `AssertionError: want 1000 rows, got 2000