Re: [PR] Add support for orc format [iceberg-python]

2024-06-05 Thread via GitHub
MehulBatra commented on PR #790: URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2149033357 Hi @Fokko and @HonahX ✅ I have modified the read logic to read ORC file-based Iceberg tables and wrote an integration test too; it is working great. Would love Some g

Re: [PR] Build: Bump Hive 2.3.10 [iceberg]

2024-06-05 Thread via GitHub
pan3793 commented on PR #10447: URL: https://github.com/apache/iceberg/pull/10447#issuecomment-2149064436 Looks like there are issues about Jackson deps, and I'm surprised that `testImplementation enforcedPlatform(libs.jackson212.bom)` does not work ... maybe I need some help from a Gradle

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10442: URL: https://github.com/apache/iceberg/pull/10442#discussion_r1627163414 ## spark/v3.4/spark-runtime/src/integration/java/org/apache/iceberg/spark/SmokeTest.java: ## @@ -165,6 +167,14 @@ public void testCreateTable() { Assert.assertEqu

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10442: URL: https://github.com/apache/iceberg/pull/10442#discussion_r1627163873 ## spark/v3.5/spark-runtime/src/integration/java/org/apache/iceberg/spark/SmokeTest.java: ## @@ -174,6 +176,14 @@ public void testCreateTable() { .hasSize(3);

Re: [PR] support python 3.12 [iceberg-python]

2024-06-05 Thread via GitHub
MehulBatra commented on PR #254: URL: https://github.com/apache/iceberg-python/pull/254#issuecomment-2149132137 > FYI: [Ray issue 45477](https://github.com/ray-project/ray/issues/45477) was recently completed. Thank you so much for the update! @pdpark Soon there will be a release

Re: [PR] Core: Add truncate table API and support fast truncate table [iceberg]

2024-06-05 Thread via GitHub
smallx closed pull request #3844: Core: Add truncate table API and support fast truncate table URL: https://github.com/apache/iceberg/pull/3844 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Core: Add truncate table API and support fast truncate table [iceberg]

2024-06-05 Thread via GitHub
smallx commented on PR #3844: URL: https://github.com/apache/iceberg/pull/3844#issuecomment-2149181290 close this outdated pr.

Re: [I] Rest Catalog - Bug in list namespaces. Namespaces not underneath the parent namespace are returned [iceberg]

2024-06-05 Thread via GitHub
ajantha-bhat commented on issue #10443: URL: https://github.com/apache/iceberg/issues/10443#issuecomment-2149541265 Iceberg doesn't maintain an implementation of the REST catalog server. Only spec is defined at the Iceberg repo. Spec clearly says, listNamespaces should list the namespaces
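The behaviour the issue title expects (only namespaces underneath the parent are returned) can be sketched as follows. This is a minimal illustration, not the REST catalog's implementation: `list_child_namespaces` is a hypothetical helper, namespaces are modelled as tuples of name parts, and returning only direct children is an assumption made for the example.

```python
from typing import Iterable, List, Tuple

Namespace = Tuple[str, ...]


def list_child_namespaces(
    all_namespaces: Iterable[Namespace], parent: Namespace = ()
) -> List[Namespace]:
    """Hypothetical helper: return namespaces exactly one level below `parent`.

    An empty `parent` means "list the top-level namespaces".
    """
    depth = len(parent) + 1
    return sorted(
        ns
        for ns in all_namespaces
        # Keep only namespaces that start with the parent and sit one level deeper.
        if len(ns) == depth and ns[: len(parent)] == parent
    )


namespaces = [("a",), ("a", "b"), ("a", "b", "c"), ("x",), ("x", "y")]
children_of_a = list_child_namespaces(namespaces, ("a",))
top_level = list_child_namespaces(namespaces)
```

With this filter, `("x", "y")` is never returned for parent `("a",)`, which is the symptom described in the issue title.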

Re: [PR] Build: Bump Hive 2.3.10 [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10447: URL: https://github.com/apache/iceberg/pull/10447#discussion_r1627619028 ## mr/build.gradle: ## @@ -68,8 +67,7 @@ project(':iceberg-mr') { testImplementation libs.avro.avro testImplementation libs.calcite.core testImplementat

Re: [PR] Build: Clean up Jackson dependency usages [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10448: URL: https://github.com/apache/iceberg/pull/10448#discussion_r1627650274 ## gradle/libs.versions.toml: ## @@ -142,8 +142,9 @@ hive3-service = { module = "org.apache.hive:hive-service", version.ref = "hive3" httpcomponents-httpclient5 = {

Re: [I] Rest Catalog - Bug in list namespaces. Namespaces not underneath the parent namespace are returned [iceberg]

2024-06-05 Thread via GitHub
jurossiar commented on issue #10443: URL: https://github.com/apache/iceberg/issues/10443#issuecomment-2149695708 We are using tabulario: https://hub.docker.com/r/tabulario/iceberg-rest/tags I've just taken a look at the [repo](https://github.com/tabular-io/iceberg-rest-image) and in a firs

Re: [PR] Build: Bump Hive 2.3.10 [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10447: URL: https://github.com/apache/iceberg/pull/10447#discussion_r1627683233 ## mr/build.gradle: ## @@ -68,8 +67,7 @@ project(':iceberg-mr') { testImplementation libs.avro.avro testImplementation libs.calcite.core testImplementat

Re: [PR] Introduces the new IcebergSink based on the new V2 Flink Sink Abstraction [iceberg]

2024-06-05 Thread via GitHub
rodmeneses commented on code in PR #10179: URL: https://github.com/apache/iceberg/pull/10179#discussion_r1627770495 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -0,0 +1,767 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #9008: URL: https://github.com/apache/iceberg/pull/9008#discussion_r1627706644 ## api/src/main/java/org/apache/iceberg/util/DateTimeUtil.java: ## @@ -75,6 +77,14 @@ public static long microsToMillis(long micros) { return Math.floorDiv(micros,

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
huaxingao commented on code in PR #10442: URL: https://github.com/apache/iceberg/pull/10442#discussion_r1627947147 ## spark/v3.4/spark-runtime/src/integration/java/org/apache/iceberg/spark/SmokeTest.java: ## @@ -165,6 +167,14 @@ public void testCreateTable() { Assert.assert

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
nastra commented on PR #10442: URL: https://github.com/apache/iceberg/pull/10442#issuecomment-2150287932 @huaxingao could you also please add the below diff to `TestViews.showViews()` ``` --- a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.jav

Re: [PR] Add support for orc format [iceberg-python]

2024-06-05 Thread via GitHub
MehulBatra commented on PR #790: URL: https://github.com/apache/iceberg-python/pull/790#issuecomment-2150324091 I believe we need to make a change here to support ORC writes, please correct me if I am pointing towards the wrong direction https://github.com/apache/iceberg-python/blob/a11

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
huaxingao commented on PR #10442: URL: https://github.com/apache/iceberg/pull/10442#issuecomment-2150333597 > could you also please add the below diff to TestViews.showViews() Added. Thanks for the suggestion! @nastra

Re: [PR] Build: Clean up Jackson dependency usages [iceberg]

2024-06-05 Thread via GitHub
amogh-jahagirdar merged PR #10448: URL: https://github.com/apache/iceberg/pull/10448

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10442: URL: https://github.com/apache/iceberg/pull/10442#discussion_r1627997817 ## gradle.properties: ## @@ -20,7 +20,7 @@ systemProp.defaultFlinkVersions=1.19 systemProp.knownFlinkVersions=1.17,1.18,1.19 systemProp.defaultHiveVersions=2 system

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
nastra commented on code in PR #10442: URL: https://github.com/apache/iceberg/pull/10442#discussion_r1627998403 ## gradle.properties: ## @@ -20,7 +20,7 @@ systemProp.defaultFlinkVersions=1.19 systemProp.knownFlinkVersions=1.17,1.18,1.19 systemProp.defaultHiveVersions=2 system

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
huaxingao commented on code in PR #10442: URL: https://github.com/apache/iceberg/pull/10442#discussion_r1628012768 ## gradle.properties: ## @@ -20,7 +20,7 @@ systemProp.defaultFlinkVersions=1.19 systemProp.knownFlinkVersions=1.17,1.18,1.19 systemProp.defaultHiveVersions=2 sys

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-05 Thread via GitHub
stevenzwu commented on code in PR #10331: URL: https://github.com/apache/iceberg/pull/10331#discussion_r1628013861 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/AggregatedStatisticsTracker.java: ## @@ -30,104 +42,225 @@ * {@link AggregatedStatistics}

[PR] Hive: Return new scan after applying column project parameter [iceberg]

2024-06-05 Thread via GitHub
zhangbutao opened a new pull request, #10449: URL: https://github.com/apache/iceberg/pull/10449 1. Should return a new scan after applying the column projection parameter; otherwise the call has no effect. 2. scan.project() and scan.select() cannot be specified at the same time. Check this:
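The first point in the description above can be illustrated with a minimal sketch. It assumes a hypothetical immutable `Scan` class (not Iceberg's actual API) whose refinement methods return a new object, so discarding the return value silently drops the projection.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scan:
    """Hypothetical immutable scan; stands in for a table scan builder."""

    columns: Optional[Tuple[str, ...]] = None  # None means "all columns"

    def select(self, *names: str) -> "Scan":
        # Returns a NEW scan; the receiver is left unchanged.
        return replace(self, columns=tuple(names))


def configure_buggy(scan: Scan) -> Scan:
    scan.select("id", "name")  # result discarded, so the projection is lost
    return scan


def configure_fixed(scan: Scan) -> Scan:
    return scan.select("id", "name")  # keep the refined scan
```

Calling `configure_buggy(Scan())` returns a scan whose `columns` is still `None`, while `configure_fixed(Scan())` carries the selected columns.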

Re: [PR] Spark: Add CopyTable spark action [iceberg]

2024-06-05 Thread via GitHub
laithalzyoud commented on PR #10024: URL: https://github.com/apache/iceberg/pull/10024#issuecomment-2150442517 > @laithalzyoud : Are you planning to address the comments on this? This feature is definitely useful. If not, I would like to take it up. Hey @ajantha-bhat! Yes I'm planning

[I] Storage partitioned joined fails when >2 tables are joined [iceberg]

2024-06-05 Thread via GitHub
mrbrahman opened a new issue, #10450: URL: https://github.com/apache/iceberg/issues/10450 ### Apache Iceberg version None ### Query engine Spark ### Please describe the bug 🐞 SPJ works great when joining 2 tables. For e.g. ~~~scala // SPJ setup

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-05 Thread via GitHub
stevenzwu merged PR #10331: URL: https://github.com/apache/iceberg/pull/10331

Re: [I] Implement Other Filesystems Using Go CDK [iceberg-go]

2024-06-05 Thread via GitHub
srilman commented on issue #92: URL: https://github.com/apache/iceberg-go/issues/92#issuecomment-2150580725 @zeroshade I think I have something working on my end; once I get the green light, I'm happy to open a PR.

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
szehon-ho merged PR #10442: URL: https://github.com/apache/iceberg/pull/10442

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
szehon-ho commented on PR #10442: URL: https://github.com/apache/iceberg/pull/10442#issuecomment-2150596338 Thanks @huaxingao for the pr, @nastra @viirya for reviews

Re: [PR] SHOW VIEWS failed with AssertionError [iceberg]

2024-06-05 Thread via GitHub
huaxingao commented on PR #10442: URL: https://github.com/apache/iceberg/pull/10442#issuecomment-2150600734 Thanks, everyone!

Re: [PR] Flink: refactor sink shuffling statistics collection [iceberg]

2024-06-05 Thread via GitHub
stevenzwu commented on PR #10331: URL: https://github.com/apache/iceberg/pull/10331#issuecomment-2150614552 thanks @pvary and @yegangy0718 for the code review

Re: [PR] Core, Parquet, ORC: Don't write column sizes when metrics mode is None [iceberg]

2024-06-05 Thread via GitHub
amogh-jahagirdar commented on PR #10440: URL: https://github.com/apache/iceberg/pull/10440#issuecomment-2150669744 Thanks for the reviews @szehon-ho @nastra @danielcweeks !

Re: [PR] Core, Parquet, ORC: Don't write column sizes when metrics mode is None [iceberg]

2024-06-05 Thread via GitHub
amogh-jahagirdar merged PR #10440: URL: https://github.com/apache/iceberg/pull/10440

Re: [PR] Open-API: TableRequirements should use union of subclasses [iceberg]

2024-06-05 Thread via GitHub
anuragmantri commented on code in PR #10434: URL: https://github.com/apache/iceberg/pull/10434#discussion_r1628224793 ## open-api/rest-catalog-open-api.yaml: ## @@ -2597,12 +2598,15 @@ components: assert-last-assigned-partition-id: '#/components/schemas/AssertLastAss

Re: [PR] Hive: Return new scan after applying column project parameter [iceberg]

2024-06-05 Thread via GitHub
pvary commented on PR #10449: URL: https://github.com/apache/iceberg/pull/10449#issuecomment-2150699009 @zhangbutao: Could you please provide a test case which fails without the patch and runs correctly with the patch?

Re: [PR] Open-API: TableRequirements should use union of subclasses [iceberg]

2024-06-05 Thread via GitHub
flyrain commented on code in PR #10434: URL: https://github.com/apache/iceberg/pull/10434#discussion_r1628249233 ## open-api/rest-catalog-open-api.yaml: ## @@ -2597,12 +2598,15 @@ components: assert-last-assigned-partition-id: '#/components/schemas/AssertLastAssigned

Re: [I] Improve remove_orphan_files performance by using "inventory listing" [iceberg]

2024-06-05 Thread via GitHub
anuragmantri commented on issue #10426: URL: https://github.com/apache/iceberg/issues/10426#issuecomment-2150831293 Hi @ajantha-bhat - Don't we already support this after https://github.com/apache/iceberg/pull/4503?

[PR] Bump mypy-boto3-glue from 1.34.115 to 1.34.120 [iceberg-python]

2024-06-05 Thread via GitHub
dependabot[bot] opened a new pull request, #797: URL: https://github.com/apache/iceberg-python/pull/797 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.34.115 to 1.34.120. Commits See full diff in https://github.com/youtype/mypy_boto3_builder/commi

Re: [I] Running iceberg with spark 3 in local mode [iceberg]

2024-06-05 Thread via GitHub
github-actions[bot] commented on issue #2176: URL: https://github.com/apache/iceberg/issues/2176#issuecomment-2151153089 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Core: Reword exception message in RewriteManifests [iceberg]

2024-06-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #10446: URL: https://github.com/apache/iceberg/pull/10446#discussion_r1628610685 ## core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java: ## @@ -275,14 +275,17 @@ private boolean matchesPredicate(ManifestFile manifest) { r

Re: [PR] Spark Action to Analyze table [iceberg]

2024-06-05 Thread via GitHub
szehon-ho commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1628626976 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchGenerator.java: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Spark Action to Analyze table [iceberg]

2024-06-05 Thread via GitHub
szehon-ho commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1626761248 ## api/src/main/java/org/apache/iceberg/actions/AnalyzeTable.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] remove static import from `SmokeTest` [iceberg]

2024-06-05 Thread via GitHub
szehon-ho commented on PR #10451: URL: https://github.com/apache/iceberg/pull/10451#issuecomment-2151241691 Thanks, this fails 'checkstyleIntegration', which for some reason wasn't run in the initial commit.

Re: [PR] remove static import from `SmokeTest` [iceberg]

2024-06-05 Thread via GitHub
szehon-ho merged PR #10451: URL: https://github.com/apache/iceberg/pull/10451

Re: [PR] remove static import from `SmokeTest` [iceberg]

2024-06-05 Thread via GitHub
huaxingao commented on PR #10451: URL: https://github.com/apache/iceberg/pull/10451#issuecomment-2151247179 Thanks @szehon-ho

Re: [PR] Incremental Append Scan [iceberg-python]

2024-06-05 Thread via GitHub
chinmay-bhat commented on code in PR #533: URL: https://github.com/apache/iceberg-python/pull/533#discussion_r1628636085 ## pyiceberg/table/__init__.py: ## @@ -1754,6 +1788,134 @@ def to_arrow(self) -> pa.Table: def to_pandas(self, **kwargs: Any) -> pd.DataFrame: r

Re: [PR] Incremental Append Scan [iceberg-python]

2024-06-05 Thread via GitHub
chinmay-bhat commented on code in PR #533: URL: https://github.com/apache/iceberg-python/pull/533#discussion_r1628643760 ## pyiceberg/table/__init__.py: ## @@ -1754,6 +1788,134 @@ def to_arrow(self) -> pa.Table: def to_pandas(self, **kwargs: Any) -> pd.DataFrame: r

[I] Iceberg supports binlog logs [iceberg]

2024-06-05 Thread via GitHub
smileyboy2019 opened a new issue, #10452: URL: https://github.com/apache/iceberg/issues/10452 ### Feature Request / Improvement When doing real-time data warehousing, it is possible to achieve CDC to Iceberg from the source. How can the data written to Iceberg be perceived by Flink o

Re: [I] Support writing to a table with sort-order [iceberg-python]

2024-06-05 Thread via GitHub
vinjai commented on issue #271: URL: https://github.com/apache/iceberg-python/issues/271#issuecomment-2151365155 Hi @Fokko I would like to take a shot at this if no one has already taken it.

Re: [PR] Core: Throw CommitStateUnknownException if RuntimeException that is not marked as cleanable is thrown [iceberg]

2024-06-05 Thread via GitHub
sumedhsakdeo commented on code in PR #10373: URL: https://github.com/apache/iceberg/pull/10373#discussion_r1628766388 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -333,6 +333,8 @@ private void commitCreateTransaction() { // the commit failed and no

Re: [PR] Spark: Add SparkSQLProperty to control split-size [iceberg]

2024-06-05 Thread via GitHub
sumedhsakdeo commented on PR #10336: URL: https://github.com/apache/iceberg/pull/10336#issuecomment-2151411434 Thanks @szehon-ho appreciate your PR https://github.com/apache/spark/pull/46707 Could you suggest a recommendation for this PR? Will your support for options in Spark SQL b

Re: [PR] Hive: Return new scan after applying column project parameter [iceberg]

2024-06-05 Thread via GitHub
zhangbutao commented on PR #10449: URL: https://github.com/apache/iceberg/pull/10449#issuecomment-2151496203 > @zhangbutao: Could you please provide a test case which fails without the patch and runs correctly with the patch? @pvary Thanks for your comment. Actually, there are no faile