Re: [I] Replace parquet metadata thrift version with in memory version. [iceberg-rust]

2025-03-11 Thread via GitHub
jonathanc-n commented on issue #1004: URL: https://github.com/apache/iceberg-rust/issues/1004#issuecomment-2714985152 Thanks for that! I'll look into it later today. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Adds AWS to vendors page [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer merged PR #12468: URL: https://github.com/apache/iceberg/pull/12468 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] Adds AWS to vendors page [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer commented on PR #12468: URL: https://github.com/apache/iceberg/pull/12468#issuecomment-2714997269 Thanks @rbowen for the addition, and thanks to all the reviewers!

Re: [PR] refactor(manifests): consolidate ManifestEntryV1 and V2 [iceberg-go]

2025-03-11 Thread via GitHub
kevinjqliu commented on code in PR #327: URL: https://github.com/apache/iceberg-go/pull/327#discussion_r1989698267 ## manifest.go: ## @@ -501,13 +501,11 @@ func fetchManifestEntries(m ManifestFile, fs iceio.IO, discardDeleted bool) ([]M } fieldNameToID, fieldI

Re: [PR] feat: add support for azure blob with connection string/sas token/account key [iceberg-go]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #313: URL: https://github.com/apache/iceberg-go/pull/313#issuecomment-2715015163 we spin up an azurite container for azure related integration tests https://github.com/apache/iceberg-python/blob/764880364c94fbc4d29a0677350463de1d94e75c/dev/docker-compose-azurit

Re: [PR] [Do not merge] Iterative `bind` with a stack instead of recursion [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #1783: URL: https://github.com/apache/iceberg-python/pull/1783#issuecomment-2715346868 changing the visitor to an iterative approach seems like a sound solution. are there any reasons we dont want to do this?

Re: [PR] Docs: Update Iceberg talks with recent Iceberg meetup sessions [iceberg]

2025-03-11 Thread via GitHub
sida-shen commented on code in PR #12481: URL: https://github.com/apache/iceberg/pull/12481#discussion_r1989910456 ## site/docs/talks.md: ## @@ -21,6 +21,101 @@ title: "Talks" ## Iceberg Talks Here is a list of talks and other videos related to Iceberg. +### [Supporting S3 T

Re: [I] Support metadata compaction [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on issue #270: URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-2715348757 feel free to help review the PR :) i haven't gotten to it yet

Re: [PR] Update-schema: Add support for `initial-default` [iceberg-python]

2025-03-11 Thread via GitHub
malhotrashivam commented on code in PR #1770: URL: https://github.com/apache/iceberg-python/pull/1770#discussion_r1989921471 ## pyiceberg/table/update/schema.py: ## @@ -338,6 +363,7 @@ def _set_column_requirement(self, path: Union[str, Tuple[str, ...]], required: b

Re: [I] [bug] `bind` visitor causes `RecursionError: maximum recursion depth exceeded` [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on issue #1785: URL: https://github.com/apache/iceberg-python/issues/1785#issuecomment-2715331385 Perhaps we'd want to convert the visitor to an iterative approach, for example #1783

Re: [PR] Build: Bump mkdocstrings from 0.28.2 to 0.29.0 [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu merged PR #1781: URL: https://github.com/apache/iceberg-python/pull/1781

Re: [PR] Docs: Update Iceberg talks with recent Iceberg meetup sessions [iceberg]

2025-03-11 Thread via GitHub
sida-shen commented on code in PR #12481: URL: https://github.com/apache/iceberg/pull/12481#discussion_r1987933563 ## site/docs/talks.md: ## @@ -21,6 +21,86 @@ title: "Talks" ## Iceberg Talks Here is a list of talks and other videos related to Iceberg. +### [Supporting S3 Ta

Re: [PR] doc: run doc test [iceberg-rust]

2025-03-11 Thread via GitHub
Fokko commented on code in PR #1066: URL: https://github.com/apache/iceberg-rust/pull/1066#discussion_r1986962950 ## crates/iceberg/src/writer/mod.rs: ## @@ -26,23 +26,58 @@ //! 2. IcebergWriter: Focus on the logical format of iceberg table. It will write the data using the Fi

Re: [PR] Add pull-request template [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #1777: URL: https://github.com/apache/iceberg-python/pull/1777#issuecomment-2715103113 Thanks! This should make summarizing the release note easier

Re: [PR] feat(table): Add computation of iceberg stats from parquet files [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade commented on code in PR #329: URL: https://github.com/apache/iceberg-go/pull/329#discussion_r1989770403 ## table/arrow_utils.go: ## @@ -892,3 +899,356 @@ func ToRequestedSchema(ctx context.Context, requested, fileSchema *iceberg.Schem return out, nil } + +f

[PR] feat(table): Add computation of iceberg stats from parquet files [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade opened a new pull request, #329: URL: https://github.com/apache/iceberg-go/pull/329 This one's a big one! Adding initial implementation of deriving and computing the Iceberg metadata statistics from Parquet files in order to facilitate adding files directly. This includes u

[PR] doc: run doc test [iceberg-rust]

2025-03-11 Thread via GitHub
de-sh opened a new pull request, #1066: URL: https://github.com/apache/iceberg-rust/pull/1066 ## Which issue does this PR close? - Closes #1065. ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Parquet: Support unknown and timestamp(9) in generics and internal model [iceberg]

2025-03-11 Thread via GitHub
rdblue merged PR #12463: URL: https://github.com/apache/iceberg/pull/12463

Re: [PR] feat(table): Adds updateSnapshotSummary internal function [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade commented on PR #317: URL: https://github.com/apache/iceberg-go/pull/317#issuecomment-2711707114 I've addressed @Fokko's comments in general and the bigger point around the V1/V2 handling is shifted to a new PR so I'm going to merge this in the interests of continuing to move forw

Re: [I] Extends Iceberg table stats API to allow publish data and stats atomically [iceberg]

2025-03-11 Thread via GitHub
github-actions[bot] commented on issue #6442: URL: https://github.com/apache/iceberg/issues/6442#issuecomment-2705224394 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Build: Bump getdaft from 0.4.4 to 0.4.7 [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu merged PR #1780: URL: https://github.com/apache/iceberg-python/pull/1780

Re: [I] pyiceberg with hive and S3 fails even when providing creds [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on issue #1775: URL: https://github.com/apache/iceberg-python/issues/1775#issuecomment-2715173895 Thanks for filing this issue! It looks like we do pass the `s3.role-arn` to the underlying pyarrow S3FileSystem and the issue is in the S3FileSystem itself, as desc

Re: [PR] Build: Bump mkdocs-autorefs from 1.4.0 to 1.4.1 [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu merged PR #1782: URL: https://github.com/apache/iceberg-python/pull/1782

Re: [PR] Build: Bump getdaft from 0.4.4 to 0.4.6 [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #1758: URL: https://github.com/apache/iceberg-python/pull/1758#issuecomment-2715175732 Fixed in #1780, thanks!

Re: [I] Support metadata compaction [iceberg-python]

2025-03-11 Thread via GitHub
ZENOTME commented on issue #270: URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-2715188414 > Looks like @amitgilad3 has already started a PR for Rewrite manifests in https://github.com/apache/iceberg-python/pull/1661 Thanks @kevinjqliu! It's a good reference.

Re: [I] URI missing, please provide using --uri, the config or environment variable PYICEBERG_CATALOG__DEFAULT__URI even through PYICEBERG_HOME is set to $HOME [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on issue #1771: URL: https://github.com/apache/iceberg-python/issues/1771#issuecomment-2715204948 Thanks @lk-1984 for the detailed debugging session :) I agree that the CLI should provide more user friendly warnings regarding catalog selection and override. I've

[I] improve pyiceberg CLI [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu opened a new issue, #1784: URL: https://github.com/apache/iceberg-python/issues/1784 ### Feature Request / Improvement Based on issues described in #1771 1. We'd want to make it clear that the `default` catalog is used by default when no `--catalog` parameter is give

Re: [I] Support metadata compaction [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on issue #270: URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-2715133884 Looks like @amitgilad3 has already started a PR for Rewrite manifests in #1661

Re: [PR] Core: lazy init workerPool [iceberg]

2025-03-11 Thread via GitHub
pvary commented on code in PR #12427: URL: https://github.com/apache/iceberg/pull/12427#discussion_r1989438499 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -197,7 +198,7 @@ protected String targetBranch() { } protected ExecutorService workerPool(

Re: [PR] Support `wasb://` and `wasbs://` [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #1663: URL: https://github.com/apache/iceberg-python/pull/1663#issuecomment-2715144891 Looks like we have a few adls integration tests against the azurite docker https://github.com/apache/iceberg-python/blob/b86d7d5885c1f9feec86cbffcb818738e41cd6c1/tests/io/tes

Re: [PR] Fix strict projection for `string` and `binary` [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu merged PR #1774: URL: https://github.com/apache/iceberg-python/pull/1774

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
SanjayMarreddi commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989988126 ## aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java: ## @@ -118,6 +119,14 @@ public S3Client s3() { .build(); } +@Overrid

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
SanjayMarreddi commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989988680 ## kafka-connect/build.gradle: ## Review Comment: Yeah sure, noted. Thanks

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
SanjayMarreddi commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989992832 ## aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java: ## @@ -255,6 +256,48 @@ public void testNewInputStreamWithMultiRegionAccess

Re: [I] Add files to add existing Parquet files to a table [iceberg-rust]

2025-03-11 Thread via GitHub
mkarbo commented on issue #932: URL: https://github.com/apache/iceberg-rust/issues/932#issuecomment-2710478698 @liurenjie1024 @jonathanc-n should this be closed now that https://github.com/apache/iceberg-rust/pull/960 is in?

Re: [PR] feat(table): Adds updateSnapshotSummary internal function [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade merged PR #317: URL: https://github.com/apache/iceberg-go/pull/317

Re: [PR] API, Core: Add geometry and geography types support [iceberg]

2025-03-11 Thread via GitHub
szehon-ho commented on PR #12346: URL: https://github.com/apache/iceberg/pull/12346#issuecomment-2705309035 Also, (as can't comment on files that are not in the change) Do we need to add Geo types to following places? 1. Types.java TYPES constant? 2. TestSchemaUnionByFieldNa

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
geruh commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989952400 ## aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java: ## @@ -118,6 +119,14 @@ public S3Client s3() { .build(); } +@Override +pu

Re: [PR] Update-schema: Add support for `initial-default` [iceberg-python]

2025-03-11 Thread via GitHub
malhotrashivam commented on code in PR #1770: URL: https://github.com/apache/iceberg-python/pull/1770#discussion_r1989951348 ## pyiceberg/table/update/schema.py: ## @@ -414,6 +416,7 @@ def update_column( doc=doc if doc is not None else updated.doc,

Re: [PR] Update-schema: Add support for `initial-default` [iceberg-python]

2025-03-11 Thread via GitHub
Fokko commented on code in PR #1770: URL: https://github.com/apache/iceberg-python/pull/1770#discussion_r1989963103 ## pyiceberg/table/update/schema.py: ## @@ -338,6 +363,7 @@ def _set_column_requirement(self, path: Union[str, Tuple[str, ...]], required: b fiel

Re: [PR] Update-schema: Add support for `initial-default` [iceberg-python]

2025-03-11 Thread via GitHub
Fokko commented on code in PR #1770: URL: https://github.com/apache/iceberg-python/pull/1770#discussion_r1989938216 ## pyiceberg/table/update/schema.py: ## @@ -212,13 +215,34 @@ def add_column( # assign new IDs in order new_id = self.assign_new_column_id() +

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989966659 ## aws/src/main/java/org/apache/iceberg/aws/AwsClientFactories.java: ## @@ -118,6 +119,14 @@ public S3Client s3() { .build(); } +@Override +

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
kevinjqliu commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989925583 ## gradle/libs.versions.toml: ## @@ -22,6 +22,7 @@ [versions] activation = "1.1.1" aliyun-sdk-oss = "3.10.2" +analyticsaccelerator = "1.0.0" Review Comment:

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2715451404 > Have you posted this on the iceberg devlist? Not really, I did not really expect it to be a community discussion since this is a very vendor specific integration for S3 (alth

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989985092 ## kafka-connect/build.gradle: ## Review Comment: We are not adding this to the aws-bundle yet, so it should be fine, but @SanjayMarreddi we should probably

Re: [PR] [Do not merge] Iterative `bind` with a stack instead of recursion [iceberg-python]

2025-03-11 Thread via GitHub
Fokko commented on PR #1783: URL: https://github.com/apache/iceberg-python/pull/1783#issuecomment-2715461531 I like the solution! > changing the visitor to an iterative approach seems like a sound solution. are there any reasons we dont want to do this? My only concern is perfo

Re: [PR] Core, Spark 3.5: Apply Equality Deletes when Doing Copy on Write [iceberg]

2025-03-11 Thread via GitHub
wypoon commented on PR #12479: URL: https://github.com/apache/iceberg/pull/12479#issuecomment-2715528579 @pvary @RussellSpitzer thanks for answering my question!

Re: [PR] feat(table): Basic Transaction and AddFiles [iceberg-go]

2025-03-11 Thread via GitHub
kevinjqliu commented on code in PR #330: URL: https://github.com/apache/iceberg-go/pull/330#discussion_r1990183802 ## table/table_test.go: ## @@ -128,3 +138,235 @@ func (t *TableTestSuite) TestSnapshotByName() { t.True(testSnapshot.Equals(*t.tbl.SnapshotByName("test"))

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
SanjayMarreddi commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1989998534 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -72,6 +72,32 @@ public class S3FileIOProperties implements Serializable { pu

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2715494642 > If you are using the normal code path today with the feature off, with all the separated code paths, you should not be affected at all. yea looking at `aws/src/main/java/org/

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
kevinjqliu commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1990004839 ## aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java: ## @@ -255,6 +256,48 @@ public void testNewInputStreamWithMultiRegionAccessPoin

Re: [PR] Flink: Support source watermark for flink sql windows [iceberg]

2025-03-11 Thread via GitHub
pvary commented on code in PR #12191: URL: https://github.com/apache/iceberg/pull/12191#discussion_r1988451569 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/source/IcebergTableSource.java: ## @@ -175,6 +178,18 @@ public Result applyFilters(List flinkFilters) {

Re: [PR] Core,Api: Add overwrite option when register external table to catalog [iceberg]

2025-03-11 Thread via GitHub
dramaticlly commented on code in PR #12228: URL: https://github.com/apache/iceberg/pull/12228#discussion_r1990229315 ## api/src/main/java/org/apache/iceberg/catalog/Catalog.java: ## @@ -344,6 +344,24 @@ default void invalidateTable(TableIdentifier identifier) {} * @throws Al

Re: [I] Issue during Upsert [iceberg-python]

2025-03-11 Thread via GitHub
mattmartin14 commented on issue #1759: URL: https://github.com/apache/iceberg-python/issues/1759#issuecomment-2715872005 Hey @kevinjqliu , From my original testing, insert filters were not affected by this problem. It was only the overwrite filters that were an issue. Has somethin

[PR] feat: Add `NameMapping` [iceberg-rust]

2025-03-11 Thread via GitHub
jonathanc-n opened a new pull request, #1072: URL: https://github.com/apache/iceberg-rust/pull/1072 ## Which issue does this PR close? - Related to #1030. ## What changes are included in this PR? Added `NameMapping` implementation. Includes updating, creating, and ap

[PR] Build: Bump mkdocstrings-python from 1.16.2 to 1.16.5 [iceberg-python]

2025-03-11 Thread via GitHub
dependabot[bot] opened a new pull request, #1786: URL: https://github.com/apache/iceberg-python/pull/1786 Bumps [mkdocstrings-python](https://github.com/mkdocstrings/python) from 1.16.2 to 1.16.5. Release notes Sourced from https://github.com/mkdocstrings/python/releases mkdocstr

[PR] Build: Bump sqlalchemy from 2.0.38 to 2.0.39 [iceberg-python]

2025-03-11 Thread via GitHub
dependabot[bot] opened a new pull request, #1787: URL: https://github.com/apache/iceberg-python/pull/1787 Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.38 to 2.0.39. Release notes Sourced from https://github.com/sqlalchemy/sqlalchemy/releases sqlalchemy's

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2715732944 @HonahX could you take a look? Given the fact that we plan to refactor the HTTPClientProperties and other related classes as the next step, it's probably good for you to take a look

Re: [PR] Core,Api: Add overwrite option when register external table to catalog [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer commented on code in PR #12228: URL: https://github.com/apache/iceberg/pull/12228#discussion_r1990157004 ## api/src/main/java/org/apache/iceberg/catalog/Catalog.java: ## @@ -344,6 +344,24 @@ default void invalidateTable(TableIdentifier identifier) {} * @throws

Re: [PR] Core,Api: Add overwrite option when register external table to catalog [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer commented on code in PR #12228: URL: https://github.com/apache/iceberg/pull/12228#discussion_r1990161331 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -71,23 +70,35 @@ public Table loadTable(TableIdentifier identifier) { } @Over

Re: [PR] Core,Api: Add overwrite option when register external table to catalog [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer commented on code in PR #12228: URL: https://github.com/apache/iceberg/pull/12228#discussion_r1990162684 ## core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java: ## @@ -71,23 +70,35 @@ public Table loadTable(TableIdentifier identifier) { } @Over

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1990522002 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3InputFile.java: ## @@ -36,20 +37,53 @@ public static S3InputFile fromLocation( MetricsContext metrics) {

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2716569441 Looks like all comments are addressed, thanks @SanjayMarreddi for all the work! Let us know when you have the follow up PRs for async client configs and doc update!

[PR] Added `FsspecFileIO` method for OSS, virtual hosted style default to true, standardized key configurations for OSS [iceberg-python]

2025-03-11 Thread via GitHub
helmiazizm opened a new pull request, #1788: URL: https://github.com/apache/iceberg-python/pull/1788 This pull request introduces `FsspecFileIO` as a fallback configuration method for OSS when `PyArrowFileIO` fails. Using the `S3FileSystem` class, the method should work as long as the virtual host

Re: [PR] Added `FsspecFileIO` method for OSS, virtual hosted style default to true, standardized key configurations for OSS [iceberg-python]

2025-03-11 Thread via GitHub
helmiazizm commented on PR #1788: URL: https://github.com/apache/iceberg-python/pull/1788#issuecomment-2716592898 Local test result for `s3fs.S3FileSystem` ![image](https://github.com/user-attachments/assets/7a78dccf-0cf3-403b-a26a-69a309eb27d9)

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 merged PR #12299: URL: https://github.com/apache/iceberg/pull/12299

Re: [PR] Core: Use InternalData with avro and common DataIterable for readers. [iceberg]

2025-03-11 Thread via GitHub
pvary commented on code in PR #12476: URL: https://github.com/apache/iceberg/pull/12476#discussion_r1987644681 ## core/src/main/java/org/apache/iceberg/AllManifestsTable.java: ## @@ -192,13 +191,11 @@ public List deletes() { @Override public CloseableIterable rows() {

Re: [PR] Spark: Add some tests for variant fixup [iceberg]

2025-03-11 Thread via GitHub
XBaith commented on code in PR #12497: URL: https://github.com/apache/iceberg/pull/12497#discussion_r1988669139 ## core/src/test/java/org/apache/iceberg/RandomVariants.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contr

Re: [PR] API: Speed up Timestamps#toHumanString [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer commented on PR #12447: URL: https://github.com/apache/iceberg/pull/12447#issuecomment-2715305062 Can you see who the caller is there? I'm just interested in why flink is calling that function so often

Re: [PR] API: Speed up Timestamps#toHumanString [iceberg]

2025-03-11 Thread via GitHub
suneet-s commented on PR #12447: URL: https://github.com/apache/iceberg/pull/12447#issuecomment-2715260800 @RussellSpitzer Thanks for reviewing this change. In a performance test we were running where a Flink pipeline was writing data to an Iceberg table, we saw that this function was taki

Re: [PR] feat: (catalog/glue) Add support for CreateTable [iceberg-go]

2025-03-11 Thread via GitHub
dttung2905 commented on code in PR #326: URL: https://github.com/apache/iceberg-go/pull/326#discussion_r1987624927 ## catalog/glue/glue_test.go: ## @@ -778,12 +781,103 @@ func TestGlueListNamespacesIntegration(t *testing.T) { } assert := require.New(t) -

Re: [I] Issue during Upsert [iceberg-python]

2025-03-11 Thread via GitHub
kevinjqliu commented on issue #1759: URL: https://github.com/apache/iceberg-python/issues/1759#issuecomment-2715325319 Thanks everyone. i think this is a more generic issue with `bind` and the visitors which i opened #1785 to track. I believe this issue is showing up in `upsert` in

Re: [PR] refactor(manifests): consolidate ManifestEntryV1 and V2 [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade merged PR #327: URL: https://github.com/apache/iceberg-go/pull/327

Re: [PR] feat: (catalog/glue) Add support for CreateTable [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade commented on code in PR #326: URL: https://github.com/apache/iceberg-go/pull/326#discussion_r1987481597 ## catalog/glue/glue.go: ## @@ -582,3 +633,16 @@ func filterDatabaseListByType(databases []types.Database, databaseType string) [ return filtered } + +fu

Re: [PR] Parquet: Implement Variant metrics [iceberg]

2025-03-11 Thread via GitHub
rdblue commented on code in PR #12496: URL: https://github.com/apache/iceberg/pull/12496#discussion_r1988118292 ## parquet/src/main/java/org/apache/iceberg/parquet/TypeWithSchemaVisitor.java: ## @@ -211,13 +211,13 @@ private static List visitFields( } private static T

Re: [I] Support metadata compaction [iceberg-python]

2025-03-11 Thread via GitHub
ZENOTME commented on issue #270: URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-2713768454 Hi, I've recently been investigating support for rewriting manifests in iceberg-rust. The design of iceberg-rust basically follows iceberg-python, but for now, rewrite mani

Re: [PR] feat(table): Add computation of iceberg stats from parquet files [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade merged PR #329: URL: https://github.com/apache/iceberg-go/pull/329

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2715582900 > It would also be great to outline the migration path going forward. Yes, I think in general there is data point supporting using async client & CRT client makes the performa

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
kevinjqliu commented on code in PR #12299: URL: https://github.com/apache/iceberg/pull/12299#discussion_r1990064772 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputFile.java: ## @@ -75,7 +95,7 @@ public PositionOutputStream createOrOverwrite() { @Override public I

Re: [PR] feat(table): Basic Transaction and AddFiles [iceberg-go]

2025-03-11 Thread via GitHub
zeroshade commented on PR #330: URL: https://github.com/apache/iceberg-go/pull/330#issuecomment-2715602441 We're in the home stretch @Fokko @kevinjqliu!! Thanks so much for the quick reviews and feedback on all of these.

[PR] feat: Add conversion from `FileMetaData` to `ParquetMetadata` [iceberg-rust]

2025-03-11 Thread via GitHub
jonathanc-n opened a new pull request, #1074: URL: https://github.com/apache/iceberg-rust/pull/1074 ## Which issue does this PR close? - Closes #1033 and #1004. ## What changes are included in this PR? Add conversion from `FileMetaData` to `ParquetMetadata` using thrift

Re: [I] Consolidate methods of converting parquet file to data file builder. [iceberg-rust]

2025-03-11 Thread via GitHub
jonathanc-n commented on issue #1033: URL: https://github.com/apache/iceberg-rust/issues/1033#issuecomment-2716131268 @mnpw This pull request should be completed by #1074. Sorry about that, the two issues were intertwined. I was only able to test the metadata conversion by completing this a

Re: [PR] Spark: Support singular form of years, months, days, and hours functions [iceberg]

2025-03-11 Thread via GitHub
wypoon commented on code in PR #12117: URL: https://github.com/apache/iceberg/pull/12117#discussion_r1990386780 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSystemFunctionPushDownDQL.java: ## @@ -84,20 +84,24 @@ public void removeTables()

[PR] feat: (catalog/glue) Add support for CreateTable [iceberg-go]

2025-03-11 Thread via GitHub
dttung2905 opened a new pull request, #326: URL: https://github.com/apache/iceberg-go/pull/326 Hi team, This PR aims to support CreateTable for the Glue catalog. Below is the list of what (I think) remains to be done: - [x] Tested out on a real Glue Catalog. Table was created successfully - [ ] Add un

Re: [PR] Spark: Support singular form of years, months, days, and hours functions [iceberg]

2025-03-11 Thread via GitHub
RussellSpitzer commented on PR #12117: URL: https://github.com/apache/iceberg/pull/12117#issuecomment-2711124370 @nastra I'm a +0 on this. I'm not sure we are really making the situation better, since I don't like having two methods that do the same thing (especially when it's just a single

Re: [PR] Spark: Support singular form of years, months, days, and hours functions [iceberg]

2025-03-11 Thread via GitHub
wypoon commented on PR #12117: URL: https://github.com/apache/iceberg/pull/12117#issuecomment-2716148461 @RussellSpitzer thank you for reviewing the PR. I understand that you're not thrilled with the idea of two functions to do the same thing. However, this is already the case with the part

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
kevinjqliu commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2715574083 I verified that the async client should only affect S3 FileIO when the feature flag is enabled. `s3Async()` is the factory function that returns a `S3AsyncClient`. It is ca

Re: [PR] Flink: Support source watermark for flink sql windows [iceberg]

2025-03-11 Thread via GitHub
swapna267 commented on code in PR #12191: URL: https://github.com/apache/iceberg/pull/12191#discussion_r1990246723 ## flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/source/IcebergTableSource.java: ## @@ -175,6 +178,18 @@ public Result applyFilters(List flinkFilters) {

Re: [I] [feat] Ability to read table using `version-hint.txt` [iceberg-python]

2025-03-11 Thread via GitHub
srilman commented on issue #763: URL: https://github.com/apache/iceberg-python/issues/763#issuecomment-2715883331 @Fokko is this issue still open for working on? For context, we had to build a PyIceberg-based Hadoop Catalog with a subset of features for backwards compatibility when moving B

Re: [PR] Flink: Support source watermark for flink sql windows [iceberg]

2025-03-11 Thread via GitHub
swapna267 commented on code in PR #12191: URL: https://github.com/apache/iceberg/pull/12191#discussion_r1990251133 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java: ## @@ -53,7 +55,11 @@ public class TestIcebergSourceSql extends TestSq

Re: [I] Applying Filter on Top-Level Struct Columns Throws Error [iceberg-python]

2025-03-11 Thread via GitHub
srilman commented on issue #1778: URL: https://github.com/apache/iceberg-python/issues/1778#issuecomment-2715886974 Sounds good, here is the full stacktrace just in case. Sorry about that, I truncated it to keep the issue description short. ``` /Users/slade/bodo/mono/develop/.pix

[I] Spark mistakenly cleanup written file with successful IRC commits [iceberg]

2025-03-11 Thread via GitHub
puchengy opened a new issue, #12499: URL: https://github.com/apache/iceberg/issues/12499 ### Apache Iceberg version 1.3.0 ### Query engine Spark ### Please describe the bug 🐞 When OOM happens with IRC successful commits, Spark will mistakenly cleanup commit

[PR] chore(deps): Bump crate-ci/typos from 1.30.0 to 1.30.2 [iceberg-rust]

2025-03-11 Thread via GitHub
dependabot[bot] opened a new pull request, #1069: URL: https://github.com/apache/iceberg-rust/pull/1069 Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.30.0 to 1.30.2. Release notes Sourced from https://github.com/crate-ci/typos/releases crate-ci/typos's release

Re: [PR] feat: add support for azure blob with connection string/sas token/account key [iceberg-go]

2025-03-11 Thread via GitHub
xuhui-lu commented on PR #313: URL: https://github.com/apache/iceberg-go/pull/313#issuecomment-2709438488 > @kevinjqliu @Fokko do you know of any equivalent to running Minio that we could use via a docker image to test the ADLS integration? I am not sure if I could just use the https:

[PR] Build: Bump getdaft from 0.4.4 to 0.4.7 [iceberg-python]

2025-03-11 Thread via GitHub
dependabot[bot] opened a new pull request, #1780: URL: https://github.com/apache/iceberg-python/pull/1780 Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.4.4 to 0.4.7. Release notes Sourced from [getdaft's releases](https://github.com/Eventual-Inc/Daft/releases).

Re: [PR] AWS: Integrate S3 analytics accelerator library [iceberg]

2025-03-11 Thread via GitHub
jackye1995 commented on PR #12299: URL: https://github.com/apache/iceberg/pull/12299#issuecomment-2715593232 > Are there plans to replace the current s3 client with the async client? Maybe after many versions, once we have enough confidence that it is stable. But probably not in the s

Re: [I] Support queries all branches and tags java api [iceberg]

2025-03-11 Thread via GitHub
github-actions[bot] commented on issue #11042: URL: https://github.com/apache/iceberg/issues/11042#issuecomment-2716018745 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] support equality/positional deletes in vectorized arrow reader [iceberg]

2025-03-11 Thread via GitHub
github-actions[bot] commented on issue #11120: URL: https://github.com/apache/iceberg/issues/11120#issuecomment-2716018807 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] Support queries all branches and tags java api [iceberg]

2025-03-11 Thread via GitHub
github-actions[bot] closed issue #11042: Support queries all branches and tags java api URL: https://github.com/apache/iceberg/issues/11042
