Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1563580912 ## tests/integration/test_inspect_table.py: ## @@ -186,8 +185,6 @@ def test_inspect_entries( assert df_lhs == df_rhs, f"Difference in dat

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu commented on PR #590: URL: https://github.com/apache/iceberg-python/pull/590#issuecomment-2052939855 +1 to adding this to the 0.7.0 release. The original issue in #584 is already fixed by #597.

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1563557100 ## tests/integration/test_writes/test_writes.py: ## @@ -270,6 +270,48 @@ def get_current_snapshot_id(identifier: str) -> int: assert tbl.current_snapshot()

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-12 Thread via GitHub
HonahX commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1563400732 ## tests/integration/test_writes/test_writes.py: ## @@ -270,6 +270,48 @@ def get_current_snapshot_id(identifier: str) -> int: assert tbl.current_snapshot().sna

Re: [I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]

2024-04-12 Thread via GitHub
HonahX commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2052803152 The issue has been fixed by PR #597. Now pyiceberg can read parquet files with original column names as well as those with transformed column names. I will leave this issue open
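
For context, a minimal sketch of the read path described above, assuming a hypothetical catalog, table, and column name (not taken from the thread):

```python
# Sketch (illustrative names): after the fix in #597, a scan resolves columns by
# their original names even when the Parquet files on disk carry sanitized names.
from pyiceberg.catalog import load_catalog

tbl = load_catalog("default").load_table("db.special_chars")

# Select a column whose name contains characters that get sanitized on write.
df = tbl.scan(selected_fields=("user-name",)).to_arrow()
print(df.schema)
```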

Re: [PR] Read: fetch file_schema directly from pyarrow_to_schema [iceberg-python]

2024-04-12 Thread via GitHub
HonahX merged PR #597: URL: https://github.com/apache/iceberg-python/pull/597

Re: [PR] Read: fetch file_schema directly from pyarrow_to_schema [iceberg-python]

2024-04-12 Thread via GitHub
HonahX commented on PR #597: URL: https://github.com/apache/iceberg-python/pull/597#issuecomment-2052761611 Thanks @kevinjqliu and @Fokko for reviewing!

Re: [PR] Spark: Reconcile derived partitioning from source table with target table specs in AddFilesProcedure [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10133: URL: https://github.com/apache/iceberg/pull/10133#discussion_r1563422261 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -948,6 +948,28 @@ public void testAddFiles

Re: [I] Lock remains in HMS if HiveTableOperations gets killed (direct process shutdown - no signals) after lock is acquired [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] commented on issue #2301: URL: https://github.com/apache/iceberg/issues/2301#issuecomment-2052716293 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Consider both delete file size and data file size when planing tasks [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] commented on issue #2298: URL: https://github.com/apache/iceberg/issues/2298#issuecomment-2052716257 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Consider both delete file size and data file size when planing tasks [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] closed issue #2298: Consider both delete file size and data file size when planing tasks URL: https://github.com/apache/iceberg/issues/2298

Re: [I] Lock remains in HMS if HiveTableOperations gets killed (direct process shutdown - no signals) after lock is acquired [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] closed issue #2301: Lock remains in HMS if HiveTableOperations gets killed (direct process shutdown - no signals) after lock is acquired URL: https://github.com/apache/iceberg/issues/2301

Re: [I] Build a utility to infer partitions at a given path [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] closed issue #2300: Build a utility to infer partitions at a given path URL: https://github.com/apache/iceberg/issues/2300

Re: [I] Build a utility to infer partitions at a given path [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] commented on issue #2300: URL: https://github.com/apache/iceberg/issues/2300#issuecomment-2052716270 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Catalog Migration transaction [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] closed issue #2288: Catalog Migration transaction URL: https://github.com/apache/iceberg/issues/2288

Re: [I] Catalog Migration transaction [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] commented on issue #2288: URL: https://github.com/apache/iceberg/issues/2288#issuecomment-2052716236 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] hive create external table for iceberg have error. [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] closed issue #2277: hive create external table for iceberg have error. URL: https://github.com/apache/iceberg/issues/2277

Re: [I] hive create external table for iceberg have error. [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] commented on issue #2277: URL: https://github.com/apache/iceberg/issues/2277#issuecomment-2052716218 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Hive: Support identifiers with catalog [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] commented on issue #2274: URL: https://github.com/apache/iceberg/issues/2274#issuecomment-2052716198 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Hive: Support identifiers with catalog [iceberg]

2024-04-12 Thread via GitHub
github-actions[bot] closed issue #2274: Hive: Support identifiers with catalog URL: https://github.com/apache/iceberg/issues/2274

[PR] AWS: close underlying Scheduler for DynamoDb LockManager [iceberg]

2024-04-12 Thread via GitHub
regadas opened a new pull request, #10132: URL: https://github.com/apache/iceberg/pull/10132 With the new close method added to BaseLockManager, I noticed that DynamoDbLockManager was not reusing inherited `close` and, hence, not closing the underlying scheduler. It's my first PR her

Re: [I] Create iceberg table from existsing parquet files with slightly different schemas (schemas merge is possible). [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu commented on issue #601: URL: https://github.com/apache/iceberg-python/issues/601#issuecomment-2052628807 There's a [`Table.add_files` API](https://github.com/apache/iceberg-python/blob/5039b5d70644bc06c98349090912c6e9066d3ea1/mkdocs/docs/api.md#add-files) which supports directly
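
For context, a minimal sketch of the `Table.add_files` API referenced above; the catalog, table, and file paths are illustrative assumptions, not taken from the thread:

```python
# Sketch: add_files registers existing Parquet files with an Iceberg table
# without rewriting them, provided they are compatible with the table schema.
from pyiceberg.catalog import load_catalog

tbl = load_catalog("default").load_table("db.events")

tbl.add_files(file_paths=[
    "s3://warehouse/events/part-0001.parquet",
    "s3://warehouse/events/part-0002.parquet",
])
```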

Re: [PR] Core: Allow manifest file cache to be configurable [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10118: URL: https://github.com/apache/iceberg/pull/10118#discussion_r1563195336 ## core/src/main/java/org/apache/iceberg/io/DefaultContentCache.java: ## @@ -0,0 +1,295 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[PR] Add Partitions Metadata Table [iceberg-python]

2024-04-12 Thread via GitHub
syun64 opened a new pull request, #603: URL: https://github.com/apache/iceberg-python/pull/603 (no comment)

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-12 Thread via GitHub
aokolnychyi commented on PR #9611: URL: https://github.com/apache/iceberg/pull/9611#issuecomment-2052545149 I think we should continue to use `assertEquals` method (which is our custom assert method). It has proper value equality for Spark.

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-12 Thread via GitHub
aokolnychyi commented on PR #9611: URL: https://github.com/apache/iceberg/pull/9611#issuecomment-2052543360 Looks like there are some related test failures: ``` TestRewriteDataFilesAction > testParallelPartialProgressWithMaxFailedCommits() FAILED org.opentest4j.AssertionFail

Re: [PR] Spark-3.5: Support CTAS and RTAS to preserve schema nullability. [iceberg]

2024-04-12 Thread via GitHub
aokolnychyi commented on PR #10074: URL: https://github.com/apache/iceberg/pull/10074#issuecomment-2052539301 Thanks, @zhongyujiang! Thanks for reviewing, @amogh-jahagirdar!

Re: [PR] Spark-3.5: Support CTAS and RTAS to preserve schema nullability. [iceberg]

2024-04-12 Thread via GitHub
aokolnychyi merged PR #10074: URL: https://github.com/apache/iceberg/pull/10074

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
syun64 commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1563120200 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Modul

Re: [PR] Read: fetch file_schema directly from pyarrow_to_schema [iceberg-python]

2024-04-12 Thread via GitHub
Fokko commented on code in PR #597: URL: https://github.com/apache/iceberg-python/pull/597#discussion_r1563143233 ## pyiceberg/io/pyarrow.py: ## @@ -966,20 +965,15 @@ def _task_to_table( with fs.open_input_file(path) as fin: fragment = arrow_format.make_fragment(fi

[PR] Add Refs metadata table [iceberg-python]

2024-04-12 Thread via GitHub
geruh opened a new pull request, #602: URL: https://github.com/apache/iceberg-python/pull/602 This PR adds the Refs metadata table to the existing inspect logic for Iceberg tables, as listed in #511. The refs metadata table in Iceberg stores the table's known snapshot references, including branc
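
For context, a rough sketch of how the refs metadata table proposed in this PR could be queried once available; the identifiers and column names are assumptions, not taken from the PR:

```python
# Sketch: inspect snapshot references (branches and tags) via the metadata-table API.
from pyiceberg.catalog import load_catalog

tbl = load_catalog("default").load_table("db.events")

refs = tbl.inspect.refs()   # expected to return a pyarrow.Table of branches and tags
print(refs.to_pandas())     # e.g. name, type, snapshot_id, retention settings
```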

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
Fokko commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1563107624 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Module

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
Fokko commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1563107139 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Module

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
Fokko commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1563100274 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Module

Re: [I] Flink inserts creates duplicates per primary key [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on issue #10076: URL: https://github.com/apache/iceberg/issues/10076#issuecomment-2052263317 Yeah +1 to what @manuzhang said. You're expecting UPSERT behavior but it seems like you don't have that configured. See https://iceberg.apache.org/docs/nightly/flink-write

Re: [I] Flink inserts creates duplicates per primary key [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar closed issue #10076: Flink inserts creates duplicates per primary key URL: https://github.com/apache/iceberg/issues/10076

Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-12 Thread via GitHub
viirya commented on PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#issuecomment-2052233861 @liurenjie1024 I've addressed all comments. Thank you.

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
syun64 commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1562951244 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Modul

Re: [PR] Spec: required request bodies [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar merged PR #10125: URL: https://github.com/apache/iceberg/pull/10125

Re: [I] OpenApi requestBody: some are optional but should be required [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar closed issue #10004: OpenApi requestBody: some are optional but should be required URL: https://github.com/apache/iceberg/issues/10004

Re: [PR] Spec: required request bodies [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on PR #10125: URL: https://github.com/apache/iceberg/pull/10125#issuecomment-2052182543 I'll go ahead and merge, thanks @westse for fixing this and @nastra for reviewing.

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1562924781 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise M

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
syun64 commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1562919602 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Modul

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
syun64 commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1562918080 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise Modul

Re: [PR] Spark: Improvements around test initialization [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar merged PR #10131: URL: https://github.com/apache/iceberg/pull/10131

Re: [PR] Spark: Improvements around test initialization [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10131: URL: https://github.com/apache/iceberg/pull/10131#discussion_r1562913308 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -73,7 +73,7 @@ public static void dropWarehouse() throws IOExcepti

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562905300 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -501,6 +503,32 @@ public void testDeleteNonExistingRecords() {

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562812167 ## core/src/main/java/org/apache/iceberg/BaseRowDelta.java: ## @@ -43,6 +43,10 @@ protected BaseRowDelta self() { @Override protected String operation() { +

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562796968 ## core/src/main/java/org/apache/iceberg/BaseRowDelta.java: ## @@ -43,6 +43,10 @@ protected BaseRowDelta self() { @Override protected String operatio

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562797279 ## core/src/main/java/org/apache/iceberg/BaseRowDelta.java: ## @@ -43,6 +43,10 @@ protected BaseRowDelta self() { @Override protected String operatio

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-12 Thread via GitHub
tshauck commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1562769346 ## crates/integrations/src/datafusion/schema.rs: ## @@ -0,0 +1,97 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

[I] [feature request] Allow engines to time travel [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu opened a new issue, #600: URL: https://github.com/apache/iceberg-python/issues/600 ### Feature Request / Improvement When engines, such as Daft, read from the `Table` object (see [scan_iceberg](https://github.com/pola-rs/polars/blob/py-0.20.19/py-polars/polars/io/iceberg.p
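
For context, a minimal sketch of the snapshot-pinned scan that pyiceberg itself already exposes and that this request asks engines to surface; the table name and snapshot selection are illustrative:

```python
# Sketch: read a table as of an earlier snapshot by passing a snapshot id to scan().
from pyiceberg.catalog import load_catalog

tbl = load_catalog("default").load_table("db.events")

oldest_snapshot_id = tbl.history()[0].snapshot_id
old_data = tbl.scan(snapshot_id=oldest_snapshot_id).to_arrow()
```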

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-12 Thread via GitHub
tshauck commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1562767139 ## Cargo.toml: ## @@ -21,6 +21,7 @@ members = [ "crates/catalog/*", "crates/examples", "crates/iceberg", +"crates/integrations", Review Comment:

Re: [PR] Spec: required request bodies [iceberg]

2024-04-12 Thread via GitHub
westse commented on code in PR #10125: URL: https://github.com/apache/iceberg/pull/10125#discussion_r1562752513 ## open-api/rest-catalog-open-api.yaml: ## @@ -248,6 +248,7 @@ paths: The server might also add properties, such as `last_modified_time` etc. operation

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562733538 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -501,6 +503,32 @@ public void testDeleteNonExistingRecords() {

Re: [PR] reduce enum array allocation [iceberg]

2024-04-12 Thread via GitHub
nastra merged PR #10126: URL: https://github.com/apache/iceberg/pull/10126

Re: [PR] Spark: Improvements around test initialization [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10131: URL: https://github.com/apache/iceberg/pull/10131#discussion_r1562748933 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/CatalogTestBase.java: ## @@ -48,6 +46,4 @@ protected static Object[][] parameters() { } }; }

Re: [PR] Spark: Improvements around test initialization [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10131: URL: https://github.com/apache/iceberg/pull/10131#discussion_r1562743988 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/ExtensionsTestBase.java: ## @@ -57,6 +58,8 @@ public static void startMetastoreAndSpar

Re: [PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
kevinjqliu commented on code in PR #599: URL: https://github.com/apache/iceberg-python/pull/599#discussion_r1562740546 ## pyiceberg/table/__init__.py: ## @@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None: except ModuleNotFoundError as e: raise M

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-12 Thread via GitHub
mrcnc commented on PR #10124: URL: https://github.com/apache/iceberg/pull/10124#issuecomment-2052005953 Thanks for the review and suggestions @nastra! I believe I've addressed all the feedback in the latest commit

Re: [PR] reduce enum array allocation [iceberg]

2024-04-12 Thread via GitHub
sullis commented on PR #10126: URL: https://github.com/apache/iceberg/pull/10126#issuecomment-2051999782 CI build looks good ✅

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-12 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-2051883737 I've done some more refactorings.

Re: [PR] WIP: View Spec implementation [iceberg-rust]

2024-04-12 Thread via GitHub
c-thiel commented on PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2051874491 @ZENOTME one feature I did not implement yet is respecting "version.history.num-entries" as mentioned in the [View Spec](https://iceberg.apache.org/view-spec/#view-metadata). I noticed

[PR] Support Time Travel in InspectTable.entries [iceberg-python]

2024-04-12 Thread via GitHub
syun64 opened a new pull request, #599: URL: https://github.com/apache/iceberg-python/pull/599 Time travelling on Metadata Tables allows for comparisons between two snapshots of Iceberg tables in many different and meaningful ways (files, partitions, etc.). The Spark Iceberg API supports ti
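
For context, hypothetical usage of the behaviour this PR proposes; the exact parameter name and all identifiers below are assumptions, not confirmed from the diff:

```python
# Sketch: pass a snapshot id to the entries metadata table so two snapshots of
# the same table can be compared.
from pyiceberg.catalog import load_catalog

tbl = load_catalog("default").load_table("db.events")
snapshot_ids = [s.snapshot_id for s in tbl.snapshots()]

old_entries = tbl.inspect.entries(snapshot_id=snapshot_ids[0])
new_entries = tbl.inspect.entries(snapshot_id=snapshot_ids[-1])
print(old_entries.num_rows, new_entries.num_rows)
```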

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562627030 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java: ## @@ -317,10 +318,13 @@ protected void validate

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562617762 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -501,6 +503,32 @@ public void testDeleteNonExistingRecords() {

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-12 Thread via GitHub
javrasya commented on code in PR #9464: URL: https://github.com/apache/iceberg/pull/9464#discussion_r1562602068 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/SerializerHelper.java: ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-12 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-2051811087 Updated the PR @pvary @elkhand, appreciate it if you could take a look

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-12 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-2051782426 One thing we haven't talked about much though, @pvary, is the need for v3, since the older serialized splits won't be compatible with the new deserialization method. How should we implement

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-12 Thread via GitHub
javrasya commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-2051773375 Hi there, I was on vacation last week, so I am back this week. I almost have the update ready; give me a few more days and I will ask for a review after I push.

Re: [I] UncheckedSQLException: Failed to execute exists query: SELECT table_namespace FROM iceberg_tables WHERE catalog_name = ? AND (table_namespace = ? OR table_namespace LIKE ? ESCAPE '\') LIMIT 1

2024-04-12 Thread via GitHub
jbonofre commented on issue #10056: URL: https://github.com/apache/iceberg/issues/10056#issuecomment-2051727663 @nastra @amogh-jahagirdar can you guys assign this issue to me? I have two pieces of work in progress on that: 1. I'm testing if defining the charset on MySQL can help (without ch

Re: [PR] reduce enum array allocation [iceberg]

2024-04-12 Thread via GitHub
sullis commented on PR #10126: URL: https://github.com/apache/iceberg/pull/10126#issuecomment-2051707267 > Looks like CI is failing. Please run `./gradlew spotlessApply` Done. Ready for review.

Re: [PR] [WIP] Migrate non TestBase related and Data classes in Flink [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10130: URL: https://github.com/apache/iceberg/pull/10130#discussion_r1562505930 ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkAvroReaderWriter.java: ## @@ -95,14 +96,14 @@ private void writeAndValidate(Schema schema, Li

Re: [PR] [WIP] Migrate non TestBase related and Data classes in Flink [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10130: URL: https://github.com/apache/iceberg/pull/10130#discussion_r1562501901 ## data/src/test/java/org/apache/iceberg/data/parquet/TestParquetEncryptionWithWriteSupport.java: ## @@ -76,16 +77,16 @@ protected void writeAndValidate(Schema schema)

Re: [PR] [WIP] Migrate non TestBase related and Data classes in Flink [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10130: URL: https://github.com/apache/iceberg/pull/10130#discussion_r1562500825 ## data/src/test/java/org/apache/iceberg/data/parquet/TestGenericData.java: ## @@ -132,12 +132,12 @@ public void testTwoLevelList() throws IOException { .

Re: [PR] [WIP] Migrate non TestBase related and Data classes in Flink [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10130: URL: https://github.com/apache/iceberg/pull/10130#discussion_r1562500640 ## data/src/test/java/org/apache/iceberg/data/parquet/TestGenericData.java: ## @@ -132,12 +132,12 @@ public void testTwoLevelList() throws IOException { .

Re: [PR] [WIP] Migrate non TestBase related and Data classes in Flink [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10130: URL: https://github.com/apache/iceberg/pull/10130#discussion_r1562498776 ## data/src/test/java/org/apache/iceberg/data/DataTest.java: ## @@ -58,7 +58,7 @@ public abstract class DataTest { required(116, "dec_38_10", Types.DecimalT

Re: [PR] reduce enum array allocation [iceberg]

2024-04-12 Thread via GitHub
nastra commented on PR #10126: URL: https://github.com/apache/iceberg/pull/10126#issuecomment-2051679635 Looks like CI is failing. Please run `./gradlew spotlessApply`

Re: [PR] reduce enum array allocation [iceberg]

2024-04-12 Thread via GitHub
sullis commented on PR #10126: URL: https://github.com/apache/iceberg/pull/10126#issuecomment-2051631396 > @sullis thanks for the improvement. I think it might make sense to also update other enums that do it like this. Could you please check the codebase and update those? Agreed. I

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-12 Thread via GitHub
marvinlanhenke commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1562444674 ## crates/integrations/src/datafusion/schema.rs: ## @@ -0,0 +1,97 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-12 Thread via GitHub
marvinlanhenke commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1562442128 ## Cargo.toml: ## @@ -21,6 +21,7 @@ members = [ "crates/catalog/*", "crates/examples", "crates/iceberg", +"crates/integrations", Review Co

Re: [PR] Spark-3.5: Support CTAS and RTAS to preserve schema nullability. [iceberg]

2024-04-12 Thread via GitHub
zhongyujiang commented on PR #10074: URL: https://github.com/apache/iceberg/pull/10074#issuecomment-2051579466 @aokolnychyi @amogh-jahagirdar Thanks for reviewing, comments have been addressed, please take a look when you have time.

Re: [I] rewriting manifest can rewrite based on filter? [iceberg]

2024-04-12 Thread via GitHub
chenwyi2 commented on issue #10129: URL: https://github.com/apache/iceberg/issues/10129#issuecomment-2051540655 ` private List<ManifestFile> findMatchingManifests() { Snapshot currentSnapshot = table.currentSnapshot(); if (currentSnapshot == null) { return ImmutableList.of();

Re: [PR] reduce enum array allocation in FileFormat [iceberg]

2024-04-12 Thread via GitHub
nastra commented on PR #10126: URL: https://github.com/apache/iceberg/pull/10126#issuecomment-2051500692 @sullis thanks for the improvement. I think it might make sense to also update other enums that do it like this. Could you please check the codebase and update those?

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10124: URL: https://github.com/apache/iceberg/pull/10124#discussion_r1562361150 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -161,6 +161,19 @@ public void testInitialize() { jdbcCatalog.initialize("test_jdbc_catal

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10124: URL: https://github.com/apache/iceberg/pull/10124#discussion_r1562359601 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -161,6 +161,19 @@ public void testInitialize() { jdbcCatalog.initialize("test_jdbc_catal

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10124: URL: https://github.com/apache/iceberg/pull/10124#discussion_r1562357647 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -161,6 +161,19 @@ public void testInitialize() { jdbcCatalog.initialize("test_jdbc_catal

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10124: URL: https://github.com/apache/iceberg/pull/10124#discussion_r1562355476 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -843,4 +847,8 @@ public Transaction replaceTransaction() { return super.replaceTransaction

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562351286 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestStructuredStreamingRead3.java: ## @@ -476,7 +479,15 @@ public void testReadStreamWithSnapshotTyp

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1562347673 ## core/src/main/java/org/apache/iceberg/BaseRowDelta.java: ## @@ -43,6 +43,10 @@ protected BaseRowDelta self() { @Override protected String operation() { +

Re: [PR] Spec: required request bodies [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #10125: URL: https://github.com/apache/iceberg/pull/10125#discussion_r1562319061 ## open-api/rest-catalog-open-api.yaml: ## @@ -248,6 +248,7 @@ paths: The server might also add properties, such as `last_modified_time` etc. operation

[I] rewriting manifest can rewrite based on filter? [iceberg]

2024-04-12 Thread via GitHub
chenwyi2 opened a new issue, #10129: URL: https://github.com/apache/iceberg/issues/10129 ### Feature Request / Improvement Spark 3.1, Iceberg 1.2.1. I always hit the error below: > org.apache.iceberg.exceptions.ValidationException: Manifest is missing: oss://xgimi-data/apps/spark

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #9611: URL: https://github.com/apache/iceberg/pull/9611#discussion_r1562264735 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -909,6 +909,47 @@ public void testParallelPartialProgressWithCo

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #9611: URL: https://github.com/apache/iceberg/pull/9611#discussion_r1562262300 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -909,6 +909,47 @@ public void testParallelPartialProgressWithCo

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-12 Thread via GitHub
nastra commented on code in PR #9611: URL: https://github.com/apache/iceberg/pull/9611#discussion_r1562260778 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -359,20 +361,31 @@ private Result doExecuteWithPartialProgress

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-04-12 Thread via GitHub
bk-mz commented on PR #10020: URL: https://github.com/apache/iceberg/pull/10020#issuecomment-2051139254 @szehon-ho but this implementation is rather a hack, a workaround around the original design. Why don't you folks think about just advancing the version of the table and switching to a bigger const
