Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829238607 ## tests/integration/test_writes/test_partitioned_writes.py: ## @@ -222,6 +277,113 @@ def test_query_filter_v1_v2_append_null( assert df.where(f"{col} is nu

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829270083 ## tests/integration/test_writes/test_partitioned_writes.py: ## @@ -181,6 +181,61 @@ def test_query_filter_appended_null_partitioned( assert len(rows) == 6

[I] Adaptive retry for Glue Catalog calls to fix Glue throttling [iceberg-python]

2024-11-05 Thread via GitHub
mark-major opened a new issue, #1294: URL: https://github.com/apache/iceberg-python/issues/1294 ### Feature Request / Improvement I have experienced throttling exceptions when multiple nodes are reading and writing an Iceberg table with a Glue catalog. I have wrapped my calls in a re

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829297150 ## tests/integration/test_writes/test_partitioned_writes.py: ## @@ -181,6 +181,61 @@ def test_query_filter_appended_null_partitioned( assert len(rows) == 6

Re: [PR] Ignore schema merge updates from long -> int [iceberg]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #11419: URL: https://github.com/apache/iceberg/pull/11419#discussion_r1829304556 ## core/src/main/java/org/apache/iceberg/schema/UnionByNameVisitor.java: ## @@ -180,6 +179,21 @@ private void updateColumn(Types.NestedField field, Types.NestedField e

Re: [PR] Spark-3.5: make `where` sql case sensitive setting alterable in rewrite data files procedure [iceberg]

2024-11-05 Thread via GitHub
ludlows commented on PR #11439: URL: https://github.com/apache/iceberg/pull/11439#issuecomment-2456884157 I think I typed the wrong version of iceberg in the issue https://github.com/apache/iceberg/issues/11438 -- This is an automated message from the Apache Git Service. To respond t

Re: [I] Compute column stats incrementally [iceberg]

2024-11-05 Thread via GitHub
EremenkoValentin closed issue #11472: Compute column stats incrementally URL: https://github.com/apache/iceberg/issues/11472 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[I] Compute column stats incrementally [iceberg]

2024-11-05 Thread via GitHub
EremenkoValentin opened a new issue, #11475: URL: https://github.com/apache/iceberg/issues/11475 ### Query engine Iceberg API ### Question Does Iceberg support incremental statistics calculation? How can this be done for columns? How do you calculate changes between two

Re: [I] Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes [iceberg]

2024-11-05 Thread via GitHub
davidyuan1223 commented on issue #11465: URL: https://github.com/apache/iceberg/issues/11465#issuecomment-2456889163 > > can we use the sql `select column_sizes from table.files` to get the right size? > > I would prefer @RussellSpitzer's suggestion to directly check the parquet file

Re: [PR] Core, Data, Flink, Spark: Improve tableDir initialization for tests [iceberg]

2024-11-05 Thread via GitHub
nastra merged PR #11460: URL: https://github.com/apache/iceberg/pull/11460 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829272105 ## pyiceberg/table/__init__.py: ## @@ -456,6 +461,89 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) for data_f

Re: [I] Hive metastore 4.0.1 remove deprecated thrift APIs [iceberg-python]

2024-11-05 Thread via GitHub
akshayah3 commented on issue #1222: URL: https://github.com/apache/iceberg-python/issues/1222#issuecomment-2457219423 @Fokko I can look into this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [I] Support Defining PartitionSpec and SortOrder without field-ids in create_table [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on issue #338: URL: https://github.com/apache/iceberg-python/issues/338#issuecomment-2457264582 > We are now able to express partition spec updates without referencing a field_id by using the create_table_transaction method on a catalog. If you're interested, this is

Re: [I] Adaptive retry for Glue Catalog calls to fix Glue throttling [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on issue #1294: URL: https://github.com/apache/iceberg-python/issues/1294#issuecomment-2457268171 Hey @mark-major thanks for jumping in here. I see that we only added retries to the REST catalog so far. Having this for the Glue catalog would be a great addition 👍 -- Thi

Re: [I] Adaptive retry for Glue Catalog calls to fix Glue throttling [iceberg-python]

2024-11-05 Thread via GitHub
mark-major commented on issue #1294: URL: https://github.com/apache/iceberg-python/issues/1294#issuecomment-2457326943 I'm happy to work on this if I have some time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [I] Adaptive retry for Glue Catalog calls to fix Glue throttling [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on issue #1294: URL: https://github.com/apache/iceberg-python/issues/1294#issuecomment-2457330690 That would be great, happy to review! 🙌 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] Hive metastore 4.0.1 remove deprecated thrift APIs [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on issue #1222: URL: https://github.com/apache/iceberg-python/issues/1222#issuecomment-2457332948 @akshayah3 That would be great! Let me know if you run into anything! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Add scan planning api request and response models, parsers [iceberg]

2024-11-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #11369: URL: https://github.com/apache/iceberg/pull/11369#discussion_r1829479240 ## core/src/main/java/org/apache/iceberg/UnboundBaseFileScanTask.java: ## @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Add support for boolean expressions and quoted columns [iceberg-python]

2024-11-05 Thread via GitHub
MoSheikh commented on PR #1286: URL: https://github.com/apache/iceberg-python/pull/1286#issuecomment-2457401094 No problem, appreciate the team's work. And thank you for the quick turnaround. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Core: Adapt commit, scan, and snapshot stats for DVs [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11464: URL: https://github.com/apache/iceberg/pull/11464#discussion_r1829518195 ## core/src/main/java/org/apache/iceberg/metrics/ScanMetricsUtil.java: ## @@ -31,7 +32,11 @@ public static void indexedDeleteFile(ScanMetrics metrics, DeleteFile

Re: [PR] Core: Adapt commit, scan, and snapshot stats for DVs [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11464: URL: https://github.com/apache/iceberg/pull/11464#discussion_r1829516877 ## core/src/test/java/org/apache/iceberg/TestSnapshotSummary.java: ## @@ -358,4 +358,66 @@ public void rewriteWithDeletesAndDuplicates() { .containsEntry

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-11-05 Thread via GitHub
ajantha-bhat commented on code in PR #11279: URL: https://github.com/apache/iceberg/pull/11279#discussion_r1829584591 ## build.gradle: ## @@ -967,11 +970,9 @@ project(':iceberg-open-api') { testFixturesImplementation project(':iceberg-gcp') testFixturesImplementation p

Re: [PR] Allow union of `{int,long}`, `{float,double}`, etc [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on PR #1283: URL: https://github.com/apache/iceberg-python/pull/1283#issuecomment-2457557441 Thanks for the review @kevinjqliu 🙌 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-11-05 Thread via GitHub
ajantha-bhat commented on code in PR #11279: URL: https://github.com/apache/iceberg/pull/11279#discussion_r1829582677 ## build.gradle: ## @@ -967,11 +970,9 @@ project(':iceberg-open-api') { testFixturesImplementation project(':iceberg-gcp') testFixturesImplementation p

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-11-05 Thread via GitHub
ajantha-bhat commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2457528642 @danielcweeks, @Fokko, @bryanck: Please take a look at the PR again. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Core: Adapt commit, scan, and snapshot stats for DVs [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi merged PR #11464: URL: https://github.com/apache/iceberg/pull/11464 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@icebe

Re: [PR] Core: Adapt commit, scan, and snapshot stats for DVs [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on PR #11464: URL: https://github.com/apache/iceberg/pull/11464#issuecomment-2457575822 Thank you, @nastra! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Allow union of `{int,long}`, `{float,double}`, etc [iceberg-python]

2024-11-05 Thread via GitHub
Fokko merged PR #1283: URL: https://github.com/apache/iceberg-python/pull/1283 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

[PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi opened a new pull request, #11476: URL: https://github.com/apache/iceberg/pull/11476 This PR adds a base DV file writer with basic tests. More tests to come once we support DV reads. ``` Benchmark (deletedRowsRa

Re: [PR] Core: Adapt commit, scan, and snapshot stats for DVs [iceberg]

2024-11-05 Thread via GitHub
nastra commented on code in PR #11464: URL: https://github.com/apache/iceberg/pull/11464#discussion_r1828858604 ## core/src/test/java/org/apache/iceberg/TestSnapshotSummary.java: ## @@ -358,4 +358,66 @@ public void rewriteWithDeletesAndDuplicates() { .containsEntry(Snap

Re: [I] Including Iceberg Version in metadata json file for better traceability of PendingUpdate [iceberg]

2024-11-05 Thread via GitHub
rice668 commented on issue #11471: URL: https://github.com/apache/iceberg/issues/11471#issuecomment-2456506434 Thanks @nastra ! If it is only recorded in Snapshot, it is not very convenient to troubleshoot the problem. What we need is a `PendingUpdate`, not just a `SnapshotUpdate`. It is be

Re: [PR] Core: Make PositionDeleteIndex serializable [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi merged PR #11463: URL: https://github.com/apache/iceberg/pull/11463 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@icebe

[I] Compute column stats incrementally [iceberg]

2024-11-05 Thread via GitHub
EremenkoValentin opened a new issue, #11472: URL: https://github.com/apache/iceberg/issues/11472 Query engine Iceberg API 1.6.1 Question Does Iceberg support incremental statistics calculation? How can this be done for columns? How do you calculate changes between two snapshots?

Re: [PR] Updating configuration docs [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on PR #1292: URL: https://github.com/apache/iceberg-python/pull/1292#issuecomment-2456552908 Thanks for the quick follow up 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] Improve documentation on Configuration page [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on issue #1290: URL: https://github.com/apache/iceberg-python/issues/1290#issuecomment-2456557731 Enabling Github Discussions has been brought up once or twice, but it hasn't been yet decided. Mainly because there are [several places to discuss things](https://iceberg.apach

Re: [I] Support Defining PartitionSpec and SortOrder without field-ids in create_table [iceberg-python]

2024-11-05 Thread via GitHub
Samreay commented on issue #338: URL: https://github.com/apache/iceberg-python/issues/338#issuecomment-2456444991 Hey @sungwy, just thought I'd chase this as well. The PR you linked is merged and 0.7.1 is now out, so does that mean there is a new way of specifying sort order we can use with

[I] User ID information in Iceberg Table's snapshot [iceberg]

2024-11-05 Thread via GitHub
ArijitSinghEDA opened a new issue, #11474: URL: https://github.com/apache/iceberg/issues/11474 ### Query engine Spark ### Question I am using Iceberg with PostgreSQL as catalog, MinIO as data storage and using Spark for interacting with Iceberg. My application can take m

Re: [I] [feat request] Make `Table` / `TableMetadata` JSON serializable [iceberg-python]

2024-11-05 Thread via GitHub
db-trin-life commented on issue #535: URL: https://github.com/apache/iceberg-python/issues/535#issuecomment-2457289794 @kevinjqliu if no one is on this, can look to take this on -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] Support Defining PartitionSpec and SortOrder without field-ids in create_table [iceberg-python]

2024-11-05 Thread via GitHub
sungwy commented on issue #338: URL: https://github.com/apache/iceberg-python/issues/338#issuecomment-2457255925 Hi @Samreay thank you for chasing up! We are now able to express partition spec updates without referencing a `field_id` by using the `create_table_transaction` method on a

Re: [I] Support Defining PartitionSpec and SortOrder without field-ids in create_table [iceberg-python]

2024-11-05 Thread via GitHub
sungwy commented on issue #338: URL: https://github.com/apache/iceberg-python/issues/338#issuecomment-2457311613 Yes, @Fokko - this is exactly the type of user confusion that prompted me to create the issue for https://github.com/apache/iceberg-python/issues/1284 to separate the behavior ba

Re: [PR] Core: Support DVs in DeleteFileIndex [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi merged PR #11467: URL: https://github.com/apache/iceberg/pull/11467 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@icebe

Re: [PR] Core: Support DVs in DeleteFileIndex [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11467: URL: https://github.com/apache/iceberg/pull/11467#discussion_r1829495298 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -84,4 +85,17 @@ public static String referencedDataFileLocation(DeleteFile deleteFile)

Re: [PR] Core: Support DVs in DeleteFileIndex [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on PR #11467: URL: https://github.com/apache/iceberg/pull/11467#issuecomment-2457384832 Thanks for reviewing, @nastra! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Bump griffe from 1.3.1 to 1.5.1 [iceberg-python]

2024-11-05 Thread via GitHub
Fokko merged PR #1289: URL: https://github.com/apache/iceberg-python/pull/1289 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Spark: add property to disable client-side purging in spark [iceberg]

2024-11-05 Thread via GitHub
twuebi commented on code in PR #11317: URL: https://github.com/apache/iceberg/pull/11317#discussion_r1829207481 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -60,6 +60,10 @@ public static Catalog wrap( return new CachingCatalog(catalog, caseSensitive,

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829210096 ## mkdocs/docs/api.md: ## @@ -353,6 +353,127 @@ lat: [[52.371807,37.773972,53.11254],[53.21917]] long: [[4.896029,-122.431297,6.0989],[6.56667]] ``` +### Partial

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #931: URL: https://github.com/apache/iceberg-python/pull/931#discussion_r1829233234 ## pyiceberg/table/__init__.py: ## @@ -456,6 +461,89 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) for data_f

Re: [PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11476: URL: https://github.com/apache/iceberg/pull/11476#discussion_r1829658091 ## core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Support partitioning spec during data file rewrites in Spark. [iceberg]

2024-11-05 Thread via GitHub
rdsarvar commented on PR #11368: URL: https://github.com/apache/iceberg/pull/11368#issuecomment-2457637165 > Thanks @rdsarvar , the part I'm a bit confused about is why we need a new `useSpec` API. I think the use case you described could be solved by adding a new spec, without setting it a

Re: [PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11476: URL: https://github.com/apache/iceberg/pull/11476#discussion_r1829672080 ## core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-11-05 Thread via GitHub
ajantha-bhat commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1829687559 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor lice

Re: [PR] Spec: add variant type [iceberg]

2024-11-05 Thread via GitHub
aihuaxu commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1829687763 ## format/spec.md: ## @@ -1287,6 +1307,7 @@ Types are serialized according to this table: |**`struct`**|`JSON object: {`  `"type": "struct",`  `"fields": [ {``"i

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-11-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1829690607 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,16 @@ public interface ExpireSnapshots extends PendingUpdate> { * @retur

[I] Error while connecting to REST catalog using Spark [iceberg]

2024-11-05 Thread via GitHub
Gowthami03B opened a new issue, #11477: URL: https://github.com/apache/iceberg/issues/11477 ### Apache Iceberg version 1.4.3 ### Query engine Spark ### Please describe the bug 🐞 Spark config and code - ``` iceberg_rest = { "spark.sql.extensions"

Re: [PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11476: URL: https://github.com/apache/iceberg/pull/11476#discussion_r1829659538 ## core/src/main/java/org/apache/iceberg/io/StructCopy.java: ## @@ -21,8 +21,8 @@ import org.apache.iceberg.StructLike; /** Copy the StructLike's values into

Re: [PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11476: URL: https://github.com/apache/iceberg/pull/11476#discussion_r1829669340 ## data/src/test/java/org/apache/iceberg/io/TestDVWriters.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] Spec: add variant type [iceberg]

2024-11-05 Thread via GitHub
aihuaxu commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1829674025 ## format/spec.md: ## @@ -1110,6 +1125,7 @@ Maps with non-string keys must use an array representation with the `map` logica |**`struct`**|`record`|| |**`list`**|`

Re: [PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
jbonofre commented on code in PR #11476: URL: https://github.com/apache/iceberg/pull/11476#discussion_r1829703181 ## core/src/main/java/org/apache/iceberg/io/StructCopy.java: ## @@ -21,8 +21,8 @@ import org.apache.iceberg.StructLike; /** Copy the StructLike's values into a n

Re: [PR] Docs: Fix verifying release candidate with Spark and Flink [iceberg]

2024-11-05 Thread via GitHub
jbonofre commented on code in PR #11461: URL: https://github.com/apache/iceberg/pull/11461#discussion_r1829706396 ## site/docs/how-to-release.md: ## @@ -435,10 +435,10 @@ spark-shell \ To verify using Flink, start a Flink SQL Client with the following command: ```bash -wget

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-11-05 Thread via GitHub
ajantha-bhat commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1829710550 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor lice

Re: [PR] Spark: Merge new position deletes with old deletes during writing [iceberg]

2024-11-05 Thread via GitHub
amogh-jahagirdar commented on PR #11273: URL: https://github.com/apache/iceberg/pull/11273#issuecomment-2457760335 >I am assuming since we went ahead with broadcasting approach, it sends it chunk by chunk using torrent broadcast as @aokolnychyi mentioned, so OOM not a problem ? I co

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-11-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1829747078 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,16 @@ public interface ExpireSnapshots extends PendingUpdate> { * @retur

Re: [PR] Spark: add property to disable client-side purging in spark [iceberg]

2024-11-05 Thread via GitHub
nastra commented on code in PR #11317: URL: https://github.com/apache/iceberg/pull/11317#discussion_r1828902360 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestRestDropPurgeTable.java: ## @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Spark: add property to disable client-side purging in spark [iceberg]

2024-11-05 Thread via GitHub
nastra commented on code in PR #11317: URL: https://github.com/apache/iceberg/pull/11317#discussion_r1828904623 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -60,6 +60,10 @@ public static Catalog wrap( return new CachingCatalog(catalog, caseSensitive,

Re: [PR] Updating configuration docs [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on PR #1292: URL: https://github.com/apache/iceberg-python/pull/1292#issuecomment-2456519145 @Samreay It looks like there are some formatting issues, could you run `make lint`? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Core: Support DVs in DeleteFileIndex [iceberg]

2024-11-05 Thread via GitHub
nastra commented on code in PR #11467: URL: https://github.com/apache/iceberg/pull/11467#discussion_r1828864978 ## core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -84,4 +85,17 @@ public static String referencedDataFileLocation(DeleteFile deleteFile) {

Re: [I] PyIceberg is not respecting `token` in the load table response [iceberg-python]

2024-11-05 Thread via GitHub
Fokko closed issue #1113: PyIceberg is not respecting `token` in the load table response URL: https://github.com/apache/iceberg-python/issues/1113 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Pass table-token to commit endpoint [iceberg-python]

2024-11-05 Thread via GitHub
Fokko merged PR #1278: URL: https://github.com/apache/iceberg-python/pull/1278 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Bump mkdocs-material from 9.5.42 to 9.5.43 [iceberg-python]

2024-11-05 Thread via GitHub
Fokko merged PR #1288: URL: https://github.com/apache/iceberg-python/pull/1288 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [I] Improve documentation on Configuration page [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on issue #1290: URL: https://github.com/apache/iceberg-python/issues/1290#issuecomment-2456516108 > Later on in the page in the Catalogs section, there are some ways detailed as to how to configure catalogs, but I'm not sure if this applies to writer configuration as well (

Re: [I] Improve documentation on Configuration page [iceberg-python]

2024-11-05 Thread via GitHub
Samreay commented on issue #1290: URL: https://github.com/apache/iceberg-python/issues/1290#issuecomment-2456530043 @Fokko I've updated the lint in the PR. Also, what are your thoughts on enabling Github discussions? I have a few questions about best practises and similar but I don't want

Re: [PR] Updating configuration docs [iceberg-python]

2024-11-05 Thread via GitHub
Samreay commented on PR #1292: URL: https://github.com/apache/iceberg-python/pull/1292#issuecomment-2456527904 Apologies, lint updates now in -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [I] feat: abstract the MetricsEvaluator [iceberg-rust]

2024-11-05 Thread via GitHub
liurenjie1024 commented on issue #674: URL: https://github.com/apache/iceberg-rust/issues/674#issuecomment-2456639410 Good point, I believe this could be achieved by allowing injecting metrics evaluator into table scan planning to help pruning more unnecessary data files. -- This is an au

Re: [I] bug: ArrowSchemaConverter can't handle unsigned datatypes from arrow [iceberg-rust]

2024-11-05 Thread via GitHub
liurenjie1024 commented on issue #675: URL: https://github.com/apache/iceberg-rust/issues/675#issuecomment-2456650073 Iceberg has no built in support for unsigned data types, and I think we could handle this by storing signed value and do byte to byte conversion in read time. -- This is

Re: [PR] Updating configuration docs [iceberg-python]

2024-11-05 Thread via GitHub
Fokko merged PR #1292: URL: https://github.com/apache/iceberg-python/pull/1292 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Updating configuration docs [iceberg-python]

2024-11-05 Thread via GitHub
Fokko commented on PR #1292: URL: https://github.com/apache/iceberg-python/pull/1292#issuecomment-2456651563 Thanks @Samreay for fixing this 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] feat: Support Parquet modular encryption [iceberg-rust]

2024-11-05 Thread via GitHub
liurenjie1024 commented on issue #686: URL: https://github.com/apache/iceberg-rust/issues/686#issuecomment-2456654659 Hi, @adamreeve thanks for raising this! I'm not aware of encryption yet, and of course we are always welcoming to contributions! -- This is an automated message from the A

Re: [I] Improve documentation on Configuration page [iceberg-python]

2024-11-05 Thread via GitHub
Fokko closed issue #1290: Improve documentation on Configuration page URL: https://github.com/apache/iceberg-python/issues/1290 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] Flink: Port #11144 to v1.19 [iceberg]

2024-11-05 Thread via GitHub
pvary opened a new pull request, #11473: URL: https://github.com/apache/iceberg/pull/11473 Clean backport for #11144 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [I] Improve documentation on Configuration page [iceberg-python]

2024-11-05 Thread via GitHub
Samreay commented on issue #1290: URL: https://github.com/apache/iceberg-python/issues/1290#issuecomment-2456659007 Thanks @Fokko, I've joined the community slack and made a post to the compaction channel about a different topic, looking forward to getting more engaged with the community.

Re: [PR] Core, Data, Flink, Spark: Improve tableDir initialization for tests [iceberg]

2024-11-05 Thread via GitHub
Fokko commented on code in PR #11460: URL: https://github.com/apache/iceberg/pull/11460#discussion_r1829017722 ## core/src/test/java/org/apache/iceberg/TestMetrics.java: ## @@ -73,6 +72,7 @@ public static List parameters() { } @TempDir public Path temp; Review Comment:

Re: [I] Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes [iceberg]

2024-11-05 Thread via GitHub
davidyuan1223 commented on issue #11465: URL: https://github.com/apache/iceberg/issues/11465#issuecomment-2456694669 Can we use the SQL `select column_sizes from table.files` to get the right size?

Re: [PR] Spark 3.5: Fix flaky test due to temp directory not empty during delete [iceberg]

2024-11-05 Thread via GitHub
jbonofre commented on code in PR #11470: URL: https://github.com/apache/iceberg/pull/11470#discussion_r1829042748 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWrites.java: ## @@ -88,6 +87,8 @@ public static Collection parameters() { @Parame

Re: [I] Flink: TestMetadataTableReadableMetrics relies on Hardcoded File Sizes [iceberg]

2024-11-05 Thread via GitHub
pvary commented on issue #11465: URL: https://github.com/apache/iceberg/issues/11465#issuecomment-2456782288 > can we use the sql `select column_sizes from table.files` to get the right size? I would prefer @RussellSpitzer's suggestion to directly check the parquet file sizes. Otherw

Re: [PR] Core: Make PositionDeleteIndex serializable [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on PR #11463: URL: https://github.com/apache/iceberg/pull/11463#issuecomment-2456437229 Thanks for reviewing, @danielcweeks @amogh-jahagirdar!

Re: [PR] Core, Data, Flink, Spark: Improve tableDir initialization for tests [iceberg]

2024-11-05 Thread via GitHub
nastra commented on code in PR #11460: URL: https://github.com/apache/iceberg/pull/11460#discussion_r1829076486 ## core/src/test/java/org/apache/iceberg/TestMetrics.java: ## @@ -73,6 +72,7 @@ public static List parameters() { } @TempDir public Path temp; Review Comment:

Re: [PR] Spark 3.5: Fix flaky test due to temp directory not empty during delete [iceberg]

2024-11-05 Thread via GitHub
nastra merged PR #11470: URL: https://github.com/apache/iceberg/pull/11470

Re: [PR] Spark-3.5: make `where` sql case sensitive setting alterable in rewrite data files procedure [iceberg]

2024-11-05 Thread via GitHub
ludlows commented on PR #11439: URL: https://github.com/apache/iceberg/pull/11439#issuecomment-2456878191 Hi @szehon-ho, please review the test cases when you have time. One possible problem is that the exception type here is `IllegalArgumentException` instead of the `ValidationExcep

Re: [I] Serialization of the org.apache.iceberg.io.WriteResult class. [iceberg]

2024-11-05 Thread via GitHub
simonykq commented on issue #10710: URL: https://github.com/apache/iceberg/issues/10710#issuecomment-2457936553 If you cannot touch `DataFile` or `DeleteFile`, you could also register it using the Flink config: ``` pipeline.serialization-config: - org.apache.iceberg.io.WriteResu

[PR] Change delete granularity [iceberg]

2024-11-05 Thread via GitHub
amogh-jahagirdar opened a new pull request, #11478: URL: https://github.com/apache/iceberg/pull/11478 Depends on

Re: [I] Serialization of the org.apache.iceberg.io.WriteResult class. [iceberg]

2024-11-05 Thread via GitHub
simonykq commented on issue #10710: URL: https://github.com/apache/iceberg/issues/10710#issuecomment-2457942633 Btw, I found a way to get this to work (without enabling generic types, but still using Kryo to serialize the write result under the hood). First create a class called `Write

Re: [I] How to reinitialize/refresh iceberg catalog object in spark catalog on an ongoing spark session [iceberg]

2024-11-05 Thread via GitHub
nerstak commented on issue #10227: URL: https://github.com/apache/iceberg/issues/10227#issuecomment-2457944896 Hello! With the following use case, it does not seem to be feasible. Is there an alternative? ```scala scala> import org.apache.spark.sql.SparkSession val sc = Spar

Re: [PR] Core, Puffin: Add DV file writer [iceberg]

2024-11-05 Thread via GitHub
aokolnychyi commented on code in PR #11476: URL: https://github.com/apache/iceberg/pull/11476#discussion_r1829922422 ## core/src/main/java/org/apache/iceberg/deletes/BaseDVFileWriter.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2024-11-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1829925773 ## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java: ## @@ -118,4 +118,16 @@ public interface ExpireSnapshots extends PendingUpdate> { * @retur

Re: [PR] API: Add Variant data type [iceberg]

2024-11-05 Thread via GitHub
aihuaxu commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1829929729 ## api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java: ## @@ -562,7 +563,7 @@ private static String sanitize(Literal literal, long now, int today)
