Re: [PR] Bump Nessie to 0.76.0 [iceberg]

2024-01-03 Thread via GitHub
Fokko merged PR #9398: URL: https://github.com/apache/iceberg/pull/9398 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

Re: [PR] Bump Nessie to 0.76.0 [iceberg]

2024-01-03 Thread via GitHub
Fokko commented on PR #9398: URL: https://github.com/apache/iceberg/pull/9398#issuecomment-1874986151 Thanks @snazy for the PR, and @jbonofre, @ajantha-bhat and @dimas-b for the prompt review 🙌

Re: [PR] Correct schema behavior [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #247: URL: https://github.com/apache/iceberg-python/pull/247#discussion_r1440199047 ## pyiceberg/table/__init__.py: ## @@ -942,15 +942,16 @@ def snapshot(self) -> Optional[Snapshot]: return self.table.current_snapshot() def projectio

Re: [PR] Correct schema behavior [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #247: URL: https://github.com/apache/iceberg-python/pull/247#discussion_r1440202999 ## pyiceberg/table/__init__.py: ## @@ -942,15 +942,16 @@ def snapshot(self) -> Optional[Snapshot]: return self.table.current_snapshot() def projectio

Re: [PR] Allow filtering on newly added columns [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #246: URL: https://github.com/apache/iceberg-python/pull/246#discussion_r1440208230 ## tests/test_integration.py: ## @@ -373,6 +379,15 @@ def test_scan_branch(test_positional_mor_deletes: Table) -> None: assert arrow_table["number"].to_pylist(

Re: [PR] Allow filtering on newly added columns [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #246: URL: https://github.com/apache/iceberg-python/pull/246#discussion_r1440214517 ## pyiceberg/expressions/visitors.py: ## @@ -906,7 +906,16 @@ def visit_bound_predicate(self, predicate: BoundPredicate[L]) -> BooleanExpressi def translate_co

Re: [PR] Allow filtering on newly added columns [iceberg-python]

2024-01-03 Thread via GitHub
Fokko merged PR #246: URL: https://github.com/apache/iceberg-python/pull/246

Re: [I] Incorrect filtering on newly added columns [iceberg-python]

2024-01-03 Thread via GitHub
Fokko closed issue #217: Incorrect filtering on newly added columns URL: https://github.com/apache/iceberg-python/issues/217

[PR] Fix PR link in 1.4.3 release notes [iceberg-docs]

2024-01-03 Thread via GitHub
manuzhang opened a new pull request, #299: URL: https://github.com/apache/iceberg-docs/pull/299 (no comment)

[PR] Bug fix falsy value of zero [iceberg-python]

2024-01-03 Thread via GitHub
MehulBatra opened a new pull request, #249: URL: https://github.com/apache/iceberg-python/pull/249 Resolves: https://github.com/apache/iceberg-python/issues/232 In Python, certain values are considered False in a boolean context. These include None, 0, empty sequences/collections (`''`, 
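
The falsy-zero pitfall described in this PR can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from pyiceberg; the sketch only assumes an optional integer ID where 0 is a legitimate value.

```python
from typing import Optional


def describe_truthy(snapshot_id: Optional[int]) -> str:
    # Hypothetical buggy pattern: a truthiness check treats 0 like None,
    # so a valid ID of 0 is reported as missing.
    if snapshot_id:
        return f"snapshot {snapshot_id}"
    return "no snapshot"


def describe_explicit(snapshot_id: Optional[int]) -> str:
    # Hypothetical fixed pattern: compare against None explicitly,
    # so 0 is accepted as a real value.
    if snapshot_id is not None:
        return f"snapshot {snapshot_id}"
    return "no snapshot"


if __name__ == "__main__":
    print(describe_truthy(0))
    print(describe_explicit(0))
    print(describe_explicit(None))
```

The explicit `is not None` comparison is the usual choice whenever 0, an empty string, or an empty collection is a meaningful value rather than a stand-in for "absent".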

Re: [PR] Bug fix falsy value of zero [iceberg-python]

2024-01-03 Thread via GitHub
MehulBatra commented on PR #249: URL: https://github.com/apache/iceberg-python/pull/249#issuecomment-1875156254 Tested locally; all three seem to be passing ``` class DummyClass: def __init__(self, metadata): self.metadata = metadata def test_snapshot_id

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440319498 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estima

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440319918 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estima

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440322057 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440325074 ## core/src/main/java/org/apache/iceberg/deletes/PositionDeleteIndex.java: ## @@ -44,4 +44,14 @@ public interface PositionDeleteIndex { /** Returns true if thi

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440331799 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440335582 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440336894 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440362963 ## data/src/main/java/org/apache/iceberg/data/DeleteFilter.java: ## @@ -224,14 +223,10 @@ public Predicate eqDeletedRowFilter() { } public PositionDeleteInd

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440366952 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440367424 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440368146 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440372097 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440382646 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -70,4 +72,18 @@ private SparkSQLProperties() {} // Controls whether to

Re: [PR] Fix ParallelIterable memory leak because queue continues to be added even if iterator exited [iceberg]

2024-01-03 Thread via GitHub
Heltman commented on PR #9402: URL: https://github.com/apache/iceberg/pull/9402#issuecomment-1875282246 See #7844 for the whole discussion.

Re: [PR] Core: Add param to limit manifest parallel reader queue size [iceberg]

2024-01-03 Thread via GitHub
Heltman commented on PR #7844: URL: https://github.com/apache/iceberg/pull/7844#issuecomment-1875283687 > I will add a change to fix the memory leak, and think about creating BlockingParallelIterable instead of changing ParallelIterable. I added a new PR that just fixes the memory leak. See #9402.

Re: [I] Can iceberg support truncating table? [iceberg]

2024-01-03 Thread via GitHub
jhchee commented on issue #9387: URL: https://github.com/apache/iceberg/issues/9387#issuecomment-1875290566 You could remove the table entry from your catalog and create a new table within the same directory. This should preserve all your files.

[PR] Flink: Backport #9308 to v1.17 and the relevant parts to v1.16 [iceberg]

2024-01-03 Thread via GitHub
pvary opened a new pull request, #9403: URL: https://github.com/apache/iceberg/pull/9403 Clean backport of #9308 to Flink 1.17. In 1.16, `pauseOrResumeSplits` is not needed, but the other parts are backported so that the code stays similar between the Flink versions.

Re: [PR] API, Core: Move SQLViewRepresentation to API [iceberg]

2024-01-03 Thread via GitHub
pvary commented on PR #9302: URL: https://github.com/apache/iceberg/pull/9302#issuecomment-1875336457 @nastra: I think we can skip this for now. I still think this is some caching issue on the Gradle side, which is very hard to reproduce, so not many people are affected.

Re: [I] Spark DataFrame write fails if input dataframe has columns in different order than iceberg schema [iceberg]

2024-01-03 Thread via GitHub
amitmittal5 commented on issue #741: URL: https://github.com/apache/iceberg/issues/741#issuecomment-1875395560 > Hello, is this issue resolved? I am still getting this issue in iceberg 1.4.2 while trying to write in iceberg format to ADLS using spark-streaming. It was actually resolve

Re: [I] When using the Flink upsert mode, the speed of reading data from the iceberg table is very slow. [iceberg]

2024-01-03 Thread via GitHub
pvary commented on issue #9363: URL: https://github.com/apache/iceberg/issues/9363#issuecomment-1875610859 @13535048320: How do you populate the data? Is it a requirement to update the previous records based on the incoming new data, or every record is new? If you have delete files generate

Re: [I] Flink API rewriteDataFile How to set up scanning based on file size [iceberg]

2024-01-03 Thread via GitHub
pvary commented on issue #9386: URL: https://github.com/apache/iceberg/issues/9386#issuecomment-187561 If a file is bigger than the TARGET_FILE_SIZE, it will create multiple splits when we read it. The last split of the file is a good candidate to merge with a new split, so it co
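
The splitting behavior described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not Iceberg's or Flink's actual planner: `plan_splits` and its parameters are hypothetical names, and sizes are plain byte counts.

```python
from typing import List, Tuple


def plan_splits(file_size: int, target_size: int) -> List[Tuple[int, int]]:
    """Cut one file into (offset, length) splits of at most target_size bytes.

    Hypothetical helper for illustration only.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The final split carries whatever remains and may be smaller than the target.
        length = min(target_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    target = 100
    splits = plan_splits(file_size=250, target_size=target)
    tail_offset, tail_length = splits[-1]
    # A tail shorter than the target is the natural candidate for merging.
    print(splits, tail_length < target)
```

In this sketch a file larger than the target yields several full-size splits plus a shorter tail, which is why the tail is the one worth combining with another split.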

Re: [PR] Flink: Backport #9308 to v1.17 and the relevant parts to v1.16 [iceberg]

2024-01-03 Thread via GitHub
pvary merged PR #9403: URL: https://github.com/apache/iceberg/pull/9403

Re: [PR] Flink: Backport #9308 to v1.17 and the relevant parts to v1.16 [iceberg]

2024-01-03 Thread via GitHub
pvary commented on PR #9403: URL: https://github.com/apache/iceberg/pull/9403#issuecomment-1875674967 Thanks for the review @stevenzwu!

[I] Snowflake Iceberg Partitioned data read issue [iceberg]

2024-01-03 Thread via GitHub
purna344 opened a new issue, #9404: URL: https://github.com/apache/iceberg/issues/9404 ### Feature Request / Improvement We are using Snowflake Iceberg to read the data from the S3 location and that is working fine for the non partitioned data. But If the data is partitioned a

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1440607629 ## crates/iceberg/src/spec/snapshot.rs: ## @@ -124,6 +150,70 @@ impl Snapshot { Utc.timestamp_millis_opt(self.timestamp_ms).unwrap() } +/// Get the

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1875787341 @jasonf20, to make that work, I think you'd need to keep track of a base sequence number and update the metadata for each new manifest with the correct sequence number when the manifest li

Re: [PR] Deliver key metadata for encryption of data files [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on code in PR #9359: URL: https://github.com/apache/iceberg/pull/9359#discussion_r1440793961 ## core/src/main/java/org/apache/iceberg/encryption/StandardKeyMetadata.java: ## @@ -31,7 +31,7 @@ import org.apache.iceberg.relocated.com.google.common.collect.Immutab

Re: [PR] Spark: Add actions for disaster recovery. [iceberg]

2024-01-03 Thread via GitHub
flyrain commented on PR #4705: URL: https://github.com/apache/iceberg/pull/4705#issuecomment-1875808007 Hi @laithalzyoud, glad you found this useful. Would you like to take the lead for this task? I could be the co-author if that makes sense to you. I can help on the review, but we will sti

Re: [PR] Deliver key metadata for encryption of data files [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on code in PR #9359: URL: https://github.com/apache/iceberg/pull/9359#discussion_r1440814517 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -53,6 +58,7 @@ abstract class BaseBatchReader extends BaseReader newBatchI

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440883043 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -749,4 +795,23 @@ public void applyEndpointConfigurations(T builder) { builder

Re: [PR] Bug fix falsy value of zero [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #249: URL: https://github.com/apache/iceberg-python/pull/249#discussion_r1440884348 ## pyiceberg/table/__init__.py: ## @@ -545,7 +545,7 @@ def new_snapshot_id(self) -> int: def current_snapshot(self) -> Optional[Snapshot]: """Get the

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440914203 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3AccessGrantsPluginConfigurations.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440915164 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -50,6 +51,23 @@ public class S3FileIOProperties implements Serializable { */ pub

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440915860 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -684,6 +715,22 @@ private Set toS3Tags(Map properties, String prefix) { .coll

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440921673 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estima

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440925641 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -749,4 +796,47 @@ public void applyEndpointConfigurations(T builder) { builder

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440929838 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,59 @@ private static void checkSchemaCompatibility( } } + public static lo

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440930608 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -70,4 +72,18 @@ private SparkSQLProperties() {} // Controls whether t

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440931514 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1875952717 @singhpk234 @RussellSpitzer @szehon-ho, I rebased this. I addressed most comments, I am working on tests and docs. There are a few open questions too. I'll take a look at them tomorro

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440934324 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -125,6 +126,25 @@ public static StructLikeSet toEqualitySet( } } + public static Ch

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440935178 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440937961 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estima

Re: [PR] Write support [iceberg-python]

2024-01-03 Thread via GitHub
robtandy commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1440972324 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id]

[PR] API: Fix JavaDoc on UpdateSchema#updateColumnDoc [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar opened a new pull request, #9405: URL: https://github.com/apache/iceberg/pull/9405 This change fixes the JavaDoc on UpdateSchema#updateColumnDoc; previously it was referring to rename (looked to just be a bad copy paste) and now the JavaDoc reflects the actual operation bei

[PR] Build: Bump sqlalchemy from 2.0.24 to 2.0.25 [iceberg-python]

2024-01-03 Thread via GitHub
dependabot[bot] opened a new pull request, #250: URL: https://github.com/apache/iceberg-python/pull/250 Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.24 to 2.0.25. Release notes Sourced from https://github.com/sqlalchemy/sqlalchemy/releases sqlalchemy's r

Re: [PR] API: Fix Javadoc on UpdateSchema#updateColumnDoc [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on PR #9405: URL: https://github.com/apache/iceberg/pull/9405#issuecomment-1876077427 Thanks @Fokko for the review! Will merge after CI completes

Re: [PR] API: Fix Javadoc on UpdateSchema#updateColumnDoc [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar merged PR #9405: URL: https://github.com/apache/iceberg/pull/9405

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441103728 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3AccessGrantsPluginConfigurations.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] Snowflake Iceberg Partitioned data read issue [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on issue #9404: URL: https://github.com/apache/iceberg/issues/9404#issuecomment-1876122199 I ultimately recommend continuing to reach out to Snowflake about any issues you encounter with the Iceberg integration, but the Spark behavior in the reported issue does seem r

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441105061 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -50,6 +51,23 @@ public class S3FileIOProperties implements Serializable { */ pu

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441105763 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -684,6 +715,22 @@ private Set toS3Tags(Map properties, String prefix) { .col

Re: [I] Hive ping functionality seems to leak threads [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] commented on issue #7034: URL: https://github.com/apache/iceberg/issues/7034#issuecomment-1876137904 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Hive ping functionality seems to leak threads [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] closed issue #7034: Hive ping functionality seems to leak threads URL: https://github.com/apache/iceberg/issues/7034

Re: [I] Documentation improvements in regards to time travel [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] commented on issue #7000: URL: https://github.com/apache/iceberg/issues/7000#issuecomment-1876137932 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Documentation improvements in regards to time travel [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] closed issue #7000: Documentation improvements in regards to time travel URL: https://github.com/apache/iceberg/issues/7000

Re: [I] How does iceberg ensure the correctness of data writing under high concurrency [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] closed issue #6885: How does iceberg ensure the correctness of data writing under high concurrency URL: https://github.com/apache/iceberg/issues/6885

Re: [I] How does iceberg ensure the correctness of data writing under high concurrency [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] commented on issue #6885: URL: https://github.com/apache/iceberg/issues/6885#issuecomment-1876137969 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441141271 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -749,4 +796,47 @@ public void applyEndpointConfigurations(T builder) { builde

Re: [I] Partitioned table folder creation behaviour [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar closed issue #9388: Partitioned table folder creation behaviour URL: https://github.com/apache/iceberg/issues/9388

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441176818 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estimate

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441177965 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441178910 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441180949 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estimate

Re: [PR] Flink: Watermark read options [iceberg]

2024-01-03 Thread via GitHub
stevenzwu commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1441115126 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergTableSource.java: ## @@ -131,16 +131,17 @@ private DataStream createDataStream(StreamExecut

Re: [I] doc: rust.iceberg.apache.org is not resolved [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on issue #137: URL: https://github.com/apache/iceberg-rust/issues/137#issuecomment-1876198271 It still does not seem to be working. Do we have any way to debug this?

Re: [I] dropDeleteFilesOlderthan should be partition level instead of table level [iceberg]

2024-01-03 Thread via GitHub
manuzhang commented on issue #9383: URL: https://github.com/apache/iceberg/issues/9383#issuecomment-1876206670 > I am seeing v2 tables (partitioned tables) having delete files retained in partitions but those delete files won't apply to any data files within that partition. This is me

Re: [I] How does Iceberg support writing data to local paths, network disks, interfaces, and other storage media [iceberg]

2024-01-03 Thread via GitHub
manuzhang commented on issue #9378: URL: https://github.com/apache/iceberg/issues/9378#issuecomment-1876210607 It depends on your catalog `io-impl`. Take https://iceberg.apache.org/docs/latest/aws/#spark as an example. -- This is an automated message from the Apache Git Service. To respo

Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on PR #9298: URL: https://github.com/apache/iceberg/pull/9298#issuecomment-1876211849 Sorry for the delay in review on this @ajantha-bhat , I'll take a look at this tomorrow. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Core: remove statistic files in CatalogUtil:dropTableData [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar merged PR #9305: URL: https://github.com/apache/iceberg/pull/9305 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

[I] manifest list missing error after "cannot commit table due to base location not same as glue location" [iceberg]

2024-01-03 Thread via GitHub
waichee opened a new issue, #9406: URL: https://github.com/apache/iceberg/issues/9406 ### Apache Iceberg version 1.3.1 ### Query engine Spark ### Please describe the bug 🐞 **Setup** We use the following spark libraries to write to Iceberg on EMR: `org.

Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on PR #9371: URL: https://github.com/apache/iceberg/pull/9371#issuecomment-1876239430 I'm actually quite confused why we need spring boot dependencies in the project? If we could remove that, that would be ideal. -- This is an automated message from the Apache G

Re: [PR] JMH: Improvements to `jmh.gradle` [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar merged PR #9390: URL: https://github.com/apache/iceberg/pull/9390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441243904 ## crates/iceberg/Cargo.toml: ## @@ -62,4 +62,5 @@ uuid = { workspace = true } [dev-dependencies] pretty_assertions = { workspace = true } tempfile = { work

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441248686 ## crates/iceberg/src/spec/snapshot.rs: ## @@ -124,6 +150,70 @@ impl Snapshot { Utc.timestamp_millis_opt(self.timestamp_ms).unwrap() } +///

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441249066 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441249437 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441249808 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441250114 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441251310 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -38,6 +38,12 @@ static MAIN_BRANCH: &str = "main"; static DEFAULT_SPEC_ID: i32 = 0; static DEFAULT_SORT_O

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441252030 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441253067 ## crates/iceberg/src/spec/manifest.rs: ## @@ -819,6 +849,49 @@ impl ManifestEntry { ManifestStatus::Added | ManifestStatus::Existing )

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441253820 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -628,6 +630,30 @@ impl TryFrom for ManifestContentType { } } +impl ManifestListEntry { Review Comme

[I] refactor: Rename [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 opened a new issue, #145: URL: https://github.com/apache/iceberg-rust/issues/145 I'm confused by the naming, should this be a `ManifestFile`? From the [spec](https://iceberg.apache.org/spec/#manifest-lists): Manifest list files store `manifest_file`,
