Re: [PR] Support timestamp type in partition string when importing files [iceberg]

2024-01-12 Thread via GitHub
Zhile commented on PR #7291: URL: https://github.com/apache/iceberg/pull/7291#issuecomment-1888612281 I also hit the same error when migrating Hive tables to Iceberg tables. The partition in Hive looks like `event_date=2023-04-27 11%3A00%3A00.0`. Is anybody working on this or https://gith
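
The partition value quoted above is percent-encoded (`%3A` stands for `:`). As a minimal sketch, not taken from the PR, the decoding step could look like this in Python; the helper name `parse_hive_partition` is hypothetical and only illustrates what the raw string contains once unescaped.

```python
from typing import Tuple
from urllib.parse import unquote


def parse_hive_partition(segment: str) -> Tuple[str, str]:
    """Split a Hive-style `key=value` path segment and percent-decode the value.

    Hypothetical helper for illustration; it is not part of Iceberg.
    """
    key, _, raw_value = segment.partition("=")
    return key, unquote(raw_value)


key, value = parse_hive_partition("event_date=2023-04-27 11%3A00%3A00.0")
print(key, value)  # event_date 2023-04-27 11:00:00.0
```

The decoded value still has to be parsed as a timestamp by whatever imports the files; the sketch only covers the unescaping.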

Re: [PR] AES GCM Stream changes [iceberg]

2024-01-12 Thread via GitHub
ggershinsky commented on code in PR #9453: URL: https://github.com/apache/iceberg/pull/9453#discussion_r1450011622 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmOutputStream.java: ## @@ -56,6 +58,7 @@ public class AesGcmOutputStream extends PositionOutputStream {

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-12 Thread via GitHub
pvary commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1450053946 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkVersionDetector.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Spark: propagate snapshot properties for RewriteDataFiles and RewritePositionDeleteFiles [iceberg]

2024-01-12 Thread via GitHub
advancedxy commented on PR #9449: URL: https://github.com/apache/iceberg/pull/9449#issuecomment-1888764887 Gently ping @ajantha-bhat @ajantha-bhat @szehon-ho @aokolnychyi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Issues/7420 [iceberg]

2024-01-12 Thread via GitHub
akshayjain3450 closed pull request #7426: Issues/7420 URL: https://github.com/apache/iceberg/pull/7426

[I] AWS: Support setting description for Glue table [iceberg]

2024-01-12 Thread via GitHub
lkokhreidze opened a new issue, #9462: URL: https://github.com/apache/iceberg/issues/9462 ### Feature Request / Improvement Iceberg Glue integration supports setting database description as per https://github.com/apache/iceberg/pull/3467 It would be also beneficial to support setti

Re: [I] AWS: Support setting description for Glue table [iceberg]

2024-01-12 Thread via GitHub
lkokhreidze commented on issue #9462: URL: https://github.com/apache/iceberg/issues/9462#issuecomment-1889283974 Seems like a small patch and happy to take this on myself if it makes sense.

[I] Getting history of Iceberg table in BigQuery [iceberg]

2024-01-12 Thread via GitHub
PiotrBB opened a new issue, #9463: URL: https://github.com/apache/iceberg/issues/9463 ### Query engine BigQuery ### Question Hi, I just created a few Iceberg tables in BigQuery using this guide: https://www.youtube.com/watch?v=IQR9gJuLXbQ Tables are there, I can see

Re: [PR] Apply Name mapping [iceberg-python]

2024-01-12 Thread via GitHub
syun64 commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1450524326 ## pyiceberg/io/pyarrow.py: ## @@ -698,77 +708,143 @@ def before_field(self, field: pa.Field) -> None: def after_field(self, field: pa.Field) -> None:

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1889413720 Thanks for your input. I tried [rewrite_position_delete_files](https://iceberg.apache.org/docs/1.3.1/spark-procedures/#rewrite_position_delete_files) but no luck.

Re: [PR] Build: Upgrade to gradle 8.4 [iceberg]

2024-01-12 Thread via GitHub
jbonofre commented on PR #8486: URL: https://github.com/apache/iceberg/pull/8486#issuecomment-1889450052 I'm back on this one, sorry for the delay. I will share the gradle 8.5 update + revapi approach.

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-12 Thread via GitHub
szehon-ho commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1450615797 ## format/spec.md: ## @@ -329,19 +329,35 @@ The `void` transform may be used to replace the transform in an existing partiti Bucket Transform Details -Buck

Re: [PR] #154 : Add homepage to Cargo.toml [iceberg-rust]

2024-01-12 Thread via GitHub
hiirrxnn commented on PR #160: URL: https://github.com/apache/iceberg-rust/pull/160#issuecomment-1889638975 Please review!

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
stevenzwu commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1889639566 First of all, the current Flink Iceberg source (FLIP-27 or old) doesn't support streaming reads with row-level deletes. It only reads append-only snapshots/commits. > is it possi

Re: [I] Support partitioned writes [iceberg-python]

2024-01-12 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1889656103 >How are we going to fan out the writing of the data. We have an Arrow table, what is an efficient way to compute the partitions and scale out the work. For example, are we going

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-12 Thread via GitHub
gjacoby126 commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1450769376 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkVersionDetector.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1889782916 Thanks for the answer @stevenzwu, you are right. I know we shouldn't do streaming on Iceberg tables other than append-only ones. But we don't stream in this case. Every day we

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1889832176 It still feels weird to allow that big of a split to be created. Wouldn't it be possible to make the deleted files lazy, so that they are loaded in the respective task node instead of the

Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]

2024-01-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #9414: URL: https://github.com/apache/iceberg/pull/9414#discussion_r1450860911 ## core/src/main/java/org/apache/iceberg/view/ViewVersionReplace.java: ## @@ -56,30 +63,71 @@ public ViewVersion apply() { } ViewMetadata internalApply

Re: [I] Support partitioned writes [iceberg-python]

2024-01-12 Thread via GitHub
Fokko commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1889891973 I currently see two approaches: - First get the unique partitions, and then filter for each of the partitions the relevant data. It is nice that we know the partition upfron
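
The first approach mentioned above (collect the unique partitions, then filter the data for each one) can be sketched in plain Python. This is only an illustration under stated assumptions: a list of dicts stands in for the Arrow table, the function name `split_by_partition` is hypothetical, and none of this is PyIceberg's actual implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def split_by_partition(rows: List[Row], partition_key: str) -> Dict[Any, List[Row]]:
    """Group rows by partition value using a unique-then-filter strategy."""
    # Step 1: collect the unique partition values, keeping first-seen order.
    unique_values = list(dict.fromkeys(row[partition_key] for row in rows))
    # Step 2: for each partition value, filter out the rows that belong to it.
    return {
        value: [row for row in rows if row[partition_key] == value]
        for value in unique_values
    }


rows = [
    {"day": "2024-01-11", "n": 1},
    {"day": "2024-01-12", "n": 2},
    {"day": "2024-01-11", "n": 3},
]
for day, part in split_by_partition(rows, "day").items():
    print(day, [row["n"] for row in part])
```

Because the partition values are known before any data is written, each filtered group could be handed to a separate writer. The trade-off is one pass over the data per partition, which becomes costly when there are many partitions.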

Re: [PR] chore: Update contributing guide. [iceberg-rust]

2024-01-12 Thread via GitHub
Fokko merged PR #163: URL: https://github.com/apache/iceberg-rust/pull/163

Re: [PR] chore: Update reader api status [iceberg-rust]

2024-01-12 Thread via GitHub
Fokko merged PR #162: URL: https://github.com/apache/iceberg-rust/pull/162

Re: [I] Discussion: Rethink `PrimitiveLiteral`. [iceberg-rust]

2024-01-12 Thread via GitHub
Fokko commented on issue #159: URL: https://github.com/apache/iceberg-rust/issues/159#issuecomment-1889992180 > 2. The parse string method implemented in pyiceberg is not a typical approach in rust. Rust has elegant support for macros, which is efficient and type safe. This was more

Re: [PR] Apply Name mapping [iceberg-python]

2024-01-12 Thread via GitHub
Fokko commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1449403730 ## pyiceberg/io/pyarrow.py: ## @@ -620,9 +624,18 @@ def _combine_positional_deletes(positional_deletes: List[pa.ChunkedArray], rows: return np.setdiff1d(np.ara

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-12 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1450968650 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + * Estima

Re: [PR] Arrow: Set field-id with prefix [iceberg-python]

2024-01-12 Thread via GitHub
HonahX merged PR #227: URL: https://github.com/apache/iceberg-python/pull/227

Re: [PR] Apply Name mapping [iceberg-python]

2024-01-12 Thread via GitHub
Fokko commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1450994558 ## pyiceberg/io/pyarrow.py: ## @@ -698,77 +708,143 @@ def before_field(self, field: pa.Field) -> None: def after_field(self, field: pa.Field) -> None:

Re: [PR] Write support [iceberg-python]

2024-01-12 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1451020865 ## pyiceberg/table/snapshots.py: ## @@ -277,23 +278,30 @@ def _truncate_table_summary(summary: Summary, previous_summary: Mapping[str, str }: summary[pr

Re: [PR] Apply Name mapping [iceberg-python]

2024-01-12 Thread via GitHub
syun64 commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1451023397 ## pyiceberg/io/pyarrow.py: ## @@ -698,77 +708,143 @@ def before_field(self, field: pa.Field) -> None: def after_field(self, field: pa.Field) -> None:

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
stevenzwu commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890156116 Ah, I didn't know it is a batch read mode using `asOfSnapshotId`. Note that they are `delete` (not `deleted`) files to capture the row-level deletes. The actual files are not load

Re: [I] The out-of-order problem occurs around the process of recovery [iceberg]

2024-01-12 Thread via GitHub
github-actions[bot] closed issue #6756: The out-of-order problem occurs around the process of recovery URL: https://github.com/apache/iceberg/issues/6756

Re: [I] Improving Scala Checkstyle/Spotless configurations [iceberg]

2024-01-12 Thread via GitHub
github-actions[bot] commented on issue #6736: URL: https://github.com/apache/iceberg/issues/6736#issuecomment-1890167304 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] The out-of-order problem occurs around the process of recovery [iceberg]

2024-01-12 Thread via GitHub
github-actions[bot] commented on issue #6756: URL: https://github.com/apache/iceberg/issues/6756#issuecomment-1890167270 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Be able to add extra snapshot metadata on sql statement with pyspark [iceberg]

2024-01-12 Thread via GitHub
github-actions[bot] commented on issue #6754: URL: https://github.com/apache/iceberg/issues/6754#issuecomment-1890167281 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Be able to add extra snapshot metadata on sql statement with pyspark [iceberg]

2024-01-12 Thread via GitHub
github-actions[bot] closed issue #6754: Be able to add extra snapshot metadata on sql statement with pyspark URL: https://github.com/apache/iceberg/issues/6754

Re: [I] Improving Scala Checkstyle/Spotless configurations [iceberg]

2024-01-12 Thread via GitHub
github-actions[bot] closed issue #6736: Improving Scala Checkstyle/Spotless configurations URL: https://github.com/apache/iceberg/issues/6736

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890232951 I have the changes and a test working locally. It is a bit tricky to just use `writeBytes`: `DataOutputSerializer.writeBytes` is buggy and increases the buffer position twice. Do you

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too ma… [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on code in PR #9464: URL: https://github.com/apache/iceberg/pull/9464#discussion_r1451078255 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/IcebergSourceSplit.java: ## @@ -147,7 +147,7 @@ byte[] serializeV2() throws IOException {

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too ma… [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on code in PR #9464: URL: https://github.com/apache/iceberg/pull/9464#discussion_r1451079177 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/IcebergSourceSplit.java: ## @@ -166,12 +166,19 @@ static IcebergSourceSplit deserializeV2(byte

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890244814 I took the liberty of creating the PR since I had the changes locally. Hope you guys don't mind 🙏

Re: [PR] Spark: Support min/max/count push down for partition columns [iceberg]

2024-01-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #9457: URL: https://github.com/apache/iceberg/pull/9457#discussion_r1451098884 ## api/src/main/java/org/apache/iceberg/expressions/AggregateEvaluator.java: ## @@ -83,6 +83,18 @@ public void update(DataFile file) { } } + public

Re: [I] Getting history of Iceberg table in BigQuery [iceberg]

2024-01-12 Thread via GitHub
amogh-jahagirdar closed issue #9463: Getting history of Iceberg table in BigQuery URL: https://github.com/apache/iceberg/issues/9463

Re: [I] Getting history of Iceberg table in BigQuery [iceberg]

2024-01-12 Thread via GitHub
amogh-jahagirdar commented on issue #9463: URL: https://github.com/apache/iceberg/issues/9463#issuecomment-1890266500 This is a vendor-specific issue; I recommend reaching out to GCP BigQuery support with this request. Closing.

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-12 Thread via GitHub
pvary commented on code in PR #9464: URL: https://github.com/apache/iceberg/pull/9464#discussion_r1451270311 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/IcebergSourceSplit.java: ## @@ -166,12 +166,19 @@ static IcebergSourceSplit deserializeV2(byte[]

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
stevenzwu commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890332322 > Yes I did use Spark ([rewrite_position_delete_files](https://iceberg.apache.org/docs/1.3.1/spark-procedures/#rewrite_position_delete_files)) to clean up positional deletes. Maybe

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-12 Thread via GitHub
pvary commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1890335313 Let's take a step back before rushing to a solution. Here are some things we have to solve: - Serializing long Strings - Serializing extra chars, like Chinese chars - Backward comp
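
The first two concerns in the list above can be illustrated with a small Python sketch. It assumes a format with a 2-byte length prefix, such as the one `java.io.DataOutputStream.writeUTF` uses, and contrasts it with a 4-byte prefix. The function names are hypothetical, standard UTF-8 is used rather than Java's modified UTF-8, and this is not the approach taken in the PR.

```python
import struct

MAX_U16 = 0xFFFF  # largest payload size a 2-byte length prefix can describe


def encode_with_u16_length(text: str) -> bytes:
    """Length-prefixed UTF-8 with a 2-byte prefix; fails on long payloads."""
    payload = text.encode("utf-8")
    if len(payload) > MAX_U16:
        raise ValueError(f"encoded string too long: {len(payload)} bytes")
    return struct.pack(">H", len(payload)) + payload


def encode_with_u32_length(text: str) -> bytes:
    """Length-prefixed UTF-8 with a 4-byte prefix; handles long payloads."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_with_u32_length(data: bytes) -> str:
    """Inverse of encode_with_u32_length."""
    (length,) = struct.unpack_from(">I", data, 0)
    return data[4:4 + length].decode("utf-8")


# Non-ASCII characters take several bytes each, so the byte limit is hit
# well before the character count reaches 65535.
long_text = "中文" * 40000  # 80,000 characters, 240,000 UTF-8 bytes
assert decode_with_u32_length(encode_with_u32_length(long_text)) == long_text
```

The third concern, backward compatibility, is not covered here: a reader would still need a way to tell old and new encodings apart, for example through a version number.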

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
pvary commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890336927 > But I still have too many equality deletes which is still causing the limit excess. Did I use the wrong spark procedure? I think RewriteDataFilesAction could help you there. If

Re: [PR] Backporting Flink: Watermark Read Options to 1.17 and 1.16 [iceberg]

2024-01-12 Thread via GitHub
pvary merged PR #9456: URL: https://github.com/apache/iceberg/pull/9456

Re: [PR] Backporting Flink: Watermark Read Options to 1.17 and 1.16 [iceberg]

2024-01-12 Thread via GitHub
pvary commented on PR #9456: URL: https://github.com/apache/iceberg/pull/9456#issuecomment-1890337920 Thanks @rodmeneses for the backport, and @stevenzwu for the review!

Re: [PR] #154 : Add homepage to Cargo.toml [iceberg-rust]

2024-01-12 Thread via GitHub
Fokko merged PR #160: URL: https://github.com/apache/iceberg-rust/pull/160

Re: [PR] #154 : Add homepage to Cargo.toml [iceberg-rust]

2024-01-12 Thread via GitHub
Fokko commented on PR #160: URL: https://github.com/apache/iceberg-rust/pull/160#issuecomment-1890339709 Thanks @hiirrxnn for working on this 🙏

Re: [PR] #154 : Add homepage to Cargo.toml [iceberg-rust]

2024-01-12 Thread via GitHub
hiirrxnn commented on PR #160: URL: https://github.com/apache/iceberg-rust/pull/160#issuecomment-1890340610 I'm glad to have collaborated with you guys!

Re: [I] Failed to assign splits due to the serialized split size [iceberg]

2024-01-12 Thread via GitHub
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890349330 Thank you both. You are right, @stevenzwu, it is sad that there is no implementation yet for `ConvertEqualityDeleteFiles`. I will give `RewriteDataFilesAction` a try and let you kn

Re: [I] Add the homepage to the `Cargo.toml` [iceberg-rust]

2024-01-12 Thread via GitHub
liurenjie1024 closed issue #154: Add the homepage to the `Cargo.toml` URL: https://github.com/apache/iceberg-rust/issues/154

Re: [I] Add the homepage to the `Cargo.toml` [iceberg-rust]

2024-01-12 Thread via GitHub
liurenjie1024 commented on issue #154: URL: https://github.com/apache/iceberg-rust/issues/154#issuecomment-1890363265 Closed by #160