Re: [PR] Support Location Providers [iceberg-python]

2024-12-20 Thread via GitHub
smaheshwar-pltr commented on code in PR #1452: URL: https://github.com/apache/iceberg-python/pull/1452#discussion_r1893981856 ## pyiceberg/table/locations.py: ## @@ -0,0 +1,82 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

Re: [I] Issue when connecting to REST catalogs on AWS ( Amazon SageMaker Lakehouse) [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on issue #1449: URL: https://github.com/apache/iceberg-python/issues/1449#issuecomment-2557247096 I have not tested this personally, but from reading the AWS blog on connecting Spark to the AWS Glue Iceberg REST catalog, there are some configurations that are different from wh

Re: [PR] Add an e2e test for writing an Iceberg table with pyiceberg and reading it with DataFusion [iceberg-rust]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #825: URL: https://github.com/apache/iceberg-rust/pull/825#discussion_r1894119574 ## crates/integration_tests/testdata/pyiceberg/load_types_table.py: ## @@ -0,0 +1,79 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more c

Re: [PR] Remove number from assert description [iceberg]

2024-12-20 Thread via GitHub
TQJADE commented on PR #11827: URL: https://github.com/apache/iceberg/pull/11827#issuecomment-2557293121 Thanks a lot for reviewing, @szehon-ho @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2024-12-20 Thread via GitHub
singhpk234 commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1894120389 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -84,13 +86,30 @@ private boolean shouldRewrite(List group) { return enoughI

Re: [PR] Remove deprecation warnings in test [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1416: URL: https://github.com/apache/iceberg-python/pull/1416#issuecomment-2557233390 done! @Fokko can you take another look? ran locally no warnings in `make-test`

Re: [PR] Support Location Providers [iceberg-python]

2024-12-20 Thread via GitHub
smaheshwar-pltr commented on code in PR #1452: URL: https://github.com/apache/iceberg-python/pull/1452#discussion_r1893986378 ## pyiceberg/table/locations.py: ## @@ -0,0 +1,82 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

Re: [PR] Remove deprecation warnings in test [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #1416: URL: https://github.com/apache/iceberg-python/pull/1416#discussion_r1894101745 ## tests/expressions/test_parser.py: ## @@ -70,7 +70,6 @@ def test_equals_false() -> None: def test_is_null() -> None: assert IsNull("foo") == parser.pa

[PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2024-12-20 Thread via GitHub
gaborkaszab opened a new pull request, #11837: URL: https://github.com/apache/iceberg/pull/11837 The current implementation uses the deleteFile() of the FileIO even if it supports bulk operations. Even though the user of the RemoveSnapshots API can provide a custom Consumer to perform bulk

Re: [PR] feat(datafusion): Support cast operations [iceberg-rust]

2024-12-20 Thread via GitHub
Fokko commented on PR #821: URL: https://github.com/apache/iceberg-rust/pull/821#issuecomment-2557306494 @Xuanwo Thanks, I was noodling a bit on this one, I want to make some changes to make it safer for the user before marking this as ready.

Re: [PR] Add plan tasks for TableScan [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#issuecomment-2557356375 Thanks everyone for the great discussion here! To summarize the thread above, I think the main concern here is around exposing this functionality as part of PyIceberg's `DataSca

Re: [PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2024-12-20 Thread via GitHub
gaborkaszab commented on PR #11837: URL: https://github.com/apache/iceberg/pull/11837#issuecomment-2557371186 Slack discussion about this: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1733215233582339

Re: [PR] Spark: Test reading default values in Spark [iceberg]

2024-12-20 Thread via GitHub
Fokko commented on code in PR #11832: URL: https://github.com/apache/iceberg/pull/11832#discussion_r1894159165 ## api/src/main/java/org/apache/iceberg/types/Types.java: ## @@ -711,8 +711,15 @@ public boolean equals(Object o) { return false; } else if (!Objects.eq

Re: [PR] Spark: Test reading default values in Spark [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11832: URL: https://github.com/apache/iceberg/pull/11832#discussion_r1894163798 ## api/src/main/java/org/apache/iceberg/types/Types.java: ## @@ -711,8 +711,15 @@ public boolean equals(Object o) { return false; } else if (!Objects.e

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557381909 @jiakai-li Thanks for working on this! And happy holidays :) > I noticed the PyArrowFileIO._initialize_fs function doesn't take netloc parameter into account when initial

Re: [PR] Remove deprecation warnings in test [iceberg-python]

2024-12-20 Thread via GitHub
Fokko commented on code in PR #1416: URL: https://github.com/apache/iceberg-python/pull/1416#discussion_r1894094791 ## tests/expressions/test_parser.py: ## @@ -70,7 +70,6 @@ def test_equals_false() -> None: def test_is_null() -> None: assert IsNull("foo") == parser.parse("

Re: [PR] Implement column projection [iceberg-python]

2024-12-20 Thread via GitHub
Fokko commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894112102 ## pyiceberg/io/pyarrow.py: ## @@ -1237,16 +1238,29 @@ def _task_to_record_batches( # When V3 support is introduced, we will update `downcast_ns_timestam

Re: [PR] Change dot notation in add column documentation to tuple [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1433: URL: https://github.com/apache/iceberg-python/pull/1433#issuecomment-2557282080 > Yes, the struct has to exist before you can insert anything into it. ah i see, that makes sense. in that case, can we edit the example so that it works out of the box?

Re: [PR] Change dot notation in add column documentation to tuple [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1433: URL: https://github.com/apache/iceberg-python/pull/1433#issuecomment-2557283437 I found another dot notation in `Move column`; do we need to change this too? https://py.iceberg.apache.org/api/#move-column

Re: [PR] Implement column projection [iceberg-python]

2024-12-20 Thread via GitHub
Fokko commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894115453 ## pyiceberg/io/pyarrow.py: ## @@ -1237,16 +1238,29 @@ def _task_to_record_batches( # When V3 support is introduced, we will update `downcast_ns_timestam

Re: [PR] Spec: Support geo type [iceberg]

2024-12-20 Thread via GitHub
paleolimbot commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1894143585 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional

[PR] Use ExternalTypeInfo in Rowconverter code instead of deprecated TableSchema.getFieldTypes [iceberg]

2024-12-20 Thread via GitHub
abharath9 opened a new pull request, #11838: URL: https://github.com/apache/iceberg/pull/11838 - I just discovered that the LegacyTypeInfoDataTypeConverter(used by tableSchema.getFieldTypes()) doesn't support Instant datatypes. - To work for Instant datatypes, we either change LegacyTyp

Re: [I] Data loss bug in MergeIntoCommand [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on issue #11765: URL: https://github.com/apache/iceberg/issues/11765#issuecomment-2557395549 I just left a -1 on the docs PR. I don't think that this is the right place to put a warning and I also think that the warning is overly broad and would lead to confusion.

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1894176293 ## tests/io/test_pyarrow.py: ## @@ -381,10 +382,11 @@ def test_pyarrow_unified_session_properties() -> None: **UNIFIED_AWS_SESSION_PROPERTIES, }

Re: [PR] WIP: feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#issuecomment-2557393177 Thanks for working on this @felixscherz Feel free to tag me when it's ready for review :)

Re: [PR] Doc: Do Not Modify the Source Data Table During MergeIntoCommand Exec… [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on PR #11787: URL: https://github.com/apache/iceberg/pull/11787#issuecomment-2557394434 -1 I don't think it is appropriate to add danger warnings to Iceberg docs for bugs like this one and if we did I think this is not clear enough about the cause. It looks like the

Re: [I] Issue when connecting to REST catalogs on AWS ( Amazon SageMaker Lakehouse) [iceberg-python]

2024-12-20 Thread via GitHub
Neuw84 commented on issue #1449: URL: https://github.com/apache/iceberg-python/issues/1449#issuecomment-2557399952 Well, for normal S3 buckets it is working well without the parameter (I have a working script that writes and reads via the REST catalog). Will try to dig into why it is doing those

Re: [PR] Flink 1.20: Support default values in Parquet reader [iceberg]

2024-12-20 Thread via GitHub
jbonofre commented on PR #11839: URL: https://github.com/apache/iceberg/pull/11839#issuecomment-2557396251 @rdblue @pvary @RussellSpitzer I started to add default value support on Flink (Parquet). I'm working on the tests right now.

Re: [I] Issue when connecting to REST catalogs on AWS ( Amazon SageMaker Lakehouse) [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on issue #1449: URL: https://github.com/apache/iceberg-python/issues/1449#issuecomment-2557405306 can you share what you've tried that worked? Might be helpful to debug this further

Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]

2024-12-20 Thread via GitHub
aihuaxu commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1893471194 ## core/src/main/java/org/apache/iceberg/variants/SerializedObject.java: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[I] `IcebergTableScan` plan and stream schemas are different under projections [iceberg-rust]

2024-12-20 Thread via GitHub
gruuya opened a new issue, #828: URL: https://github.com/apache/iceberg-rust/issues/828 The `ExecutionPlan::schema` for `IcebergTableScan` and `RecordBatchStream::schema` for the returned stream have a mismatch in the schema in the presence of projections, leading to problems for anything t

Re: [PR] Bump Spark 3.5.4 [iceberg]

2024-12-20 Thread via GitHub
pan3793 commented on PR #11731: URL: https://github.com/apache/iceberg/pull/11731#issuecomment-2556920961 Spark 3.5.4 RC3 passed the vote and the jars were available on Maven Central a few minutes ago, I removed the staging repo and it's ready to go. cc @nastra @Fokko and @jbonofre

Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-20 Thread via GitHub
pan3793 commented on PR #11802: URL: https://github.com/apache/iceberg/pull/11802#issuecomment-2556923300 Spark 3.5.4 is out; closing in favor of #11731

Re: [PR] Spark 3.5: Adapt to Spark 3.5.4 [iceberg]

2024-12-20 Thread via GitHub
pan3793 closed pull request #11802: Spark 3.5: Adapt to Spark 3.5.4 URL: https://github.com/apache/iceberg/pull/11802

Re: [PR] Table Scan Delete File Handling: Positional and Equality Delete Support [iceberg-rust]

2024-12-20 Thread via GitHub
Fokko commented on PR #652: URL: https://github.com/apache/iceberg-rust/pull/652#issuecomment-2556890529 @sdd Thank you for your understanding, looking forward to the smaller PRs 👍 From PyIceberg I've learned that there are a lot of subtle optimizations and want to make sure that we handle

Re: [PR] feat: exposing delete files in task [iceberg-rust]

2024-12-20 Thread via GitHub
Fokko commented on PR #625: URL: https://github.com/apache/iceberg-rust/pull/625#issuecomment-2556892109 @xxhZs Thanks for working on this, and I agree with @liurenjie1024 that this is a partial PR. I hope you don't mind closing this one in favor of https://github.com/apache/iceberg-rust/pu

Re: [PR] feat: exposing delete files in task [iceberg-rust]

2024-12-20 Thread via GitHub
Fokko closed pull request #625: feat: exposing delete files in task URL: https://github.com/apache/iceberg-rust/pull/625

[I] Forbidden Exception creating Polaris Rest catalog with Flink 1.20 [iceberg]

2024-12-20 Thread via GitHub
David-N-Perkins opened a new issue, #11836: URL: https://github.com/apache/iceberg/issues/11836 ### Apache Iceberg version 1.7.1 (latest release) ### Query engine Flink ### Please describe the bug 🐞 We attempted to upgrade Iceberg `1.6.1` and Flink `1.18.1`

Re: [PR] Doc: Add status page for different implementations. [iceberg]

2024-12-20 Thread via GitHub
sungwy commented on code in PR #11772: URL: https://github.com/apache/iceberg/pull/11772#discussion_r1893960025 ## site/docs/status.md: ## @@ -0,0 +1,362 @@ +--- +title: "Implementation Status" +--- + + +# Implementations Status + +Apache iceberg now has implementations of the i

Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1894201179 ## core/src/main/java/org/apache/iceberg/variants/SerializedObject.java: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] Use ExternalTypeInfo in Rowconverter code instead of deprecated TableSchema.getFieldTypes [iceberg]

2024-12-20 Thread via GitHub
abharath9 commented on PR #11838: URL: https://github.com/apache/iceberg/pull/11838#issuecomment-2557428125 @stevenzwu

Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]

2024-12-20 Thread via GitHub
danielcweeks commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1894200743 ## core/src/main/java/org/apache/iceberg/variants/VariantUtil.java: ## @@ -0,0 +1,195 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [PR] Flink: Add RowConverter for Iceberg Source [iceberg]

2024-12-20 Thread via GitHub
abharath9 commented on PR #11301: URL: https://github.com/apache/iceberg/pull/11301#issuecomment-2557427257 > @abharath9 thanks for the contribution. > > can you also create clean back port to 1.18 and 1.19 @stevenzwu can you also review this pr https://github.com/apache/iceber

Re: [I] `datetime` objects in `row_filter` expressions are not casted and raise an error [iceberg-python]

2024-12-20 Thread via GitHub
wwqwq2313 commented on issue #1456: URL: https://github.com/apache/iceberg-python/issues/1456#issuecomment-2557453307 Hello, we tried to solve the issue. This is what we did: Modify the to_bytes function for TimestampType and TimestamptzType to handle datetime objects directly

Re: [PR] Auth Manager API part 2: AuthManager [iceberg]

2024-12-20 Thread via GitHub
danielcweeks merged PR #11809: URL: https://github.com/apache/iceberg/pull/11809

Re: [I] `datetime` objects in `row_filter` expressions are not casted and raise an error [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on issue #1456: URL: https://github.com/apache/iceberg-python/issues/1456#issuecomment-2557429995 Thanks for reporting this issue! I believe this is the relevant code https://github.com/apache/iceberg-python/blob/dbcf65b4892779efca7362e069edecff7f2bf69f/pyiceber

Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1894195216 ## core/src/main/java/org/apache/iceberg/variants/SerializedObject.java: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[I] `datetime` objects in `row_filter` expressions are not casted and raise an error [iceberg-python]

2024-12-20 Thread via GitHub
jayceslesar opened a new issue, #1456: URL: https://github.com/apache/iceberg-python/issues/1456 ### Feature Request / Improvement in the following `row_filter` in a `scan()` call, ```py row_filter=And( GreaterThanOrEqual("timestamp_received", start_time),

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557425441 Thank you @kevinjqliu , just try to clear my head a little bit > I think a potential solution might be to omit the "region" property and allow the S3FileSystem to determine

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557480803 > BTW there's a similar issue in #1041 Can I tackle this issue as well if there is no one working on it?

Re: [I] [SPJ] Skewed partitions harm merge performances [iceberg]

2024-12-20 Thread via GitHub
szehon-ho commented on issue #11800: URL: https://github.com/apache/iceberg/issues/11800#issuecomment-2557495242 Hm, but for the "not match" check, you need to check each replicated partition against all the split partitions, so it defeats the point of splitting, I think.

Re: [I] Reported and actual arrow schema of the table can be different [iceberg-rust]

2024-12-20 Thread via GitHub
gruuya commented on issue #813: URL: https://github.com/apache/iceberg-rust/issues/813#issuecomment-2556524539 Even more liberally, one could do without changing `get_arrow_datum` at all ```diff if let Some(idx) = self.bound_reference(reference)? { -let literal =

Re: [PR] refactor: Remove spawn and channel inside arrow reader [iceberg-rust]

2024-12-20 Thread via GitHub
sdd commented on PR #806: URL: https://github.com/apache/iceberg-rust/pull/806#issuecomment-2556556568 I'm not sure this is what we want. `try_buffer_unordered` executes futures **concurrently** according to the docs: https://docs.rs/futures/latest/futures/stream/trait.TryStreamExt.html#met

Re: [PR] refactor: Remove spawn and channel inside arrow reader [iceberg-rust]

2024-12-20 Thread via GitHub
Xuanwo commented on PR #806: URL: https://github.com/apache/iceberg-rust/pull/806#issuecomment-2556576097 > I'm not sure this is what we want. `try_buffer_unordered` executes futures **concurrently** according to the docs: [docs.rs/futures/latest/futures/stream/trait.TryStreamExt.html#metho

Re: [I] Flink reports an error when submitting a job [iceberg]

2024-12-20 Thread via GitHub
jbonofre commented on issue #11823: URL: https://github.com/apache/iceberg/issues/11823#issuecomment-2556634610 Can you please translate the title/question into English? Thanks!

Re: [I] Error when running a job with Flink, could you help take a look? [iceberg]

2024-12-20 Thread via GitHub
jbonofre commented on issue #11822: URL: https://github.com/apache/iceberg/issues/11822#issuecomment-2556635063 Can you please translate the title/question into English? Thanks!

Re: [PR] Spec: Support geo type [iceberg]

2024-12-20 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r189376 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_

Re: [PR] feat: add s3tables catalog [iceberg-rust]

2024-12-20 Thread via GitHub
Xuanwo commented on PR #807: URL: https://github.com/apache/iceberg-rust/pull/807#issuecomment-2556494901 Hi @flaneur2020, I suggest splitting this PR into multiple ones to make it easier to review and accelerate the iteration speed.

Re: [PR] feat: add s3tables catalog [iceberg-rust]

2024-12-20 Thread via GitHub
flaneur2020 commented on PR #807: URL: https://github.com/apache/iceberg-rust/pull/807#issuecomment-2556502230 @Xuanwo I believe the rest of this PR is adding tests. I've created a real s3tables bucket to test it and it looks to work fine. Can you give some suggestions about the test part

Re: [PR] Table Scan Delete File Handling: Positional and Equality Delete Support [iceberg-rust]

2024-12-20 Thread via GitHub
sdd commented on code in PR #652: URL: https://github.com/apache/iceberg-rust/pull/652#discussion_r1893620855 ## crates/iceberg/src/delete_file_index.rs: ## @@ -0,0 +1,95 @@ +use std::future::Future; +use std::pin::Pin; +use std::sync::{Arc, RwLock}; +use std::task::{Context, Po

Re: [I] Reported and actual arrow schema of the table can be different [iceberg-rust]

2024-12-20 Thread via GitHub
gruuya commented on issue #813: URL: https://github.com/apache/iceberg-rust/issues/813#issuecomment-2556514864 > For the filtering situation, we want to cast the type to the physical type Is something like this close to what you had in mind ```diff /// Convert Iceberg Datum to A

Re: [PR] Supply a hint arrow schema for casting Parquet field types during scans [iceberg-rust]

2024-12-20 Thread via GitHub
gruuya commented on PR #814: URL: https://github.com/apache/iceberg-rust/pull/814#issuecomment-2556516072 Making this a draft as the upstream dependency is also a draft atm.

[PR] Add crate for sqllogictest. [iceberg-rust]

2024-12-20 Thread via GitHub
liurenjie1024 opened a new pull request, #827: URL: https://github.com/apache/iceberg-rust/pull/827 This PR is the first part of the sqllogictest crate; it just adds the new crate. For a more complete framework, see https://github.com/apache/iceberg-rust/pull/581

[I] `TestFlinkIcebergSinkDistributionMode#testRangeDistributionStatisticsMigration()` is failing [iceberg]

2024-12-20 Thread via GitHub
jbonofre opened a new issue, #11835: URL: https://github.com/apache/iceberg/issues/11835 ### Apache Iceberg version None ### Query engine Flink ### Please describe the bug 🐞 When building iceberg-java `main`, `TestFlinkIcebergSinkDistributionMode#testRangeD

Re: [PR] Implement column projection [iceberg-python]

2024-12-20 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894502208 ## tests/io/test_pyarrow.py: ## @@ -1122,6 +1123,110 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None: assert repr(result_t

Re: [I] NullPointerException after deleting old partition column [iceberg]

2024-12-20 Thread via GitHub
anuragmantri commented on issue #10626: URL: https://github.com/apache/iceberg/issues/10626#issuecomment-2557872438 @Fokko - This can be easily reproduced in Spark by adding a SELECT after these tests - https://github.com/apache/iceberg/blob/dea2fd1d9debfd23aeda9403ed3eb81c6aebf30f/spark

Re: [PR] Spark: Test reading default values in Spark [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on PR #11832: URL: https://github.com/apache/iceberg/pull/11832#issuecomment-2557915442 Thanks for the reviews, @Fokko!

Re: [PR] Spark: Test reading default values in Spark [iceberg]

2024-12-20 Thread via GitHub
rdblue merged PR #11832: URL: https://github.com/apache/iceberg/pull/11832

Re: [PR] Core: add variant type support [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1894518245 ## core/src/main/java/org/apache/iceberg/SchemaParser.java: ## @@ -132,6 +133,8 @@ static void toJson(Type.PrimitiveType primitive, JsonGenerator generator) throws

Re: [PR] Core: add variant type support [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1894518424 ## core/src/main/java/org/apache/iceberg/SchemaParser.java: ## @@ -42,6 +42,7 @@ private SchemaParser() {} private static final String STRUCT = "struct"; private

Re: [PR] Core: add variant type support [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1894518504 ## core/src/main/java/org/apache/iceberg/SchemaParser.java: ## @@ -166,6 +169,10 @@ public static String toJson(Schema schema, boolean pretty) { private static T

Re: [PR] URL-encode partition field names in file locations [iceberg-python]

2024-12-20 Thread via GitHub
smaheshwar-pltr commented on code in PR #1457: URL: https://github.com/apache/iceberg-python/pull/1457#discussion_r1894501982 ## tests/integration/test_partitioning_key.py: ## @@ -203,10 +203,11 @@ # """ ), ( -[PartitionField(source_id=

Re: [PR] URL-encode partition field names in file locations [iceberg-python]

2024-12-20 Thread via GitHub
smaheshwar-pltr commented on code in PR #1457: URL: https://github.com/apache/iceberg-python/pull/1457#discussion_r1894517823 ## tests/integration/test_partitioning_key.py: ## @@ -722,6 +723,25 @@ (CAST('2023-01-01 11:55:59.99' AS TIMESTAMP), CAST('2023-01-01'

Re: [PR] Core: add variant type support [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1894518142 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -709,6 +709,10 @@ public T map(Types.MapType map, Supplier keyResult, Supplier valueResult)

Re: [PR] Core: add variant type support [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1894519023 ## core/src/test/java/org/apache/iceberg/TestMetadataUpdateParser.java: ## @@ -52,6 +56,15 @@ public class TestMetadataUpdateParser { Types.NestedField.requ

Re: [PR] Core: add variant type support [iceberg]

2024-12-20 Thread via GitHub
rdblue commented on code in PR #11831: URL: https://github.com/apache/iceberg/pull/11831#discussion_r1894519134 ## core/src/test/java/org/apache/iceberg/avro/TestAvroSchemaProjection.java: ## @@ -150,4 +152,16 @@ public void projectWithMapSchemaChanged() { .as("Result o

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1894520472 ## pyiceberg/io/pyarrow.py: ## Review Comment: Make doc changes as well. https://py.iceberg.apache.org/configuration/#s3 ## pyiceberg/io/p

Re: [PR] Core: Prevent dropping column which is referenced by active partition specs [iceberg]

2024-12-20 Thread via GitHub
anuragmantri commented on code in PR #11842: URL: https://github.com/apache/iceberg/pull/11842#discussion_r1894504885 ## core/src/main/java/org/apache/iceberg/SchemaUpdate.java: ## @@ -533,6 +534,25 @@ private static Schema applyChanges( } } +if (base != null)

Re: [PR] Implement column projection [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894504972 ## tests/io/test_pyarrow.py: ## @@ -1122,6 +1123,110 @@ def test_projection_concat_files(schema_int: Schema, file_int: str) -> None: assert repr(result_

Re: [I] NullPointerException after deleting old partition column [iceberg]

2024-12-20 Thread via GitHub
anuragmantri commented on issue #10626: URL: https://github.com/apache/iceberg/issues/10626#issuecomment-2557872911 I created a PR to block this drop here. Please take a look https://github.com/apache/iceberg/pull/11842

Re: [PR] Core: Prevent dropping column which is referenced by active partition specs [iceberg]

2024-12-20 Thread via GitHub
anuragmantri commented on PR #11842: URL: https://github.com/apache/iceberg/pull/11842#issuecomment-2557887971 I will fix other partition evolution tests

[PR] Core: Prevent dropping column which is referenced by active partition specs [iceberg]

2024-12-20 Thread via GitHub
anuragmantri opened a new pull request, #11842: URL: https://github.com/apache/iceberg/pull/11842 Prevents - https://github.com/apache/iceberg/issues/10234 - https://github.com/apache/iceberg/issues/10626 - https://github.com/apache/iceberg/issues/11314 - and any other similar re

Re: [PR] Core: Prevent dropping column which is referenced by active partition specs [iceberg]

2024-12-20 Thread via GitHub
anuragmantri commented on PR #11842: URL: https://github.com/apache/iceberg/pull/11842#issuecomment-2557871692 @amogh-jahagirdar, @advancedxy @RussellSpitzer - Could you please take a look? Thanks!

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557880783 Thank you @kevinjqliu, can I have some more guidance on this please? > I don't think netloc can be used to determine the region. S3 URI scheme doesn't use netloc, only S3

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557894124 Sweet, I'll go ahead with this approach then. Thanks very much @kevinjqliu !

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557893079 > I did some searching, and it seems that for the s3 scheme the format is s3:///. The netloc parsed from urlparse (essentially passed to the _initialize_fs call) then points to the buc
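
The point quoted above, that `urlparse` puts the bucket name in `netloc` for an `s3://` URI, can be checked with the standard library alone. This is a minimal sketch: the helper name `split_s3_uri` and the example bucket and key are hypothetical and are not taken from the PR.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str, str]:
    """Split an S3-style URI into (scheme, bucket, key).

    Hypothetical helper for illustration only; it is not PyIceberg code.
    """
    parsed = urlparse(uri)
    # For "s3://bucket/key", urlparse reports the bucket as netloc
    # and the object key as the path (with a leading slash).
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


scheme, bucket, key = split_s3_uri("s3://example-bucket/warehouse/db/table/data.parquet")
print(scheme, bucket, key)
```

Note that the parsed result carries no region information; the bucket name is all that `netloc` provides here.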

Re: [PR] Fix read from multiple s3 regions [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557893390 BTW there are 2 FileIO implementations, one for pyarrow, another for fsspec. We might want to do the same for fsspec https://github.com/apache/iceberg-python/blob/dbcf65b

Re: [PR] Parquet: Correctly prune nested columns [iceberg]

2024-12-20 Thread via GitHub
github-actions[bot] commented on PR #11373: URL: https://github.com/apache/iceberg/pull/11373#issuecomment-2557905206 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Docs: Use the correct YAML text block indicator to prevent formatting issues [iceberg]

2024-12-20 Thread via GitHub
github-actions[bot] commented on PR #11552: URL: https://github.com/apache/iceberg/pull/11552#issuecomment-2557905222 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Implement column projection [iceberg-python]

2024-12-20 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1894286043 ## pyiceberg/io/pyarrow.py: ## @@ -1237,16 +1238,29 @@ def _task_to_record_batches( # When V3 support is introduced, we will update `downcast_ns_time

Re: [PR] Spec: add variant type [iceberg]

2024-12-20 Thread via GitHub
emkornfield commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1894350830 ## format/spec.md: ## @@ -182,6 +182,21 @@ A **`list`** is a collection of values with some element type. The element field A **`map`** is a collection of key

Re: [PR] Fix comment on `WRITE_OBJECT_STORE_PARTITIONED_PATHS` table property [iceberg]

2024-12-20 Thread via GitHub
smaheshwar-pltr commented on code in PR #11798: URL: https://github.com/apache/iceberg/pull/11798#discussion_r1888502532 ## core/src/main/java/org/apache/iceberg/TableProperties.java: ## @@ -244,7 +244,7 @@ private TableProperties() {} public static final String OBJECT_STORE_

Re: [I] [Bug] Cannot use PyIceberg with multiple FS [iceberg-python]

2024-12-20 Thread via GitHub
jiakai-li commented on issue #1041: URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2557828231 Hey guys, I can pick this up together with #1279 if no one is currently working on this.

Re: [I] [Bug] Cannot use PyIceberg with multiple FS [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on issue #1041: URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2557842068 assigned to you @jiakai-li

Re: [PR] Spec: Support geo type [iceberg]

2024-12-20 Thread via GitHub
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1894474230 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_

Re: [I] (Potential Bug) Partition field names are not URL-encoded in file locations [iceberg-python]

2024-12-20 Thread via GitHub
kevinjqliu commented on issue #1458: URL: https://github.com/apache/iceberg-python/issues/1458#issuecomment-2557850636 Great catch. I think this is a bug and #175 might be related. On the java side, looks like this was added recently https://github.com/apache/iceberg/blame/dea2fd1d9
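
To illustrate what URL-encoding partition field names in a file location could look like, here is a minimal sketch using only `urllib.parse.quote_plus` from the standard library. The function `partition_path` and the sample field names are hypothetical; this is not the PyIceberg or Java implementation, only an illustration of the encoding the issue discusses.

```python
from urllib.parse import quote_plus


def partition_path(fields: list[tuple[str, str]]) -> str:
    """Build a Hive-style partition path, URL-encoding both names and values.

    Hypothetical helper for illustration only; the real library may differ.
    """
    return "/".join(
        # Encode the name as well as the value, so characters such as
        # '#' or '/' in a field name cannot corrupt the path.
        f"{quote_plus(name, safe='')}={quote_plus(value, safe='')}"
        for name, value in fields
    )


print(partition_path([("my#field", "a b"), ("year", "2024")]))
```

Without encoding the field name, a name containing `/` would introduce an extra path segment, which is the kind of problem the issue title describes.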

Re: [PR] Core: Prevent dropping column which is referenced by active partition specs [iceberg]

2024-12-20 Thread via GitHub
advancedxy commented on code in PR #11842: URL: https://github.com/apache/iceberg/pull/11842#discussion_r1894534190 ## core/src/main/java/org/apache/iceberg/SchemaUpdate.java: ## @@ -533,6 +534,25 @@ private static Schema applyChanges( } } +if (base != null) {
