Re: [PR] Spark 3.5: Avoid deprecated method [iceberg]

2025-01-07 Thread via GitHub
nastra commented on code in PR #11874: URL: https://github.com/apache/iceberg/pull/11874#discussion_r1905437474 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java: ## @@ -59,106 +59,101 @@ public static T visit(DataType sType, Typ

Re: [I] Flink reports an error when submitting a job [iceberg]

2025-01-07 Thread via GitHub
hashmapybx commented on issue #11823: URL: https://github.com/apache/iceberg/issues/11823#issuecomment-2575342993 Please provide your source code: `ci_ap_lab`, or some more info. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] [Bug] Cannot use PyIceberg with multiple FS [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1041: URL: https://github.com/apache/iceberg-python/issues/1041#issuecomment-2575575091 @jiakai-li we can close this issue! > At the meantime, I'm keen to work on the write.data.path and write.metadata.path if that's something we want to enable and no

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905804295 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] Build: Bump pytest-checkdocs from 2.10.1 to 2.13.0 [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu merged PR #682: URL: https://github.com/apache/iceberg-python/pull/682

Re: [PR] Split metadata tables into separate modules [iceberg-rust]

2025-01-07 Thread via GitHub
Xuanwo commented on code in PR #872: URL: https://github.com/apache/iceberg-rust/pull/872#discussion_r1905099231 ## crates/iceberg/src/inspect/metadata_table.rs: ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Metadata table scans as streams [iceberg-rust]

2025-01-07 Thread via GitHub
Xuanwo commented on code in PR #870: URL: https://github.com/apache/iceberg-rust/pull/870#discussion_r1905367016 ## crates/iceberg/src/metadata_scan.rs: ## @@ -95,7 +99,17 @@ impl<'a> SnapshotsTable<'a> { } /// Scans the snapshots table. -pub fn scan(&self) -> Re

Re: [PR] Spark 3.5: Avoid deprecated method [iceberg]

2025-01-07 Thread via GitHub
ebyhr commented on code in PR #11874: URL: https://github.com/apache/iceberg/pull/11874#discussion_r1905448233 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java: ## @@ -59,106 +59,101 @@ public static T visit(DataType sType, Type

Re: [PR] Spark 3.5: Avoid deprecated method [iceberg]

2025-01-07 Thread via GitHub
nastra commented on code in PR #11874: URL: https://github.com/apache/iceberg/pull/11874#discussion_r1905455172 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java: ## @@ -59,106 +59,101 @@ public static T visit(DataType sType, Typ

Re: [PR] Metadata table scans as streams [iceberg-rust]

2025-01-07 Thread via GitHub
rshkv commented on code in PR #870: URL: https://github.com/apache/iceberg-rust/pull/870#discussion_r1905520107 ## crates/iceberg/src/metadata_scan.rs: ## @@ -95,7 +99,17 @@ impl<'a> SnapshotsTable<'a> { } /// Scans the snapshots table. -pub fn scan(&self) -> Res

Re: [PR] Metadata table scans as streams [iceberg-rust]

2025-01-07 Thread via GitHub
rshkv commented on code in PR #870: URL: https://github.com/apache/iceberg-rust/pull/870#discussion_r1905520810 ## crates/iceberg/src/metadata_scan.rs: ## @@ -95,7 +99,17 @@ impl<'a> SnapshotsTable<'a> { } /// Scans the snapshots table. -pub fn scan(&self) -> Res

[PR] feat: support scan nested type(struct, map, list) [iceberg-rust]

2025-01-07 Thread via GitHub
ZENOTME opened a new pull request, #882: URL: https://github.com/apache/iceberg-rust/pull/882 (no comment)

Re: [I] [Bug] Error in overwrite(): pyarrow.lib.ArrowInvalid: offset overflow with large dataset (~3M rows) [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1491: URL: https://github.com/apache/iceberg-python/issues/1491#issuecomment-2575706473 thanks for reporting this issue @sundaresanr do you have an example so i can reproduce this issue?

Re: [I] [Bug] Error in overwrite(): pyarrow.lib.ArrowInvalid: offset overflow with large dataset (~3M rows) [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1491: URL: https://github.com/apache/iceberg-python/issues/1491#issuecomment-2575710490 I've recreated this example from https://github.com/apache/arrow/issues/33049 and was able to `append` successfully. ``` import numpy as np import pyarrow as

Re: [PR] Core: Try create Iceberg metadata table for Jdbc catalog in initialization [iceberg]

2025-01-07 Thread via GitHub
jbonofre commented on code in PR #11427: URL: https://github.com/apache/iceberg/pull/11427#discussion_r1905737238 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -123,7 +123,7 @@ enum SchemaVersion { + JdbcTableOperations.METADATA_LOCATION_PROP

Re: [I] [BUG] pyiceberg hanging on multiprocessing [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1488: URL: https://github.com/apache/iceberg-python/issues/1488#issuecomment-2575734864 One thing we can test is to force create a new FileIO in the worker. Something like this ``` from multiprocessing import Process from pyiceberg.io.pyarrow im

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905752019 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +from

Re: [I] [bug] read from multiple s3 regions [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1279: URL: https://github.com/apache/iceberg-python/issues/1279#issuecomment-2575577225 Closed by #1453

Re: [I] [Bug] Cannot use PyIceberg with multiple FS [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu closed issue #1041: [Bug] Cannot use PyIceberg with multiple FS URL: https://github.com/apache/iceberg-python/issues/1041

[I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu opened a new issue, #1492: URL: https://github.com/apache/iceberg-python/issues/1492 ### Feature Request / Improvement In the [write properties](https://iceberg.apache.org/docs/1.6.0/configuration/?h=write.data.path#write-properties) section #1452 adds support for L

Re: [PR] Support Location Providers [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #1452: URL: https://github.com/apache/iceberg-python/pull/1452#discussion_r1905630618 ## pyiceberg/table/locations.py: ## @@ -0,0 +1,82 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] Bump moto from 5.0.25 to 5.0.26 [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu merged PR #1490: URL: https://github.com/apache/iceberg-python/pull/1490

Re: [I] [bug] read from multiple s3 regions [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu closed issue #1279: [bug] read from multiple s3 regions URL: https://github.com/apache/iceberg-python/issues/1279

Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]

2025-01-07 Thread via GitHub
ZENOTME commented on PR #882: URL: https://github.com/apache/iceberg-rust/pull/882#issuecomment-2575597063 cc @liurenjie1024 @Xuanwo @Fokko @sdd

Re: [PR] Kafka Connect: Add table to topics mapping property [iceberg]

2025-01-07 Thread via GitHub
bryanck commented on PR #10422: URL: https://github.com/apache/iceberg/pull/10422#issuecomment-2575556734 Thanks, I'll take a deeper look next week.

Re: [PR] Core: Fix loading a table in CachingCatalog with metadata table name [iceberg]

2025-01-07 Thread via GitHub
gaborkaszab commented on PR #11738: URL: https://github.com/apache/iceberg/pull/11738#issuecomment-2575561645 Maybe @danielcweeks @pvary @nastra: Would you mind taking a look at this fix?

Re: [PR] bump version to 0.9.0 [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on PR #1489: URL: https://github.com/apache/iceberg-python/pull/1489#issuecomment-2575752540 moved to `0.9.0`!

Re: [PR] Spark 3.5: Avoid deprecated method [iceberg]

2025-01-07 Thread via GitHub
nastra commented on code in PR #11874: URL: https://github.com/apache/iceberg/pull/11874#discussion_r1905440692 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java: ## @@ -59,106 +59,101 @@ public static T visit(DataType sType, Typ

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905784116 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905788239 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905791773 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905793291 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] Build: Bump pytest-checkdocs from 2.10.1 to 2.13.0 [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on PR #682: URL: https://github.com/apache/iceberg-python/pull/682#issuecomment-2575809562 @dependabot rebase

Re: [PR] Core, Spark: Rewrite data files with high delete ratio [iceberg]

2025-01-07 Thread via GitHub
nastra commented on code in PR #11825: URL: https://github.com/apache/iceberg/pull/11825#discussion_r1905666756 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedDataRewriter.java: ## @@ -84,13 +86,30 @@ private boolean shouldRewrite(List group) { return enoughInput

[I] [feature] UpdateSchema.add_column supports both parent and child in the same transaction [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu opened a new issue, #1493: URL: https://github.com/apache/iceberg-python/issues/1493 ### Apache Iceberg version None ### Please describe the bug 🐞 Currently we cannot add the parent field with its child nested field in the same transaction. For example, `

Re: [PR] Change dot notation in add column documentation to tuple [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #1433: URL: https://github.com/apache/iceberg-python/pull/1433#discussion_r1905685734 ## mkdocs/docs/api.md: ## @@ -951,8 +951,10 @@ Using `add_column` you can add a column, without having to worry about the field with table.update_schema() a

Re: [I] [feature] UpdateSchema.add_column supports both parent and child in the same transaction [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1493: URL: https://github.com/apache/iceberg-python/issues/1493#issuecomment-2575670705 Example test ``` def test_add_and_evolve_schema(simple_table: Table) -> None: parent = "parent" child = ("parent", "child") schema = simple_table.

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #10755: URL: https://github.com/apache/iceberg/pull/10755#discussion_r1905832698 ## core/src/main/java/org/apache/iceberg/UpdateRequirements.java: ## @@ -173,6 +175,27 @@ private void update(MetadataUpdate.SetDefaultSortOrder unused) { }

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905825703 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] Add reproducible test for #1194 [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu closed pull request #1483: Add reproducible test for #1194 URL: https://github.com/apache/iceberg-python/pull/1483

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905837764 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

[I] Unable to create tables automatically with Iceberg REST and AWS Glue [iceberg]

2025-01-07 Thread via GitHub
AnatolyPopov opened a new issue, #11921: URL: https://github.com/apache/iceberg/issues/11921 ### Query engine _No response_ ### Question I was trying to set up the Kafka connector against AWS Glue with the rest catalog type and with S3FileIO. What I faced is that I did n

Re: [PR] Hive: Optimize viewExists API in hive catalog [iceberg]

2025-01-07 Thread via GitHub
dramaticlly commented on PR #11813: URL: https://github.com/apache/iceberg/pull/11813#issuecomment-2575904763 Thank you @pvary @nastra for reviewing and merging the change!

Re: [PR] Spec: add variant type [iceberg]

2025-01-07 Thread via GitHub
aihuaxu commented on PR #10831: URL: https://github.com/apache/iceberg/pull/10831#issuecomment-2575904277 I believe I have addressed the comments and can we move forward to merge the PR? Let me know if I miss anything. cc @RussellSpitzer and @rdblue

Re: [PR] Spec: add variant type [iceberg]

2025-01-07 Thread via GitHub
emkornfield commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1905854896 ## version.txt: ## Review Comment: this can be removed?

Re: [PR] Spec: Document Snapshot Summary Optional Fields for Standardization [iceberg]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #11660: URL: https://github.com/apache/iceberg/pull/11660#discussion_r1905905118 ## format/spec.md: ## @@ -1633,3 +1633,57 @@ might indicate different snapshot IDs for a specific timestamp. The discrepancie When processing point in time que

Re: [PR] Spark 3.5: Implement RewriteTablePath [iceberg]

2025-01-07 Thread via GitHub
flyrain commented on PR #11555: URL: https://github.com/apache/iceberg/pull/11555#issuecomment-2576006557 Thanks @szehon-ho for working on this. Feel free to merge it.

[I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-01-07 Thread via GitHub
jiakai-li opened a new issue, #1494: URL: https://github.com/apache/iceberg-python/issues/1494 ### Feature Request / Improvement Add support for [`write.data.path`](https://iceberg.apache.org/docs/latest/configuration/#write-properties:~:text=the%20file%20path-,write.data.path,-table%

Re: [I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-01-07 Thread via GitHub
jiakai-li closed issue #1494: [feature] Add support for `write.data.path` and `write.metadata.path` URL: https://github.com/apache/iceberg-python/issues/1494

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2025-01-07 Thread via GitHub
amogh-jahagirdar commented on PR #10755: URL: https://github.com/apache/iceberg/pull/10755#issuecomment-2575953987 I'll go ahead and merge, the remaining comment(s) are something we can clean up in a follow on! Thank you for your patience and great work @advancedxy , really appreciate it! t

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905950940 ## core/src/main/java/org/apache/iceberg/avro/InternalReader.java: ## @@ -205,7 +205,6 @@ public ValueReader primitive(Pair partner, Schema primitive) { case

Re: [I] [Bug] Error in overwrite(): pyarrow.lib.ArrowInvalid: offset overflow with large dataset (~3M rows) [iceberg-python]

2025-01-07 Thread via GitHub
sundaresanr commented on issue #1491: URL: https://github.com/apache/iceberg-python/issues/1491#issuecomment-2576078566 @kevinjqliu The issue can be reproduced with a partitioned table. As you can see in the backtrace, the issue is with the call to `_determine_partitions()`, which ca

Re: [PR] Parquet: Internal writer and reader [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1905999633 ## .palantir/revapi.yml: ## @@ -1171,6 +1171,28 @@ acceptedBreaks: \ java.util.function.Function, org.apache.iceberg.io.CloseableIterable,\ \ java.u

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2025-01-07 Thread via GitHub
sopel39 commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1905856982 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Over

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905843540 ## core/src/main/java/org/apache/iceberg/avro/BaseAvroSchemaVisitor.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-01-07 Thread via GitHub
jiakai-li commented on issue #1494: URL: https://github.com/apache/iceberg-python/issues/1494#issuecomment-2576009191 I'm happy to work on this feature.

Re: [I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-01-07 Thread via GitHub
jiakai-li commented on issue #1492: URL: https://github.com/apache/iceberg-python/issues/1492#issuecomment-2576010415 I raised a duplicated one, which is closed now. Can I work on this feature?

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905990532 ## core/src/test/java/org/apache/iceberg/avro/TestInternalWriter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905991506 ## core/src/test/java/org/apache/iceberg/avro/TestInternalWriter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905992426 ## core/src/test/java/org/apache/iceberg/avro/TestInternalWriter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905993243 ## core/src/test/java/org/apache/iceberg/avro/TestInternalWriter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905993934 ## core/src/test/java/org/apache/iceberg/avro/TestInternalWriter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905995521 ## core/src/test/java/org/apache/iceberg/avro/TestInternalWriter.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Implement column projection [iceberg-python]

2025-01-07 Thread via GitHub
gabeiglio commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1906110946 ## pyiceberg/io/pyarrow.py: ## @@ -1216,6 +1216,25 @@ def _field_id(self, field: pa.Field) -> int: return -1 +def _get_column_projection_values( +

[PR] [Docs] Update spark-getting-started docs page to make the example valid [iceberg]

2025-01-07 Thread via GitHub
nickdelnano opened a new pull request, #11923: URL: https://github.com/apache/iceberg/pull/11923 The [Spark Getting Started docs page](https://iceberg.apache.org/docs/nightly/spark-getting-started/) has intro Spark examples but they reference tables and columns that do not exist in the exa

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906125908 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -359,10 +250,10 @@ public ParquetValueReader primitive( ColumnDescript

[PR] Build: Bump boto3 from 1.35.88 to 1.35.93 [iceberg-python]

2025-01-07 Thread via GitHub
dependabot[bot] opened a new pull request, #1495: URL: https://github.com/apache/iceberg-python/pull/1495 Bumps [boto3](https://github.com/boto/boto3) from 1.35.88 to 1.35.93. Commits: [7e5990c](https://github.com/boto/boto3/commit/7e5990c694164f96125d1362ed26bfec978c9e01) Merge

Re: [PR] [Docs] Update spark-getting-started docs page to make the example valid [iceberg]

2025-01-07 Thread via GitHub
nickdelnano commented on code in PR #11923: URL: https://github.com/apache/iceberg/pull/11923#discussion_r1906121331 ## docs/docs/spark-getting-started.md: ## @@ -77,21 +77,24 @@ Once your table is created, insert data using [`INSERT INTO`](spark-writes.md#in ```sql INSERT

[PR] Build: Bump mypy-boto3-glue from 1.35.87 to 1.35.93 [iceberg-python]

2025-01-07 Thread via GitHub
dependabot[bot] opened a new pull request, #1496: URL: https://github.com/apache/iceberg-python/pull/1496 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.35.87 to 1.35.93. Release notes Sourced from https://github.com/youtype/mypy_boto3_builder/release

Re: [PR] feat(catalog): Add Catalog Registry [iceberg-go]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #244: URL: https://github.com/apache/iceberg-go/pull/244#discussion_r1906120334 ## catalog/registry.go: ## @@ -0,0 +1,135 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOT

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906132097 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -114,113 +112,6 @@ public ParquetValueReader struct( } } - private c

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906132860 ## parquet/src/main/java/org/apache/iceberg/data/parquet/GenericParquetReaders.java: ## @@ -92,4 +127,232 @@ protected void set(Record struct, int pos, Object value) {

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906132346 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -76,6 +64,16 @@ protected ParquetValueReader createReader( protected abstrac

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906134768 ## parquet/src/main/java/org/apache/iceberg/data/parquet/InternalReader.java: ## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] [Docs] Update spark-getting-started docs page to make the example valid [iceberg]

2025-01-07 Thread via GitHub
nickdelnano commented on PR #11923: URL: https://github.com/apache/iceberg/pull/11923#issuecomment-2576387003 Hi @kevinjqliu - I saw that you're a committer and recently looked at this doc page in https://github.com/apache/iceberg/pull/11845. Could you review this PR?

Re: [PR] Core: Parsing and Writing Tests for V3 Metadata [iceberg]

2025-01-07 Thread via GitHub
HonahX closed pull request #11730: Core: Parsing and Writing Tests for V3 Metadata URL: https://github.com/apache/iceberg/pull/11730

Re: [PR] feat(catalog): Add Catalog Registry [iceberg-go]

2025-01-07 Thread via GitHub
zeroshade commented on code in PR #244: URL: https://github.com/apache/iceberg-go/pull/244#discussion_r1906137219 ## catalog/glue.go: ## @@ -54,6 +57,50 @@ var ( _ Catalog = (*GlueCatalog)(nil) ) +func init() { + Register("glue", RegistrarFunc(func(_ string, pro

Re: [PR] feat(catalog): Add Catalog Registry [iceberg-go]

2025-01-07 Thread via GitHub
zeroshade commented on code in PR #244: URL: https://github.com/apache/iceberg-go/pull/244#discussion_r1906136606 ## catalog/registry.go: ## @@ -0,0 +1,135 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTI

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906138972 ## parquet/src/main/java/org/apache/iceberg/data/parquet/InternalReader.java: ## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906139217 ## parquet/src/main/java/org/apache/iceberg/data/parquet/InternalReader.java: ## @@ -0,0 +1,207 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906140132 ## parquet/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java: ## @@ -38,6 +50,19 @@ protected StructWriter createStructWriter(List> wr retu

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906140419 ## parquet/src/main/java/org/apache/iceberg/data/parquet/GenericParquetWriter.java: ## @@ -38,6 +50,19 @@ protected StructWriter createStructWriter(List> wr retu

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906141975 ## parquet/src/test/java/org/apache/iceberg/parquet/TestInternalWriter.java: ## @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Implement column projection [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1906142055 ## pyiceberg/io/pyarrow.py: ## @@ -1216,6 +1216,25 @@ def _field_id(self, field: pa.Field) -> int: return -1 +def _get_column_projection_values(

Re: [PR] Parquet: Add readers and writers for the internal object model [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11904: URL: https://github.com/apache/iceberg/pull/11904#discussion_r1906143562 ## .palantir/revapi.yml: ## @@ -1171,6 +1171,28 @@ acceptedBreaks: \ java.util.function.Function, org.apache.iceberg.io.CloseableIterable,\ \ java.u

Re: [PR] Fix ParallelIterable deadlock [iceberg]

2025-01-07 Thread via GitHub
RussellSpitzer commented on PR #11781: URL: https://github.com/apache/iceberg/pull/11781#issuecomment-2575944738 @findepi I'm on board with this but I want to make sure you are also happy with this. I'm unsure of whether having the ability to yield before files will really help memory press

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905875973 ## tests/catalog/test_s3tables.py: ## @@ -0,0 +1,227 @@ +import pytest Review Comment: I renamed it to `integration_test_s3tables.py` and also changed th

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905843540 ## core/src/main/java/org/apache/iceberg/avro/BaseAvroSchemaVisitor.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [PR] Avro: Add internal writer [iceberg]

2025-01-07 Thread via GitHub
rdblue commented on code in PR #11919: URL: https://github.com/apache/iceberg/pull/11919#discussion_r1905943594 ## core/src/main/java/org/apache/iceberg/avro/BaseAvroSchemaVisitor.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905791773 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] API: Support removeUnusedSpecs in ExpireSnapshots [iceberg]

2025-01-07 Thread via GitHub
amogh-jahagirdar merged PR #10755: URL: https://github.com/apache/iceberg/pull/10755

Re: [PR] feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

2025-01-07 Thread via GitHub
felixscherz commented on code in PR #1429: URL: https://github.com/apache/iceberg-python/pull/1429#discussion_r1905883595 ## pyiceberg/catalog/s3tables.py: ## @@ -0,0 +1,324 @@ +import re +from typing import TYPE_CHECKING, List, Optional, Set, Tuple, Union + +import boto3 + +fro

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-07 Thread via GitHub
RussellSpitzer commented on PR #11906: URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2575979132 The test here says it's failing because you are deleting ``` but the following elements were unexpected: ["file:/tmp/junit-14563533605645158466/data/_c2_tr

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-07 Thread via GitHub
RussellSpitzer commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1905894464 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -854,12 +867,14 @@ public void testCompareToFileList

Re: [I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on issue #1492: URL: https://github.com/apache/iceberg-python/issues/1492#issuecomment-2576027138 BTW this is how the Java side does it: https://grep.app/search?q=WRITE_DATA_LOCATION&filter[repo][0]=apache/iceberg&filter[path][0]=core/src/

[PR] Spark: Relativize in-memory paths for data file and rewritable delete file locations [iceberg]

2025-01-07 Thread via GitHub
amogh-jahagirdar opened a new pull request, #11525: URL: https://github.com/apache/iceberg/pull/11525 This is a follow up to https://github.com/apache/iceberg/pull/11273/files# Instead of broadcasting a map with absolute paths for data files and delete files to executors, we could sh

[PR] feat(catalog): Standardize Catalog create table function [iceberg-go]

2025-01-07 Thread via GitHub
zeroshade opened a new pull request, #245: URL: https://github.com/apache/iceberg-go/pull/245 Adding a CreateTable function to the `Catalog` interface, standardizing the implementation that was initially created by #146 so that it isn't specific to the REST catalog and can be implemented by

Re: [PR] Implement column projection [iceberg-python]

2025-01-07 Thread via GitHub
kevinjqliu commented on code in PR #1443: URL: https://github.com/apache/iceberg-python/pull/1443#discussion_r1906149085 ## pyiceberg/io/pyarrow.py: ## @@ -1216,6 +1216,25 @@ def _field_id(self, field: pa.Field) -> int: return -1 +def _get_column_projection_values(
