Re: [PR] Bump `pre-commit` versions [iceberg-python]

2024-11-19 Thread via GitHub
Fokko merged PR #1344: URL: https://github.com/apache/iceberg-python/pull/1344 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [I] A more robust way to deprecate APIs [iceberg-python]

2024-11-19 Thread via GitHub
Fokko commented on issue #1330: URL: https://github.com/apache/iceberg-python/issues/1330#issuecomment-2487793689 Hey @ndrluis thanks for bringing this up! Are you suggesting to copy the code into our codebase? I always favor reusing an existing library instead of reinventing the wheel. I

Re: [I] .pyiceberg.yaml config files should be loaded from current dir instead of home folder [iceberg-python]

2024-11-19 Thread via GitHub
anentropic commented on issue #1333: URL: https://github.com/apache/iceberg-python/issues/1333#issuecomment-2487759065 It should check the current dir first since it is the most specific e.g. then you could use the global home dir file for most things but override that on a particula
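The lookup order anentropic describes (project directory first, home directory as the fallback) can be sketched as below. This is a minimal sketch under assumptions: `find_config_file` and `candidate_dirs` are hypothetical helpers, not pyiceberg's actual loader, and the sketch returns a single file rather than layering the project file over the home file.

```python
from pathlib import Path
from typing import Iterable, Optional

CONFIG_FILE_NAME = ".pyiceberg.yaml"


def candidate_dirs(cwd: Path, home: Path) -> Iterable[Path]:
    # Most specific location first: the project directory, then the user's home.
    yield cwd
    yield home


def find_config_file(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    """Hypothetical lookup: return the first .pyiceberg.yaml found, or None."""
    cwd = cwd if cwd is not None else Path.cwd()
    home = home if home is not None else Path.home()
    for directory in candidate_dirs(cwd, home):
        path = directory / CONFIG_FILE_NAME
        if path.is_file():
            return path
    return None
```

Letting a project file override only some keys of the home file would need an extra merge step on top of this lookup.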

Re: [PR] add assertions in TestRowDelta [iceberg]

2024-11-19 Thread via GitHub
nastra commented on code in PR #11594: URL: https://github.com/apache/iceberg/pull/11594#discussion_r1849698894 ## core/src/test/java/org/apache/iceberg/TestRowDelta.java: ## @@ -74,6 +74,9 @@ public void addOnlyDeleteFilesProducesDeleteOperation() { assertThat(snap.sequenc

Re: [PR] Set default for `SortField`'s `transform` [iceberg-python]

2024-11-19 Thread via GitHub
Fokko merged PR #1347: URL: https://github.com/apache/iceberg-python/pull/1347

Re: [PR] Drop upper bounds for fsspec and it's implementations [iceberg-python]

2024-11-19 Thread via GitHub
sumanth-manchala commented on PR #1341: URL: https://github.com/apache/iceberg-python/pull/1341#issuecomment-2486726774 @Fokko , pls review now

Re: [PR] API, Core: Add formatVersion() to Table [iceberg]

2024-11-19 Thread via GitHub
nastra commented on code in PR #11587: URL: https://github.com/apache/iceberg/pull/11587#discussion_r1849692234 ## core/src/main/java/org/apache/iceberg/BaseMetadataTable.java: ## @@ -212,6 +212,11 @@ public String toString() { return name(); } + @Override + public i

Re: [PR] Core: Fix caching table with metadata table names [iceberg]

2024-11-19 Thread via GitHub
manuzhang commented on code in PR #11123: URL: https://github.com/apache/iceberg/pull/11123#discussion_r1849679354 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestCachingTableWithMetaTableName.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software F

[I] NullPointerException when using VectorizedArrowReader to read a null column [iceberg]

2024-11-19 Thread via GitHub
slessard opened a new issue, #10275: URL: https://github.com/apache/iceberg/issues/10275 ### Apache Iceberg version 1.5.1 (latest release) ### Query engine Other ### Please describe the bug 🐞 I am writing a compatibility layer for Teradata so that it can acc

Re: [PR] Feature: Write to branches [iceberg-python]

2024-11-19 Thread via GitHub
vinjai commented on PR #941: URL: https://github.com/apache/iceberg-python/pull/941#issuecomment-2487692994 @kevinjqliu What are the next steps to get this merged?

Re: [PR] datafusion: Create table provider for a snapshot. [iceberg-rust]

2024-11-19 Thread via GitHub
ryzhyk commented on code in PR #707: URL: https://github.com/apache/iceberg-rust/pull/707#discussion_r1849634146 ## crates/integrations/datafusion/src/table/mod.rs: ## @@ -60,14 +66,52 @@ impl IcebergTableProvider { let schema = Arc::new(schema_to_arrow_schema(table.

Re: [PR] Core: Fix caching table with metadata table names [iceberg]

2024-11-19 Thread via GitHub
manuzhang commented on code in PR #11123: URL: https://github.com/apache/iceberg/pull/11123#discussion_r1849674752 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -145,22 +146,26 @@ public Table loadTable(TableIdentifier ident) { } if (MetadataTab

Re: [PR] Remove Hive 2 [iceberg]

2024-11-19 Thread via GitHub
manuzhang commented on PR #10996: URL: https://github.com/apache/iceberg/pull/10996#issuecomment-2487438171 @pvary @nastra @Fokko @gaborkaszab I've sent out a discussion email. Please share your thoughts there.

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-19 Thread via GitHub
ajantha-bhat commented on PR #10986: URL: https://github.com/apache/iceberg/pull/10986#issuecomment-2487663286 > I will take a look at the partition stats PR first by @ajantha-bhat. I want to understand if we want a single analyze procedure or different procedures for table and partition st

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-19 Thread via GitHub
ajantha-bhat commented on code in PR #10986: URL: https://github.com/apache/iceberg/pull/10986#discussion_r1849657282 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/SparkProcedures.java: ## @@ -61,6 +61,7 @@ private static Map> initProcedureBuilders() {

Re: [I] [bug] read from multiple s3 regions [iceberg-python]

2024-11-19 Thread via GitHub
danhphan commented on issue #1279: URL: https://github.com/apache/iceberg-python/issues/1279#issuecomment-2487630114 Yes @kevinjqliu, it seems that I am still not able to fully understand the requirement for this change. I think I will need more time to read the code, and maybe try som

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-19 Thread via GitHub
flyrain commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2487565516 Hi @bryanck, do we still have time to include this bug fix in 1.7.1?

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating Spark tables [iceberg]

2024-11-19 Thread via GitHub
manuzhang commented on code in PR #11157: URL: https://github.com/apache/iceberg/pull/11157#discussion_r1849635195 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -273,4 +273,22 @@ public void testMigrateWith

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-19 Thread via GitHub
pvary commented on PR #11597: URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2487619096 Quick question: Is this a behavioral change? Previously we failed when the metadata was corrupt. After this, we succeed. How do we handle corrupt metadata in other catalog implement

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-19 Thread via GitHub
pvary commented on code in PR #11597: URL: https://github.com/apache/iceberg/pull/11597#discussion_r1849630441 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -412,6 +412,25 @@ private void validateTableIsIcebergTableOrView( } } + @Over

Re: [PR] Boto Glue standard retry policy with configuration [iceberg-python]

2024-11-19 Thread via GitHub
mark-major commented on code in PR #1307: URL: https://github.com/apache/iceberg-python/pull/1307#discussion_r1849629423 ## pyiceberg/catalog/glue.py: ## @@ -305,7 +308,18 @@ def __init__(self, name: str, **properties: Any): aws_secret_access_key=get_first_property_
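PR #1307 is about giving the Glue client a standard retry policy that users can configure. A minimal sketch of how catalog properties might be turned into the `retries` dictionary that `botocore.config.Config` accepts; the property names `glue.max-retries` and `glue.retry-mode` are hypothetical and may not match the PR.

```python
from typing import Any, Dict, Mapping

# Hypothetical catalog property names; the keys used by PR #1307 may differ.
MAX_RETRIES_PROPERTY = "glue.max-retries"
RETRY_MODE_PROPERTY = "glue.retry-mode"
VALID_RETRY_MODES = {"legacy", "standard", "adaptive"}


def glue_retry_config(properties: Mapping[str, str]) -> Dict[str, Any]:
    """Build the `retries` dict for botocore.config.Config from catalog properties."""
    mode = properties.get(RETRY_MODE_PROPERTY, "standard")
    if mode not in VALID_RETRY_MODES:
        raise ValueError(f"Unsupported retry mode {mode!r}; expected one of {sorted(VALID_RETRY_MODES)}")
    retries: Dict[str, Any] = {"mode": mode}
    if MAX_RETRIES_PROPERTY in properties:
        max_retries = int(properties[MAX_RETRIES_PROPERTY])
        if max_retries < 0:
            raise ValueError("max retries must be non-negative")
        # In botocore's retries dict, `max_attempts` counts retries, not the initial call.
        retries["max_attempts"] = max_retries
    return retries
```

The result would be passed as `boto3.client("glue", config=Config(retries=glue_retry_config(properties)))`, with `Config` imported from `botocore.config`. Leaving `max_attempts` out when the property is unset keeps botocore's own default for the chosen mode.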

Re: [PR] datafusion: Create table provider for a snapshot. [iceberg-rust]

2024-11-19 Thread via GitHub
ryzhyk commented on code in PR #707: URL: https://github.com/apache/iceberg-rust/pull/707#discussion_r1849619809 ## crates/integrations/datafusion/src/table/mod.rs: ## @@ -60,14 +66,52 @@ impl IcebergTableProvider { let schema = Arc::new(schema_to_arrow_schema(table.

Re: [I] Query specific table snapshot with datafusion. [iceberg-rust]

2024-11-19 Thread via GitHub
ryzhyk commented on issue #702: URL: https://github.com/apache/iceberg-rust/issues/702#issuecomment-2487590447 Thank you! Yes, I understand it's doable directly with the `iceberg` crate, but I prefer to use datafusion in this case, as it allows running a SQL statement over the Iceber

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-19 Thread via GitHub
nastra commented on code in PR #10986: URL: https://github.com/apache/iceberg/pull/10986#discussion_r1849603635 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apac

Re: [I] improve performance of Table.add_files by parallelizing [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1335: URL: https://github.com/apache/iceberg-python/issues/1335#issuecomment-2487569620 sounds good! Feel free to ping me for review. I'll add this issue to the 0.8.1 milestone for now

Re: [PR] Core: delete temp metadata file when version already exists [iceberg]

2024-11-19 Thread via GitHub
nastra merged PR #11350: URL: https://github.com/apache/iceberg/pull/11350

Re: [PR] Add unit test for create_changelog_view time range behavior [iceberg]

2024-11-19 Thread via GitHub
flyrain commented on PR #11564: URL: https://github.com/apache/iceberg/pull/11564#issuecomment-2487559443 Thank you, @Acehaidrey, for reporting this issue! It seems that the method `buildChangelogScan()` does not properly set up the scan when the startTimestamp is newer than the timestamp o
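The edge case flyrain describes is a start timestamp that is newer than the snapshots being scanned. A minimal Python sketch of the intended selection, assuming an exclusive start bound and an inclusive end bound (the exact bounds used by `create_changelog_view` are not stated in this thread); `Snapshot` and `snapshots_in_range` are hypothetical stand-ins for the Java scan code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int


def snapshots_in_range(
    snapshots: List[Snapshot], start_ms: Optional[int], end_ms: Optional[int]
) -> List[Snapshot]:
    """Hypothetical selection: snapshots with start_ms < timestamp_ms <= end_ms, oldest first.

    A start timestamp newer than every snapshot yields an empty list
    (an empty changelog) instead of falling back to the whole history.
    """
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms)
    return [
        s
        for s in ordered
        if (start_ms is None or s.timestamp_ms > start_ms)
        and (end_ms is None or s.timestamp_ms <= end_ms)
    ]
```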

Re: [I] improve performance of Table.add_files by parallelizing [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1335: URL: https://github.com/apache/iceberg-python/issues/1335#issuecomment-2487549044 @vtk9 thanks for the context from slack, I must have missed that thread

Re: [I] improve performance of Table.add_files by parallelizing [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1335: URL: https://github.com/apache/iceberg-python/issues/1335#issuecomment-2487547946 thanks @bigluck that makes sense! I think `_parquet_files_to_data_files` might be a good place to add the parallelism. @vtk9, is this something you would like to cont

Re: [I] A more robust way to deprecate APIs [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1330: URL: https://github.com/apache/iceberg-python/issues/1330#issuecomment-2487482903 thanks for doing the research! I like this approach. do you know if the deprecation message includes the call site or stack trace? for example, in #1336, it would be he
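On the question of whether the deprecation message includes the call site: with Python's standard `warnings` module, `stacklevel=2` attributes the warning to the line that called the deprecated function, which is what makes a report like #1336 traceable. A minimal sketch; the `deprecated` decorator, `old_api`, and `new_api` are hypothetical and are not the library discussed in #1330.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(removed_in: str, help_message: str = "") -> Callable[[F], F]:
    """Hypothetical decorator: emit a DeprecationWarning attributed to the caller's line."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            message = f"{func.__qualname__} is deprecated and will be removed in {removed_in}."
            if help_message:
                message += f" {help_message}"
            # stacklevel=2 points the warning at the code that called `wrapper`,
            # i.e. the user's call site, rather than at this line.
            warnings.warn(message, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


@deprecated(removed_in="1.0.0", help_message="Please use new_api() instead.")
def old_api() -> int:
    return 42
```

Calling `old_api()` under `warnings.catch_warnings(record=True)` records the file and line of that call, not the line inside the decorator.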

Re: [I] improve performance of Table.add_files by parallelizing [iceberg-python]

2024-11-19 Thread via GitHub
bigluck commented on issue #1335: URL: https://github.com/apache/iceberg-python/issues/1335#issuecomment-2487492436 I believe @vtk9 is suggesting the files to be read in parallel rather than sequentially. I could be mistaken, but it seems that if you have 10,000 files, each one is
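Reading the files concurrently instead of one after another can be sketched with a thread pool. This assumes the per-file work (for example, reading a Parquet footer from object storage) is I/O-bound; `map_in_parallel` is a hypothetical helper, and how it would be wired into `_parquet_files_to_data_files` is left open.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_in_parallel(func: Callable[[T], R], items: Iterable[T], max_workers: int = 8) -> Iterator[R]:
    """Apply `func` to every item on a thread pool, yielding results in input order.

    Threads suit this case because per-file work such as reading a Parquet
    footer from object storage is dominated by I/O wait.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Executor.map submits all items up front and yields results in order.
        yield from executor.map(func, items)
```

Because `Executor.map` preserves input order, the output still lines up with the input file list; the trade-off is that every file is submitted at once, so a very long list keeps all pending futures in memory.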

Re: [I] `catalog.load_table` raises Invalid JSON error [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1328: URL: https://github.com/apache/iceberg-python/issues/1328#issuecomment-2487486564 The issue is most likely from reading the table metadata file https://github.com/apache/iceberg-python/blob/93ebd39e3c457dcb86cd053c60d2d13f0713a637/pyiceberg/catalo

Re: [I] `catalog.load_table` raises Invalid JSON error [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1328: URL: https://github.com/apache/iceberg-python/issues/1328#issuecomment-2487485560 > Invalid JSON: EOF while parsing a value at line 1 column 0 [type=json_invalid, input_value='', input_type=str]" I think this usually means the table metadata from
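The `input_value=''` in that error shows the parser received an empty string, which points to a metadata file with no content. A minimal sketch of a guard that checks for this before parsing so the error names the file; it uses the standard `json` module as a stand-in for pyiceberg's pydantic-based parsing, and `parse_metadata_json` and `EmptyMetadataFileError` are hypothetical.

```python
import json
from typing import Any, Dict


class EmptyMetadataFileError(ValueError):
    """Raised when a metadata file exists but has no content (hypothetical)."""


def parse_metadata_json(location: str, content: str) -> Dict[str, Any]:
    # An empty payload is what triggers an "EOF while parsing" style error;
    # checking for it first lets the error name the offending file.
    if not content.strip():
        raise EmptyMetadataFileError(f"Table metadata file is empty: {location}")
    parsed = json.loads(content)
    if not isinstance(parsed, dict):
        raise ValueError(f"Table metadata must be a JSON object: {location}")
    return parsed
```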

Re: [I] improve performance of Table.add_files by parallelizing [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1335: URL: https://github.com/apache/iceberg-python/issues/1335#issuecomment-2487476101 [`_parquet_files_to_data_files` is a generator](https://github.com/apache/iceberg-python/blob/3ccdc44735d70bd3ef6ed18b60b3eba43c4b3b44/pyiceberg/table/__init__.py#L1529-L15

Re: [I] .pyiceberg.yaml config files should be loaded from current dir instead of home folder [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1333: URL: https://github.com/apache/iceberg-python/issues/1333#issuecomment-2487478138 > I don't want a global iceberg config for my whole machine, I want a file that lives in my project directory that could be potentially checked into git that's an in

Re: [I] Allow `file_format` to be lower-case [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1340: URL: https://github.com/apache/iceberg-python/issues/1340#issuecomment-2487473814 https://github.com/apache/iceberg-python/blob/93ebd39e3c457dcb86cd053c60d2d13f0713a637/pyiceberg/manifest.py#L95-L102
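Issue #1340 asks for lower-case `file_format` values to be accepted. One way to do that with Python's `enum` module is a `_missing_` hook that retries the lookup case-insensitively. The `FileFormat` class below is a simplified stand-in, not the enum in the linked `manifest.py` lines.

```python
from enum import Enum
from typing import Any, Optional


class FileFormat(str, Enum):
    """Simplified stand-in for a file-format enum; not pyiceberg's actual class."""

    AVRO = "AVRO"
    PARQUET = "PARQUET"
    ORC = "ORC"

    @classmethod
    def _missing_(cls, value: Any) -> Optional["FileFormat"]:
        # Called only when the exact value lookup fails; retry case-insensitively.
        if isinstance(value, str):
            upper = value.upper()
            for member in cls:
                if member.value == upper:
                    return member
        # Returning None makes Enum raise its usual ValueError.
        return None
```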

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2487470684 Opened #1346 as a fix

[PR] Kevinjqliu/use table name [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu opened a new pull request, #1346: URL: https://github.com/apache/iceberg-python/pull/1346 Closes #1336 This PR changes the implementation of the `Table.name` function to use `self._identifier` instead of `self.identifier` to avoid having unnecessary deprecation warnings. T
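The change in #1346 is for `Table.name` to read the private `_identifier` attribute rather than the deprecated `identifier` property, so internal calls stop emitting the warning. A minimal sketch of that pattern with a simplified, hypothetical `Table` class; the warning text is the one quoted in #1336.

```python
import warnings
from typing import Tuple

Identifier = Tuple[str, ...]


class Table:
    """Simplified stand-in for a table object; not pyiceberg's real Table class."""

    def __init__(self, identifier: Identifier) -> None:
        self._identifier = identifier

    @property
    def identifier(self) -> Identifier:
        # Public, deprecated accessor: warns and points at the caller's line.
        warnings.warn(
            "Table.identifier property is deprecated. Please use Table.name() function instead.",
            category=DeprecationWarning,
            stacklevel=2,
        )
        return self._identifier

    def name(self) -> Identifier:
        # Read the private attribute so this method never triggers the deprecation warning.
        return self._identifier
```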

Re: [PR] feat: Add equality delete writer [iceberg-rust]

2024-11-19 Thread via GitHub
ZENOTME commented on PR #372: URL: https://github.com/apache/iceberg-rust/pull/372#issuecomment-2485706164 Hi, I found that this PR has some failures. I tried to fix them in #703. To simplify the review, I separated this PR and the fix into two commits. Feel free to tell me if something needs to be refin

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2487444598 The first warning, `Table.identifier property is deprecated. Please use Table.name() function instead.` is from the use of `Table.identifier` used by the pyiceberg codebas

Re: [PR] test: Introduce integration test framework. [iceberg-rust]

2024-11-19 Thread via GitHub
ZENOTME commented on PR #581: URL: https://github.com/apache/iceberg-rust/pull/581#issuecomment-2487363072 Thanks @liurenjie1024! This PR is great. After #349, I can also help to migrate our e2e test using this test framework. It's helpful for us to test using different query engines or SDK

Re: [PR] 1.7.1rc0 cherry picks [iceberg]

2024-11-19 Thread via GitHub
bryanck closed pull request #11593: 1.7.1rc0 cherry picks URL: https://github.com/apache/iceberg/pull/11593

Re: [I] Nested namespace support is broken in 1.7.0 [iceberg]

2024-11-19 Thread via GitHub
bryanck closed issue #11539: Nested namespace support is broken in 1.7.0 URL: https://github.com/apache/iceberg/issues/11539

Re: [PR] feat: support append data file and add e2e test [iceberg-rust]

2024-11-19 Thread via GitHub
ZENOTME commented on PR #349: URL: https://github.com/apache/iceberg-rust/pull/349#issuecomment-2487346035 > @ZENOTME Thanks, `make test` also runs successfully for me. I'm pretty sure that the test works, but I want to assert certain things on the metadata. Having the IDE to set breakpoint

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-11-19 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1849434684 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -307,12 +331,22 @@ private void invalidateFilteredCache() { /** * @return a Mani

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-11-19 Thread via GitHub
huaxingao commented on PR #11257: URL: https://github.com/apache/iceberg/pull/11257#issuecomment-2486300125 @RussellSpitzer There are some conflict files. If I rebase, it will also pick up changes for Spark3.5, so I opened a new [PR](https://github.com/apache/iceberg/pull/11583). I will pi

[PR] datafusion: Create table provider for a snapshot. [iceberg-rust]

2024-11-19 Thread via GitHub
ryzhyk opened a new pull request, #707: URL: https://github.com/apache/iceberg-rust/pull/707 The Iceberg table provider allows querying an Iceberg table via datafusion. The initial implementation only allowed querying the latest snapshot of the table. It is sometimes useful to query a specific

Re: [PR] datafusion: Create table provider for a snapshot. [iceberg-rust]

2024-11-19 Thread via GitHub
liurenjie1024 commented on code in PR #707: URL: https://github.com/apache/iceberg-rust/pull/707#discussion_r1849400044 ## crates/integrations/datafusion/src/table/mod.rs: ## @@ -60,14 +66,52 @@ impl IcebergTableProvider { let schema = Arc::new(schema_to_arrow_schema

Re: [PR] Bump deptry from 0.21.0 to 0.21.1 [iceberg-python]

2024-11-19 Thread via GitHub
Fokko merged PR #1342: URL: https://github.com/apache/iceberg-python/pull/1342

Re: [I] Query specific table snapshot with datafusion. [iceberg-rust]

2024-11-19 Thread via GitHub
ryzhyk commented on issue #702: URL: https://github.com/apache/iceberg-rust/issues/702#issuecomment-2487109839 I created #707 to try to address this

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
sungwy commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2487112440 Hi @Fokko, @djouallah thanks for flagging this! I think I must have missed out on updating the `name` method in this large deprecation exercise: https://github.com/apache/ice

Re: [I] Query specific table snapshot with datafusion. [iceberg-rust]

2024-11-19 Thread via GitHub
liurenjie1024 commented on issue #702: URL: https://github.com/apache/iceberg-rust/issues/702#issuecomment-2487153241 Hi, @ryzhyk It's possible using iceberg-rust's api: https://github.com/apache/iceberg-rust/blob/6e0bcf56028e0d20d5ceeedf87dbb3db7c929ee3/crates/iceberg/src/scan.rs#L131

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-11-19 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1849242654 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -185,6 +198,16 @@ List filterManifests(Schema tableSchema, List manife return I

Re: [PR] Core: delete temp metadata file when version already exists [iceberg]

2024-11-19 Thread via GitHub
leesf commented on PR #11350: URL: https://github.com/apache/iceberg/pull/11350#issuecomment-2487135105 @nastra I pushed an update to fix the unit test updated by your push.

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
djouallah commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2487131390 polaris

Re: [PR] 1.7.1rc0 cherry picks [iceberg]

2024-11-19 Thread via GitHub
bryanck commented on PR #11593: URL: https://github.com/apache/iceberg/pull/11593#issuecomment-2487079958 Thanks for the review @nastra and @Fokko !

Re: [PR] 1.7.1rc0 cherry picks [iceberg]

2024-11-19 Thread via GitHub
bryanck merged PR #11593: URL: https://github.com/apache/iceberg/pull/11593

Re: [PR] Hive: Optimize tableExists API in hive catalog [iceberg]

2024-11-19 Thread via GitHub
dramaticlly commented on PR #11597: URL: https://github.com/apache/iceberg/pull/11597#issuecomment-2487035183 FYI @szehon-ho and @haizhou-zhao if you are interested

Re: [I] "Iceberg.engine.hive.enabled" Conf is not honouring for HIVE CATALOG [iceberg]

2024-11-19 Thread via GitHub
github-actions[bot] commented on issue #10286: URL: https://github.com/apache/iceberg/issues/10286#issuecomment-2487026481 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Build: Bump calcite from 1.10.0 to 1.38.0 [iceberg]

2024-11-19 Thread via GitHub
github-actions[bot] commented on PR #11361: URL: https://github.com/apache/iceberg/pull/11361#issuecomment-2487026731 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Can IceBerg support diskann algorithm ? [iceberg]

2024-11-19 Thread via GitHub
github-actions[bot] closed issue #10285: Can IceBerg support diskann algorithm ? URL: https://github.com/apache/iceberg/issues/10285

Re: [I] NullPointerException when using VectorizedArrowReader to read a null column [iceberg]

2024-11-19 Thread via GitHub
github-actions[bot] closed issue #10275: NullPointerException when using VectorizedArrowReader to read a null column URL: https://github.com/apache/iceberg/issues/10275

Re: [I] Can IceBerg support diskann algorithm ? [iceberg]

2024-11-19 Thread via GitHub
github-actions[bot] commented on issue #10285: URL: https://github.com/apache/iceberg/issues/10285#issuecomment-2487026457 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] NullPointerException when using VectorizedArrowReader to read a null column [iceberg]

2024-11-19 Thread via GitHub
github-actions[bot] commented on issue #10275: URL: https://github.com/apache/iceberg/issues/10275#issuecomment-2487026427 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Add `all_manifests` metadata table with tests [iceberg-python]

2024-11-19 Thread via GitHub
Fokko commented on PR #1241: URL: https://github.com/apache/iceberg-python/pull/1241#issuecomment-2486890621 @soumya-ghosh I see this one is still pending, are you still interested to get this in?

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-11-19 Thread via GitHub
ajantha-bhat commented on PR #11216: URL: https://github.com/apache/iceberg/pull/11216#issuecomment-2486040761 @RussellSpitzer: I have added the Assert and replied to https://github.com/apache/iceberg/pull/11216#discussion_r1822062905, do you have any more comments for this PR?

Re: [I] OR condition does not leverage all parquet metadata (metrics, dictionary, bloom filter) causing inefficient queries [iceberg]

2024-11-19 Thread via GitHub
cccs-jc commented on issue #10029: URL: https://github.com/apache/iceberg/issues/10029#issuecomment-2485859703 There is a PR ready to go; however, no one is reviewing it. We've been running with a local fork for months now. I wish this PR made its way into the main branch. https

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating Spark tables [iceberg]

2024-11-19 Thread via GitHub
RussellSpitzer commented on code in PR #11157: URL: https://github.com/apache/iceberg/pull/11157#discussion_r1848654925 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -971,4 +979,109 @@ public int hashCode() { return Objects.hashCode

Re: [PR] Bump mypy-boto3-glue from 1.35.53 to 1.35.65 [iceberg-python]

2024-11-19 Thread via GitHub
Fokko merged PR #1343: URL: https://github.com/apache/iceberg-python/pull/1343

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
Fokko commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2486954482 Looping in @sungwy here, did you mean to return `self._identifier` here? https://github.com/apache/iceberg-python/blob/a90c0140ee7b6c3a9d553c7317a98b8f9582d7d9/pyiceberg/

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-19 Thread via GitHub
szehon-ho commented on PR #10986: URL: https://github.com/apache/iceberg/pull/10986#issuecomment-2486954110 This looks good to me, will merge tomorrow if no additional comments

Re: [PR] Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten [iceberg]

2024-11-19 Thread via GitHub
aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1849171314 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -185,6 +198,16 @@ List filterManifests(Schema tableSchema, List manife return I

Re: [I] Error: table_type missing from table parameters while only loading iceberg table from Hive metaStore [iceberg-python]

2024-11-19 Thread via GitHub
Fokko closed issue #1331: Error: table_type missing from table parameters while only loading iceberg table from Hive metaStore URL: https://github.com/apache/iceberg-python/issues/1331

[I] Bug in `PartialEq` for `Struct` [iceberg-rust]

2024-11-19 Thread via GitHub
Sl1mb0 opened a new issue, #706: URL: https://github.com/apache/iceberg-rust/issues/706 # Problem If I write a `Manifest` to an `output.avro` file and then read that same `output.avro` file into another `Manifest` object, asserting that the two objects are equal fails due to inequality b

Re: [PR] Ignore tables without table_type parameters while loading all iceberg table from Glue and Hive catalog (#1331) [iceberg-python]

2024-11-19 Thread via GitHub
Fokko merged PR #1332: URL: https://github.com/apache/iceberg-python/pull/1332

Re: [PR] Drop upper bounds for fsspec and it's implementations [iceberg-python]

2024-11-19 Thread via GitHub
Fokko merged PR #1341: URL: https://github.com/apache/iceberg-python/pull/1341

Re: [PR] Tests: Bump Spark to 3.5.3 [iceberg-python]

2024-11-19 Thread via GitHub
HonahX commented on code in PR #1322: URL: https://github.com/apache/iceberg-python/pull/1322#discussion_r1849139089 ## dev/Dockerfile: ## @@ -36,7 +36,7 @@ ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$ RUN mkdir -p ${HADOOP_HOME} && mkdir -p

Re: [PR] Tests: Bump Spark to 3.5.3 [iceberg-python]

2024-11-19 Thread via GitHub
HonahX merged PR #1322: URL: https://github.com/apache/iceberg-python/pull/1322

Re: [PR] Build: Bump pytest-checkdocs from 2.10.1 to 2.13.0 [iceberg-python]

2024-11-19 Thread via GitHub
Fokko commented on PR #682: URL: https://github.com/apache/iceberg-python/pull/682#issuecomment-2486889150 @dependabot rebase

Re: [PR] Boto Glue standard retry policy with configuration [iceberg-python]

2024-11-19 Thread via GitHub
Fokko commented on code in PR #1307: URL: https://github.com/apache/iceberg-python/pull/1307#discussion_r1849135659 ## pyiceberg/catalog/glue.py: ## @@ -305,7 +308,18 @@ def __init__(self, name: str, **properties: Any): aws_secret_access_key=get_first_property_value

Re: [PR] Add Support for Dynamic Overwrite [iceberg-python]

2024-11-19 Thread via GitHub
Fokko commented on PR #931: URL: https://github.com/apache/iceberg-python/pull/931#issuecomment-2486886304 @jqin61 Do you have time to follow up on the last few comments? Would be great to get this in 👍

[PR] Bump mypy-boto3-glue from 1.35.53 to 1.35.65 [iceberg-python]

2024-11-19 Thread via GitHub
dependabot[bot] opened a new pull request, #1343: URL: https://github.com/apache/iceberg-python/pull/1343 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.35.53 to 1.35.65. Commits See full diff in https://github.com/youtype/mypy_boto3_builder/commi

[PR] Bump deptry from 0.21.0 to 0.21.1 [iceberg-python]

2024-11-19 Thread via GitHub
dependabot[bot] opened a new pull request, #1342: URL: https://github.com/apache/iceberg-python/pull/1342 Bumps [deptry](https://github.com/fpgmaas/deptry) from 0.21.0 to 0.21.1. Release notes Sourced from deptry's releases (https://github.com/fpgmaas/deptry/releases). 0.21.

Re: [PR] core: Filter on live entries when reading the manifest [iceberg]

2024-11-19 Thread via GitHub
Fokko commented on code in PR #9996: URL: https://github.com/apache/iceberg/pull/9996#discussion_r1849083781 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -443,38 +443,36 @@ private ManifestFile filterManifestWithDeletedFiles(

Re: [PR] Procedure to compute table stats [iceberg]

2024-11-19 Thread via GitHub
karuppayya commented on code in PR #10986: URL: https://github.com/apache/iceberg/pull/10986#discussion_r1849077800 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestComputeTableStatsProcedure.java: ## @@ -0,0 +1,122 @@ +/* + * Licensed to the

[PR] core: Filter on live entries when reading the manifest [iceberg]

2024-11-19 Thread via GitHub
Fokko opened a new pull request, #9996: URL: https://github.com/apache/iceberg/pull/9996 This will reduce the allocation of objects and filter out irrelevant manifests at read time. PS: In the diff, the indentation looks a bit off.

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2486727584 Possibly related to #1318, but I'll double check.

Re: [I] How to get rid of the warning [iceberg-python]

2024-11-19 Thread via GitHub
kevinjqliu commented on issue #1336: URL: https://github.com/apache/iceberg-python/issues/1336#issuecomment-2486727124 Thanks for reporting this! This is a bug where the warning is emitted when the catalog identifier is not used.

Re: [PR] Parquet: Correctly prune nested columns [iceberg]

2024-11-19 Thread via GitHub
RussellSpitzer commented on PR #11373: URL: https://github.com/apache/iceberg/pull/11373#issuecomment-2486664942 @MichaelDeSteven - ``` Error: eckstyle] [ERROR] /home/runner/work/iceberg/iceberg/parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java:23:1: Use org.apac

Re: [PR] Oauth changes [iceberg]

2024-11-19 Thread via GitHub
cccs-cat001 closed pull request #11595: Oauth changes URL: https://github.com/apache/iceberg/pull/11595

Re: [PR] Core: Fix caching table with metadata table names [iceberg]

2024-11-19 Thread via GitHub
gaborkaszab commented on code in PR #11123: URL: https://github.com/apache/iceberg/pull/11123#discussion_r1848981822 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -145,22 +146,26 @@ public Table loadTable(TableIdentifier ident) { } if (MetadataT

Re: [I] Consider Using object_store as IO Abstraction [iceberg-rust]

2024-11-19 Thread via GitHub
BlakeOrth commented on issue #172: URL: https://github.com/apache/iceberg-rust/issues/172#issuecomment-2486497892 @liurenjie1024 I have taken some time to explore an implementation based on your suggestion above, just as I did for the user extensible `Storage` proposed earlier. Unfortunatel

Re: [PR] Core: Change Delete granularity to file for new tables [iceberg]

2024-11-19 Thread via GitHub
amogh-jahagirdar commented on code in PR #11478: URL: https://github.com/apache/iceberg/pull/11478#discussion_r1848422899 ## spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -154,7 +155,7 @@ public void testDeleteWithVectorized

Re: [PR] Core,Format: Deprecate embedded manifests [iceberg]

2024-11-19 Thread via GitHub
nastra commented on code in PR #11586: URL: https://github.com/apache/iceberg/pull/11586#discussion_r1848820398 ## format/spec.md: ## @@ -654,17 +654,17 @@ The `first_row_id` is only inherited for added data files. The inherited value m A snapshot consists of the following f

Re: [PR] Core,Format: Deprecate embedded manifests [iceberg]

2024-11-19 Thread via GitHub
amogh-jahagirdar commented on code in PR #11586: URL: https://github.com/apache/iceberg/pull/11586#discussion_r1848789823 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -158,6 +164,9 @@ static Snapshot fromJson(JsonNode node) { manifestList);
