Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
mattmartin14 commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2650876043 @kevinjqliu - I manually added all the lint fixes you called out as well as updated the poetry toml and lock files with the new ones you provided. thank you so much and i'm

[PR] Remove `_task_to_table` [iceberg-python]

2025-02-11 Thread via GitHub
Fokko opened a new pull request, #1643: URL: https://github.com/apache/iceberg-python/pull/1643 Seems not being used. Less is more! Noticed this while reviewing https://github.com/apache/iceberg-python/pull/1388 -- This is an automated message from the Apache Git Service. To respon

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1950767278 ## api/src/main/java/org/apache/iceberg/UpdateSchema.java: ## @@ -125,16 +185,23 @@ default UpdateSchema addColumn(String parent, String name, Type type) { * @para

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
mattmartin14 commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2650818707 > @mattmartin14 i tried to fix as many linter issue as possible, the changes are in https://github.com/apache/iceberg-python/compare/main...kevinjqliu:iceberg-python:kevinjqli

[PR] Support reading initial-defaults [iceberg-python]

2025-02-11 Thread via GitHub
Fokko opened a new pull request, #1644: URL: https://github.com/apache/iceberg-python/pull/1644 (no comment)
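
For context on what reading `initial-default` means: per the Iceberg v3 spec, a field's `initial-default` is the value readers use for rows written before the field was added, while `write-default` is the value writers use when they do not supply one. A stdlib-only sketch of those two rules, assuming rows are plain dicts; `Field`, `project`, and `fill_for_write` are hypothetical names, not PyIceberg's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Field:
    name: str
    initial_default: Optional[Any] = None  # used for rows written before the field existed
    write_default: Optional[Any] = None  # used when a writer omits the field


def project(row: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Read a stored row against the current schema, filling columns the file lacks."""
    return {f.name: row[f.name] if f.name in row else f.initial_default for f in schema}


def fill_for_write(row: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Complete a row before writing, using write defaults for omitted columns."""
    return {f.name: row[f.name] if f.name in row else f.write_default for f in schema}
```

Keeping the two defaults separate matters because they can differ: old files keep reading as `initial_default` even if `write_default` is later changed.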

Re: [PR] Docs: Add documentation for Rate limiting in Spark Structured Streaming [iceberg]

2025-02-11 Thread via GitHub
singhpk234 commented on PR #12217: URL: https://github.com/apache/iceberg/pull/12217#issuecomment-2651291767 cc @RussellSpitzer

Re: [PR] Remove `_task_to_table` [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu merged PR #1643: URL: https://github.com/apache/iceberg-python/pull/1643

Re: [PR] Add ResidualVisitor to compute residuals [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#discussion_r1950654390 ## tests/expressions/test_residual_evaluator.py: ## @@ -0,0 +1,251 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] Remove old metadata [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on code in PR #1607: URL: https://github.com/apache/iceberg-python/pull/1607#discussion_r1951160511 ## tests/catalog/test_sql.py: ## @@ -1613,3 +1614,50 @@ def test_merge_manifests_local_file_system(catalog: SqlCatalog, arrow_table_with tbl.append(
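
In Iceberg, cleanup of old metadata files is governed by the table properties `write.metadata.delete-after-commit.enabled` (default `false`) and `write.metadata.previous-versions-max` (default `100`). A stdlib-only sketch of that bookkeeping, assuming the metadata log is ordered oldest first and is trimmed to the maximum whether or not deletion is enabled; `prune_metadata_log` is hypothetical and is not the code under review in PR #1607.

```python
from typing import Dict, List, Tuple

DELETE_AFTER_COMMIT = "write.metadata.delete-after-commit.enabled"
PREVIOUS_VERSIONS_MAX = "write.metadata.previous-versions-max"


def prune_metadata_log(
    metadata_log: List[str], properties: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (kept_log, files_to_delete) for a metadata log ordered oldest first."""
    # Assumption: at least one previous version is always retained.
    max_versions = max(1, int(properties.get(PREVIOUS_VERSIONS_MAX, "100")))
    kept = metadata_log[-max_versions:]
    dropped = metadata_log[: len(metadata_log) - len(kept)]
    # Entries fall out of the log either way; files are only deleted when enabled.
    enabled = properties.get(DELETE_AFTER_COMMIT, "false").lower() == "true"
    return kept, (dropped if enabled else [])
```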

Re: [I] Delete Files in Table Scans [iceberg-rust]

2025-02-11 Thread via GitHub
ZENOTME commented on issue #630: URL: https://github.com/apache/iceberg-rust/issues/630#issuecomment-2651384092 Thanks for this great job! @sdd Should we also consider the case where `enum Deletes` occupies too much space, so that we need to support spilling it to disk?

[PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
bryanck opened a new pull request, #12224: URL: https://github.com/apache/iceberg/pull/12224 With very large table metadata json, for example, those with many snapshots with partition summaries, we sometimes encounter errors involving hash collisions when loading the metadata. This PR disab

Re: [PR] Core: Fix numeric overflow of timestamp nano literal [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #11775: URL: https://github.com/apache/iceberg/pull/11775#discussion_r1951168267 ## api/src/main/java/org/apache/iceberg/expressions/Literals.java: ## @@ -300,8 +300,7 @@ public Literal to(Type type) { case TIMESTAMP: return (Li
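
Why widening can overflow: `timestamp_ns` values are signed 64-bit nanosecond counts since the epoch, covering only roughly the years 1677 to 2262, so multiplying a valid microsecond value by 1000 can exceed the int64 range. A Python sketch of a checked conversion; Python integers never overflow, so the bound is checked explicitly (in Java, `Math.multiplyExact` throws `ArithmeticException` instead). `micros_to_nanos` is a hypothetical helper, and whether the PR throws or handles the case another way is not visible in the excerpt.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def micros_to_nanos(micros: int) -> int:
    """Convert epoch microseconds to epoch nanoseconds, rejecting int64 overflow."""
    nanos = micros * 1000  # exact in Python; Java long arithmetic would silently wrap
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError(f"{micros} micros is out of range for timestamp_ns")
    return nanos
```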

Re: [I] UpdateSchema.add_column doesn't support adding parent and child in the same transaction [iceberg]

2025-02-11 Thread via GitHub
singhpk234 commented on issue #12223: URL: https://github.com/apache/iceberg/issues/12223#issuecomment-2651327728 Q: does it work for this scenario? ``` Transaction t = table.newTransaction(); UpdateSchema uSchema1 = t.updateSchema(); uSchema1.addColumn("myparent", parentType)

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1951181666 ## pyiceberg/catalog/__init__.py: ## @@ -746,6 +747,24 @@ def _convert_schema_if_needed(schema: Union[Schema, "pa.Schema"]) -> Schema: pass

Re: [I] Create table format version constants [iceberg-python]

2025-02-11 Thread via GitHub
iyad-f commented on issue #851: URL: https://github.com/apache/iceberg-python/issues/851#issuecomment-2651411960 @kevinjqliu ok i will be starting this now, and will make a PR soon for it

Re: [I] Improve ThreadPools for graceful shutdown [iceberg]

2025-02-11 Thread via GitHub
pvary commented on issue #12220: URL: https://github.com/apache/iceberg/issues/12220#issuecomment-2651461828 You might want to take a look at the discussion here: https://lists.apache.org/thread/mowmbr36y8wr1k9don2xx36l97n5f1xz. This could be the starting point here

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1950761510 ## api/src/main/java/org/apache/iceberg/UpdateSchema.java: ## @@ -67,24 +70,52 @@ default UpdateSchema addColumn(String name, Type type) { } /** - * Add a new

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1950769476 ## api/src/main/java/org/apache/iceberg/UpdateSchema.java: ## @@ -169,13 +239,41 @@ default UpdateSchema addRequiredColumn(String name, Type type) { * @return this

Re: [PR] Add ResidualVisitor to compute residuals [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#discussion_r1950682935 ## tests/expressions/test_residual_evaluator.py: ## @@ -0,0 +1,251 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] Add ResidualVisitor to compute residuals [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#discussion_r1950707206 ## pyiceberg/table/__init__.py: ## @@ -1466,6 +1475,25 @@ def _build_partition_evaluator(self, spec_id: int) -> Callable[[DataFile], bool] # shared insta

[I] UpdateSchema.add_column doesn't support adding parent and child in the same transaction [iceberg]

2025-02-11 Thread via GitHub
brunomendola opened a new issue, #12223: URL: https://github.com/apache/iceberg/issues/12223 ### Apache Iceberg version 1.5.0 ### Query engine Spark ### Please describe the bug 🐞 Currently we cannot add the parent field with its child nested field in the sa

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1950799826 ## core/src/main/java/org/apache/iceberg/SchemaUpdate.java: ## @@ -322,17 +303,45 @@ public UpdateSchema updateColumnDoc(String name, String doc) { // merge wit

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1950808951 ## core/src/main/java/org/apache/iceberg/SchemaUpdate.java: ## @@ -322,17 +303,45 @@ public UpdateSchema updateColumnDoc(String name, String doc) { // merge wit

Re: [PR] fix: Misleading error messages in `iceberg-catalog-rest` and allow `StatusCode::OK` in responses [iceberg-rust]

2025-02-11 Thread via GitHub
Xuanwo merged PR #962: URL: https://github.com/apache/iceberg-rust/pull/962

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1950756477 ## api/src/main/java/org/apache/iceberg/types/Types.java: ## @@ -583,6 +589,26 @@ private Builder(NestedField toCopy) { this.writeDefault = toCopy.writeDefault

Re: [PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
bryanck commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2651697343 > @bryanck I didn't quite get the partition summary field names. were you referring to `PartitionFieldSummaryParser`? it seems to have just 4 field names. > > String.intern can be

Re: [PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
bryanck commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2651836612 > > Canonicalization can help when field names are reused within a single metadata file, so that seemed helpful still. > > canonicalization lifecycle is scoped to a single metadat

Re: [PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
bryanck commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2651840830 > > > Canonicalization can help when field names are reused within a single metadata file, so that seemed helpful still. > > > > > > canonicalization lifecycle is scoped to a

Re: [PR] ci(dependabot): fix dependabot config [iceberg-go]

2025-02-11 Thread via GitHub
zeroshade merged PR #299: URL: https://github.com/apache/iceberg-go/pull/299

[PR] build(deps): bump the gomod_updates group with 4 updates [iceberg-go]

2025-02-11 Thread via GitHub
dependabot[bot] opened a new pull request, #300: URL: https://github.com/apache/iceberg-go/pull/300 Bumps the gomod_updates group with 4 updates: [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2), [github.com/aws/aws-sdk-go-v2/service/glue](https://github.com/aws/a

Re: [PR] Add ResidualVisitor to compute residuals [iceberg-python]

2025-02-11 Thread via GitHub
mrutunjay-kinagi commented on PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#issuecomment-2651843972 🚀

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#discussion_r1951471241 ## pyiceberg/table/upsert_util.py: ## @@ -0,0 +1,131 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#discussion_r1951471685 ## pyiceberg/table/upsert_util.py: ## @@ -0,0 +1,131 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on code in PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#discussion_r1951477462 ## pyiceberg/table/upsert_util.py: ## @@ -0,0 +1,131 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S
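
The upsert under review splits incoming rows into updates and inserts by matching on join columns. A stdlib-only sketch of that idea, assuming rows are plain dicts; `plan_upsert` is hypothetical and is not the code in `pyiceberg/table/upsert_util.py`. Rejecting duplicate source keys and skipping unchanged matched rows are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def plan_upsert(
    target: Iterable[Row], source: Iterable[Row], join_cols: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Split source rows into (rows_to_update, rows_to_insert) against target rows."""

    def key(row: Row) -> Tuple[object, ...]:
        return tuple(row[col] for col in join_cols)

    existing = {key(row): row for row in target}
    seen = set()
    to_update: List[Row] = []
    to_insert: List[Row] = []
    for row in source:
        k = key(row)
        if k in seen:
            # Assumption: duplicate join keys in the source make the merge ambiguous.
            raise ValueError(f"duplicate join key in source: {k}")
        seen.add(k)
        if k not in existing:
            to_insert.append(row)  # no match on the join key: insert
        elif existing[k] != row:
            to_update.append(row)  # matched and some column differs: update
        # Matched and identical rows are skipped, so nothing is rewritten for them.
    return to_update, to_insert
```

Skipping identical matches keeps an idempotent re-run of the same upsert from producing a new snapshot full of no-op rewrites.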

Re: [PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
bryanck commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2652018035 I switched back to the original change, to just disable intern and the hash collision check. Disabling canonicalization can impact performance significantly.

[PR] API: Deprecate NestedType.of in favor of builder [iceberg]

2025-02-11 Thread via GitHub
rdblue opened a new pull request, #12227: URL: https://github.com/apache/iceberg/pull/12227 This is a follow up to #12211. While adding support for default values in `UpdateSchema`, many of the changes were to use the `NestedField` builder's copy constructor, `from(NestedField)`, so that fi

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
marcoaanogueira commented on code in PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#discussion_r1951669632 ## pyiceberg/table/__init__.py: ## @@ -1086,6 +1094,78 @@ def name_mapping(self) -> Optional[NameMapping]: """Return the table's field-id NameMa

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
rdblue commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1951673332 ## api/src/main/java/org/apache/iceberg/UpdateSchema.java: ## @@ -280,6 +410,30 @@ default UpdateSchema updateColumn(String name, Type.PrimitiveType newType, Strin

Re: [PR] API, Core: Support default values in UpdateSchema [iceberg]

2025-02-11 Thread via GitHub
rdblue commented on code in PR #12211: URL: https://github.com/apache/iceberg/pull/12211#discussion_r1951676633 ## api/src/main/java/org/apache/iceberg/UpdateSchema.java: ## @@ -125,16 +185,23 @@ default UpdateSchema addColumn(String parent, String name, Type type) { * @par

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1951696715 ## pyiceberg/table/update/snapshot.py: ## @@ -84,14 +84,14 @@ from pyiceberg.table import Transaction -def _new_manifest_path(location: str, num: int
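
The table property `write.metadata.path` overrides where Iceberg writes metadata files, which otherwise go to `metadata/` under the table location. A stdlib-only sketch of that path resolution; `metadata_dir` and `new_manifest_path` are hypothetical helpers, and the `<commit uuid>-m<num>.avro` file name is an assumption suggested by the `_new_manifest_path(location, num, ...)` signature in the diff, not a copy of it.

```python
import uuid
from typing import Dict

WRITE_METADATA_PATH = "write.metadata.path"  # Iceberg table property


def metadata_dir(table_location: str, properties: Dict[str, str]) -> str:
    """Return the directory that new metadata files should be written to."""
    override = properties.get(WRITE_METADATA_PATH)
    if override:
        return override.rstrip("/")
    return f"{table_location.rstrip('/')}/metadata"


def new_manifest_path(
    table_location: str, properties: Dict[str, str], num: int, commit_uuid: uuid.UUID
) -> str:
    # Hypothetical naming scheme: <commit uuid>-m<counter>.avro
    return f"{metadata_dir(table_location, properties)}/{commit_uuid}-m{num}.avro"
```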

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#issuecomment-2652211414 cc @Fokko @smaheshwar-pltr

Re: [PR] Build: skip scheduled docker image publish workflows on forks [iceberg]

2025-02-11 Thread via GitHub
nastra merged PR #12218: URL: https://github.com/apache/iceberg/pull/12218

Re: [PR] Docs: Add missing types to the spec v3 summary [iceberg]

2025-02-11 Thread via GitHub
nastra merged PR #12219: URL: https://github.com/apache/iceberg/pull/12219

[PR] Core: Remove duplicate definitions of MAX_FILE_GROUP_SIZE_BYTES [iceberg]

2025-02-11 Thread via GitHub
manuzhang opened a new pull request, #1: URL: https://github.com/apache/iceberg/pull/1 (no comment)

Re: [PR] Core: Fix numeric overflow of timestamp nano literal [iceberg]

2025-02-11 Thread via GitHub
jacobmarble commented on code in PR #11775: URL: https://github.com/apache/iceberg/pull/11775#discussion_r1951255807 ## api/src/main/java/org/apache/iceberg/expressions/Literals.java: ## @@ -300,8 +300,7 @@ public Literal to(Type type) { case TIMESTAMP: retu

Re: [PR] fix: Misleading error messages in `iceberg-catalog-rest` and allow `StatusCode::OK` in responses [iceberg-rust]

2025-02-11 Thread via GitHub
connortsui20 commented on PR #962: URL: https://github.com/apache/iceberg-rust/pull/962#issuecomment-2651484844 Ok so it turns out this introduces a bug where `namespace_exists` getting back a `StatusCode::OK` actually means that the namespace doesn't exist... I'm digging into this more

Re: [PR] Core: Fix numeric overflow of timestamp nano literal [iceberg]

2025-02-11 Thread via GitHub
jacobmarble commented on code in PR #11775: URL: https://github.com/apache/iceberg/pull/11775#discussion_r1951259929 ## api/src/main/java/org/apache/iceberg/expressions/Literals.java: ## @@ -300,8 +300,7 @@ public Literal to(Type type) { case TIMESTAMP: retu

Re: [I] How do I find if there is residual in the table scan/plan files? [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on issue #785: URL: https://github.com/apache/iceberg-python/issues/785#issuecomment-2651522031 ResidualEvaluator has been added in #1388. Closing this issue.
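
A residual is the part of a scan filter that a file's partition values cannot settle: predicates the partition already satisfies drop out, and one it contradicts makes the whole conjunction false, so the file can be skipped. A stdlib-only sketch for identity-partitioned columns only; `Pred`, `And`, and `residual` are hypothetical and much simpler than PyIceberg's `ResidualEvaluator`, since a real implementation must also account for partition transforms such as `day` or `bucket`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class Pred:
    column: str
    op: str  # only "==" and ">" are modeled here
    value: Any


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[bool, Pred, And]


def residual(expr: Expr, partition: Dict[str, Any]) -> Expr:
    """Simplify expr for one file, given its identity-partition values."""
    if isinstance(expr, bool):
        return expr
    if isinstance(expr, And):
        left, right = residual(expr.left, partition), residual(expr.right, partition)
        if left is False or right is False:
            return False  # the file cannot contain matching rows
        if left is True:
            return right
        if right is True:
            return left
        return And(left, right)
    if expr.column not in partition:
        return expr  # not a partition column: must still be checked row by row
    actual = partition[expr.column]
    if expr.op == "==":
        return actual == expr.value
    if expr.op == ">":
        return actual > expr.value
    raise ValueError(f"unsupported operator: {expr.op}")
```

A residual of `True` is also what makes a metadata-only row count possible for that file.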

Re: [I] How do I find if there is residual in the table scan/plan files? [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu closed issue #785: How do I find if there is residual in the table scan/plan files? URL: https://github.com/apache/iceberg-python/issues/785

Re: [PR] Update documentation / add missing Iceberg table read properties [iceberg]

2025-02-11 Thread via GitHub
cornelcreanga commented on code in PR #12163: URL: https://github.com/apache/iceberg/pull/12163#discussion_r1951287418 ## docs/docs/configuration.md: ## @@ -26,114 +26,117 @@ Iceberg tables support table properties to configure table behavior, like the de ### Read properties

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
mattmartin14 commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2651601160 @Fokko @kevinjqliu - i was able to run the "make lint" command and I've resolved i think 99% of it; the only thing that make lint is flagging is: ```bash pyiceberg/t

Re: [I] software.amazon.awssdk.services.s3.model.S3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. [iceberg]

2025-02-11 Thread via GitHub
steveloughran commented on issue #11997: URL: https://github.com/apache/iceberg/issues/11997#issuecomment-2651610540 look at the s3a troubleshooting docs. Tip: extended request IDs *always* indicate you are talking to an AWS endpoint

[PR] Implement table format version enum [iceberg-python]

2025-02-11 Thread via GitHub
iyad-f opened a new pull request, #1645: URL: https://github.com/apache/iceberg-python/pull/1645 This PR addresses issue #851
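
One way to model table format versions as an enum is an `IntEnum`, which keeps comparisons such as "at least v2" working against plain integers read from metadata. A stdlib-only sketch; `FormatVersion` and `parse_format_version` are hypothetical names, not what PR #1645 adds, and listing `V3` reflects the spec rather than any claim about PyIceberg's support at the time.

```python
from enum import IntEnum


class FormatVersion(IntEnum):
    """Iceberg table format versions (hypothetical name)."""

    V1 = 1
    V2 = 2
    V3 = 3


def parse_format_version(raw: object) -> FormatVersion:
    """Parse a version from metadata or table properties, rejecting unknown values."""
    try:
        return FormatVersion(int(raw))  # int() accepts "2" as well as 2
    except (TypeError, ValueError) as exc:
        raise ValueError(f"unsupported table format version: {raw!r}") from exc
```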

Re: [PR] Add ResidualVisitor to compute residuals [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388#issuecomment-2650619004 Thanks @tusharchou for working on this 🚀

Re: [PR] Add ResidualVisitor to compute residuals [iceberg-python]

2025-02-11 Thread via GitHub
Fokko merged PR #1388: URL: https://github.com/apache/iceberg-python/pull/1388

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2025-02-11 Thread via GitHub
Fokko closed issue #1223: Count rows as a metadata-only operation URL: https://github.com/apache/iceberg-python/issues/1223

Re: [I] Count rows as a metadata-only operation [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on issue #1223: URL: https://github.com/apache/iceberg-python/issues/1223#issuecomment-2650625059 Closing this issue, https://github.com/apache/iceberg-python/pull/1388 has been merged. Thanks everyone!
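
Counting rows as a metadata-only operation means summing the `record_count` stored for each data file in its manifest entry instead of opening the file. A stdlib-only sketch, assuming that count can be trusted only when no delete files apply and the file's residual is always true; otherwise it falls back to reading that file. `ScanTask` and `count_rows` are hypothetical, not PyIceberg's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ScanTask:
    record_count: int  # row count recorded for the data file in its manifest entry
    has_deletes: bool  # whether any delete files apply to this data file
    residual_is_true: bool  # whether the partition alone fully satisfies the filter


def count_rows(tasks: Iterable[ScanTask], read_rows: Callable[[ScanTask], int]) -> int:
    """Sum metadata counts where safe; call read_rows for every other file."""
    total = 0
    for task in tasks:
        if not task.has_deletes and task.residual_is_true:
            total += task.record_count  # metadata-only: no file I/O
        else:
            total += read_rows(task)  # must open the file to apply filters or deletes
    return total
```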

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
mattmartin14 commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2651174761 > > otherwise, given my company's firewall rules, i won't ever be able to reach them. > > That's strict! > > I'm also happy to create a PR to the `StateFarmIns:ma

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
Fokko commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2651078297 > otherwise, given my company's firewall rules, i won't ever be able to reach them. That's strict! I'm also happy to create a PR to the `StateFarmIns:main` branch to fix

Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-02-11 Thread via GitHub
mattmartin14 commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2651624948 @kevinjqliu - I also updated the original description of this PR. Please let me know if you have any more recommendations/tweaks you would like to do on it. Thanks,

Re: [I] [feature] Table Scan should take into account the table's sort order [iceberg-python]

2025-02-11 Thread via GitHub
iyad-f commented on issue #1637: URL: https://github.com/apache/iceberg-python/issues/1637#issuecomment-2651641432 I would like to work on this, but i need a bit of clarification on what to do exactly?

[PR] Core: Make totalRecordCount optional in PartitionStats [iceberg]

2025-02-11 Thread via GitHub
ajantha-bhat opened a new pull request, #12226: URL: https://github.com/apache/iceberg/pull/12226 Spec was already optional for totalRecordCount. https://iceberg.apache.org/spec/#partition-statistics-file During the implementation, we decided to make all the counters to be initiali

Re: [I] Configure timestamp downcast programmatically [iceberg-python]

2025-02-11 Thread via GitHub
github-actions[bot] commented on issue #960: URL: https://github.com/apache/iceberg-python/issues/960#issuecomment-2652346846 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Kafka Connect: Include design docs [iceberg]

2025-02-11 Thread via GitHub
github-actions[bot] commented on issue #10841: URL: https://github.com/apache/iceberg/issues/10841#issuecomment-2652343862 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Kafka Connect: Include design docs [iceberg]

2025-02-11 Thread via GitHub
github-actions[bot] closed issue #10841: Kafka Connect: Include design docs URL: https://github.com/apache/iceberg/issues/10841

Re: [PR] Build: Bump mkdocstrings-python from 1.14.6 to 1.15.0 [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu merged PR #1649: URL: https://github.com/apache/iceberg-python/pull/1649

Re: [I] Consolidate catalog behavior [iceberg-python]

2025-02-11 Thread via GitHub
github-actions[bot] commented on issue #813: URL: https://github.com/apache/iceberg-python/issues/813#issuecomment-2652346865 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity oc

Re: [I] Configure timestamp downcast programmatically [iceberg-python]

2025-02-11 Thread via GitHub
github-actions[bot] closed issue #960: Configure timestamp downcast programmatically URL: https://github.com/apache/iceberg-python/issues/960

Re: [PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
bryanck commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-2652364540 > [for my understanding] I thought we had a way to lazy load metadata in REST, the complete metadata parsing would only be required at the time of commit ? Are all the tables write heav

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-02-11 Thread via GitHub
stevenzwu commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r1951829171 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition statist

Re: [PR] Spec: Allow Equality Deletes with Row Lineage and Define Behavior [iceberg]

2025-02-11 Thread via GitHub
stevenzwu commented on code in PR #12230: URL: https://github.com/apache/iceberg/pull/12230#discussion_r1951840824 ## format/spec.md: ## @@ -392,7 +392,7 @@ In v3 and later, an Iceberg table can track row lineage fields for all newly cre These fields are assigned and updated

[I] java.lang.ClassNotFoundException: org.apache.iceberg.spark.actions.ManifestFileBeanBeanInfo [iceberg]

2025-02-11 Thread via GitHub
melin opened a new issue, #12231: URL: https://github.com/apache/iceberg/issues/12231 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 ``` dfs://master:8020/user/superior/spark-jobserver/tempJars/laOAlRtRLJogYzjQPkBLkp0

Re: [PR] Spec: Allow Equality Deletes with Row Lineage and Define Behavior [iceberg]

2025-02-11 Thread via GitHub
pvary commented on code in PR #12230: URL: https://github.com/apache/iceberg/pull/12230#discussion_r1952013431 ## format/spec.md: ## @@ -1766,4 +1766,4 @@ The Geometry and Geography class hierarchy and its Well-known text (WKT) and Wel Points are always defined by the coordi

Re: [PR] Spec: Allow Equality Deletes with Row Lineage and Define Behavior [iceberg]

2025-02-11 Thread via GitHub
pvary commented on code in PR #12230: URL: https://github.com/apache/iceberg/pull/12230#discussion_r1952054468 ## format/spec.md: ## @@ -392,7 +392,7 @@ In v3 and later, an Iceberg table can track row lineage fields for all newly cre These fields are assigned and updated by

Re: [PR] Spec: Allow Equality Deletes with Row Lineage and Define Behavior [iceberg]

2025-02-11 Thread via GitHub
singhpk234 commented on code in PR #12230: URL: https://github.com/apache/iceberg/pull/12230#discussion_r1952049522 ## format/spec.md: ## @@ -392,7 +392,7 @@ In v3 and later, an Iceberg table can track row lineage fields for all newly cre These fields are assigned and update

Re: [PR] [WIP] Ignore UnknownType in General Parquet Writer [iceberg]

2025-02-11 Thread via GitHub
HonahX commented on code in PR #12177: URL: https://github.com/apache/iceberg/pull/12177#discussion_r1952090213 ## parquet/src/main/java/org/apache/iceberg/parquet/TypeToMessageType.java: ## @@ -56,6 +56,10 @@ public class TypeToMessageType { LogicalTypeAnnotation.timesta

Re: [PR] Materialized View Spec [iceberg]

2025-02-11 Thread via GitHub
JanKaul commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1952094747 ## format/view-spec.md: ## @@ -160,6 +179,56 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when t

Re: [PR] chore: use RowSelection::union from arrow-rs [iceberg-rust]

2025-02-11 Thread via GitHub
Xuanwo merged PR #953: URL: https://github.com/apache/iceberg-rust/pull/953
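
For background on the operation being adopted: arrow-rs represents a Parquet row selection as runs of `RowSelector { row_count, skip }`, and a union keeps every row selected by either input. A Python sketch using `(row_count, skip)` tuples as stand-ins for `RowSelector`; it expands runs into a per-row mask for clarity, whereas an efficient implementation would merge runs directly. Treating rows past the end of the shorter selection as not selected is an assumption.

```python
from typing import List, Tuple

Selector = Tuple[int, bool]  # (row_count, skip), mirroring arrow-rs RowSelector fields


def to_mask(selectors: List[Selector]) -> List[bool]:
    mask: List[bool] = []
    for count, skip in selectors:
        mask.extend([not skip] * count)  # True means the row is selected
    return mask


def from_mask(mask: List[bool]) -> List[Selector]:
    out: List[Selector] = []
    for selected in mask:
        skip = not selected
        if out and out[-1][1] == skip:
            out[-1] = (out[-1][0] + 1, skip)  # extend the current run
        else:
            out.append((1, skip))  # start a new run
    return out


def union(a: List[Selector], b: List[Selector]) -> List[Selector]:
    ma, mb = to_mask(a), to_mask(b)
    n = max(len(ma), len(mb))
    # Assumption: rows beyond the shorter selection count as not selected.
    ma += [False] * (n - len(ma))
    mb += [False] * (n - len(mb))
    return from_mask([x or y for x, y in zip(ma, mb)])
```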

Re: [I] Update Arrow deps once they release a version containing `RowSelection::union` [iceberg-rust]

2025-02-11 Thread via GitHub
Xuanwo closed issue #605: Update Arrow deps once they release a version containing `RowSelection::union` URL: https://github.com/apache/iceberg-rust/issues/605

Re: [PR] Spec: Update partition stats for V3 [iceberg]

2025-02-11 Thread via GitHub
aokolnychyi commented on code in PR #12098: URL: https://github.com/apache/iceberg/pull/12098#discussion_r1951754710 ## format/spec.md: ## @@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f The schema of the partition stati

Re: [PR] feat: Make some REST methods public [iceberg-rust]

2025-02-11 Thread via GitHub
peasee closed pull request #922: feat: Make some REST methods public URL: https://github.com/apache/iceberg-rust/pull/922

Re: [PR] feat: Make some REST methods public [iceberg-rust]

2025-02-11 Thread via GitHub
peasee commented on PR #922: URL: https://github.com/apache/iceberg-rust/pull/922#issuecomment-2652299260 Thanks for your response! I ended up taking a different direction that does not use Iceberg, so I'll be closing this PR.

Re: [I] Rate limiting feature for structured streaming [iceberg]

2025-02-11 Thread via GitHub
wypoon commented on issue #7885: URL: https://github.com/apache/iceberg/issues/7885#issuecomment-2652290719 @singhpk234 for my understanding, can you please confirm or refute the following -- Suppose streaming-max-rows-per-micro-batch = 1000 and streaming-max-files-per-micro-batch > 1. S

Re: [I] Add unit tests for ColumnarBatchUtil using mocking [iceberg]

2025-02-11 Thread via GitHub
anuragmantri commented on issue #12054: URL: https://github.com/apache/iceberg/issues/12054#issuecomment-2652300353 Hi @ManasiRN @Monika-Rajendran-97 - Thanks for your interest. I have started working on this, but I don't wish to block progress. Please feel free to submit your patches. If

Re: [PR] Build: Bump mkdocs-autorefs from 1.3.0 to 1.3.1 [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu merged PR #1650: URL: https://github.com/apache/iceberg-python/pull/1650

Re: [PR] Build: Bump griffe from 1.5.6 to 1.5.7 [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu merged PR #1647: URL: https://github.com/apache/iceberg-python/pull/1647

Re: [PR] Build: Bump mkdocstrings-python from 1.14.6 to 1.15.0 [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on PR #1649: URL: https://github.com/apache/iceberg-python/pull/1649#issuecomment-2652322297 @dependabot rebase

[PR] Build: Bump cython from 3.0.11 to 3.0.12 [iceberg-python]

2025-02-11 Thread via GitHub
dependabot[bot] opened a new pull request, #1646: URL: https://github.com/apache/iceberg-python/pull/1646 Bumps [cython](https://github.com/cython/cython) from 3.0.11 to 3.0.12. Changelog: Sourced from [cython's changelog](https://github.com/cython/cython/blob/master/CHANGES.rst).

[PR] Build: Bump griffe from 1.5.6 to 1.5.7 [iceberg-python]

2025-02-11 Thread via GitHub
dependabot[bot] opened a new pull request, #1647: URL: https://github.com/apache/iceberg-python/pull/1647 Bumps [griffe](https://github.com/mkdocstrings/griffe) from 1.5.6 to 1.5.7. Release notes: Sourced from [griffe's releases](https://github.com/mkdocstrings/griffe/releases).

Re: [PR] Core: Adjust Jackson settings to handle large metadata json [iceberg]

2025-02-11 Thread via GitHub
stevenzwu commented on PR #12224: URL: https://github.com/apache/iceberg/pull/12224#issuecomment-265384 @bryanck thanks for the experimentation with canonicalization. do you have any micro/jmh benchmark for the parser performance? if yes, maybe it would be useful to add it to the Iceber

Re: [PR] Spec: Typo - missing be [iceberg]

2025-02-11 Thread via GitHub
RussellSpitzer merged PR #12229: URL: https://github.com/apache/iceberg/pull/12229

Re: [I] Add unit tests for ColumnarBatchUtil using mocking [iceberg]

2025-02-11 Thread via GitHub
ManasiRN commented on issue #12054: URL: https://github.com/apache/iceberg/issues/12054#issuecomment-2652282252 Hi @anuragmantri, I’d like to contribute to this issue by adding unit tests for ColumnarBatchUtil using mocking. Let me know if you have any specific considerations or if you’ve a

Re: [I] Rate limiting feature for structured streaming [iceberg]

2025-02-11 Thread via GitHub
singhpk234 commented on issue #7885: URL: https://github.com/apache/iceberg/issues/7885#issuecomment-2652333168 yes, that's true @wypoon, presently it's a limitation of the initial implementation, as I didn't want to block it on opening a file and reading it partially. As there we

Re: [I] Improve ThreadPools for graceful shutdown [iceberg]

2025-02-11 Thread via GitHub
ochanism commented on issue #12220: URL: https://github.com/apache/iceberg/issues/12220#issuecomment-2652510589 @pvary Oh, thanks for the information! That's very helpful to understand the current status. Here are my issue details. - error stack trace ``` server error: Er

Re: [PR] Feat/add support kerberize hivemetastore [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on code in PR #1634: URL: https://github.com/apache/iceberg-python/pull/1634#discussion_r1951938200 ## pyproject.toml: ## @@ -80,6 +80,8 @@ sqlalchemy = { version = "^2.0.18", optional = true } getdaft = { version = ">=0.2.12", optional = true } cachetools

Re: [PR] partitioned write support [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on PR #353: URL: https://github.com/apache/iceberg-python/pull/353#issuecomment-2652609001 hey @sungwy @jqin61 just wanted to double check that this PR is no longer relevant. I believe all components of partitioned write support have already been merged.

Re: [PR] Core,Api: Add overwrite option when register external table to catalog [iceberg]

2025-02-11 Thread via GitHub
dramaticlly commented on PR #12228: URL: https://github.com/apache/iceberg/pull/12228#issuecomment-2652589821 [Java CI Failure](https://github.com/apache/iceberg/actions/runs/13275773190/job/37064995172?pr=12228) is timing out on concurrent fast append and seems unrelated to the change.

Re: [PR] Clean up old metadata [iceberg-python]

2025-02-11 Thread via GitHub
kevinjqliu commented on code in PR #1607: URL: https://github.com/apache/iceberg-python/pull/1607#discussion_r1951956541 ## tests/catalog/test_sql.py: ## @@ -1613,3 +1614,50 @@ def test_merge_manifests_local_file_system(catalog: SqlCatalog, arrow_table_with tbl.append(
