Re: [PR] Docs: Add newline so that subsection is correctly rendered [iceberg]

2024-02-05 Thread via GitHub
Fokko merged PR #9656: URL: https://github.com/apache/iceberg/pull/9656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

Re: [PR] Spark: Move the Writer to a visitor [iceberg]

2024-02-05 Thread via GitHub
Fokko commented on code in PR #9440: URL: https://github.com/apache/iceberg/pull/9440#discussion_r1479321515 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java: ## @@ -136,46 +135,126 @@ private ParquetValueWriter newOption(Type fieldType,

Re: [I] python dependency nightmare [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #375: URL: https://github.com/apache/iceberg-python/issues/375#issuecomment-1928922071 @djouallah Thanks for raising this. Could you double check the Pydantic version in your Python environment? ``` python3 Python 3.11.7 (main, Dec 4 2023, 18:10:11) [Cl

Re: [PR] Add Daft examples and code into PyIceberg docs and Table [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on PR #355: URL: https://github.com/apache/iceberg-python/pull/355#issuecomment-1928918841 @jaychia Recently Hive integration tests have been added. First, you want to make sure that you're on a recent version of Docker. Also, it is good to periodically run `make test-integr

[I] python dependency nightmare [iceberg-python]

2024-02-05 Thread via GitHub
djouallah opened a new issue, #375: URL: https://github.com/apache/iceberg-python/issues/375 ### Apache Iceberg version main (development) ### Please describe the bug 🐞 trying to install rc2 in Fabric runtime ``` Installing collected packages: sortedcontainers,

Re: [PR] Add release schedule on the releases page [iceberg-docs]

2024-02-05 Thread via GitHub
jbonofre commented on PR #298: URL: https://github.com/apache/iceberg-docs/pull/298#issuecomment-1928881243 @bitsondatadev sure thing! I will. Thanks.

Re: [I] Implement Centralized Management of Table Properties [iceberg-python]

2024-02-05 Thread via GitHub
HonahX commented on issue #365: URL: https://github.com/apache/iceberg-python/issues/365#issuecomment-1928871173 I can take this if no one has started it :).

Re: [PR] Partition Evolution [iceberg-python]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1479272548 ## pyiceberg/table/__init__.py: ## @@ -871,6 +924,12 @@ def sort_orders(self) -> Dict[int, SortOrder]: """Return a dict of the sort orders of thi

Re: [PR] Partition Evolution [iceberg-python]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1479267152 ## tests/test_integration_partition_evolution.py: ## @@ -0,0 +1,423 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributo

Re: [PR] Partition Evolution [iceberg-python]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1479266072 ## tests/catalog/test_hive.py: ## @@ -277,7 +277,7 @@ def test_create_table(table_schema_simple: Schema, hive_database: HiveDatabase, )

Re: [PR] Partition Evolution [iceberg-python]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1479265846 ## pyiceberg/table/metadata.py: ## @@ -308,7 +308,8 @@ def construct_partition_specs(cls, data: Dict[str, Any]) -> Dict[str, Any]: data[

Re: [PR] Partition Evolution [iceberg-python]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1479262998 ## pyiceberg/table/__init__.py: ## @@ -533,6 +551,39 @@ def _(update: SetCurrentSchemaUpdate, base_metadata: TableMetadata, context: _Ta return base

Re: [I] InMemoryCatalog's FiloIO in memory map isn't persistent in RestCatalog [iceberg]

2024-02-05 Thread via GitHub
geruh commented on issue #9604: URL: https://github.com/apache/iceberg/issues/9604#issuecomment-1928799128 Based on my current understanding, the `InMemoryCatalog` serves as the supporting catalog for the RESTCatalog. When a table is created, the RESTTableOperations returns a ResolvingFileI

Re: [PR] Support merge manifests on writes [iceberg-python]

2024-02-05 Thread via GitHub
HonahX commented on code in PR #363: URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1479215719 ## pyiceberg/table/__init__.py: ## @@ -2411,11 +2428,29 @@ def _fetch_existing_manifests() -> List[ManifestFile]: executor = ExecutorFactory.get_or_creat

Re: [PR] Build: Bump adlfs from 2024.1.0 to 2024.2.0 [iceberg-python]

2024-02-05 Thread via GitHub
HonahX merged PR #372: URL: https://github.com/apache/iceberg-python/pull/372

Re: [I] Implement Centralized Management of Table Properties [iceberg-python]

2024-02-05 Thread via GitHub
HonahX commented on issue #365: URL: https://github.com/apache/iceberg-python/issues/365#issuecomment-1928773118 @Fokko Sounds good. My initial thought was to use a new file because table/__init__.py already has a substantial amount of content, and it's likely to grow as we develop write su

Re: [PR] Build: Bump adlfs from 2024.1.0 to 2024.2.0 [iceberg-python]

2024-02-05 Thread via GitHub
HonahX commented on PR #372: URL: https://github.com/apache/iceberg-python/pull/372#issuecomment-1928770299 @dependabot rebase

Re: [PR] Build: Bump mypy-boto3-glue from 1.34.32 to 1.34.35 [iceberg-python]

2024-02-05 Thread via GitHub
HonahX merged PR #371: URL: https://github.com/apache/iceberg-python/pull/371

Re: [PR] Build: Bump mkdocs-material from 9.5.6 to 9.5.7 [iceberg-python]

2024-02-05 Thread via GitHub
HonahX merged PR #369: URL: https://github.com/apache/iceberg-python/pull/369

Re: [PR] feat: add parquet writer [iceberg-rust]

2024-02-05 Thread via GitHub
ZENOTME commented on PR #176: URL: https://github.com/apache/iceberg-rust/pull/176#issuecomment-1928766039 I think this writer is ready to go now. PTAL @Fokko

Re: [PR] [WIP] Migrate SparkExtensions DDL sub-classes to JUnit5 [iceberg]

2024-02-05 Thread via GitHub
tomtongue commented on PR #9624: URL: https://github.com/apache/iceberg/pull/9624#issuecomment-1928754268 @nastra Change the following 7 DDL extensions to JUnit 5 and AssertJ style. For now `Assertions.assertThatThrownBy` is not changed, and not statically imported (`assertThat` is s

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
manuzhang commented on code in PR #9661: URL: https://github.com/apache/iceberg/pull/9661#discussion_r1479179241 ## format/spec.md: ## @@ -1117,7 +1117,17 @@ Partition specs are serialized as a JSON object with the following fields: |**`spec-id`**|`JSON int`|`0`| |**`fields`*

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
manuzhang commented on code in PR #9661: URL: https://github.com/apache/iceberg/pull/9661#discussion_r1479178945 ## format/spec.md: ## @@ -1150,13 +1157,17 @@ Sort orders are serialized as a list of JSON object, each of which contains the Each sort field in the fields list i

Re: [PR] Build: Bump moto from 4.2.13 to 5.0.1 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] closed pull request #370: Build: Bump moto from 4.2.13 to 5.0.1 URL: https://github.com/apache/iceberg-python/pull/370

Re: [PR] Build: Bump moto from 4.2.13 to 5.0.1 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] commented on PR #370: URL: https://github.com/apache/iceberg-python/pull/370#issuecomment-1928687775 Looks like moto is up-to-date now, so this is no longer needed.

Re: [PR] Build: Bump moto from 4.2.13 to 5.0.1 [iceberg-python]

2024-02-05 Thread via GitHub
HonahX merged PR #373: URL: https://github.com/apache/iceberg-python/pull/373

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
advancedxy commented on PR #9661: URL: https://github.com/apache/iceberg/pull/9661#issuecomment-1928683803 Thanks for taking this over @szehon-ho, I will review it today or tomorrow. > hence wanted to get this change in before the 1.5 release I do agree that we should get t

Re: [PR] API, Core: add multi-arg transform and add zOrder as the first one [iceberg]

2024-02-05 Thread via GitHub
advancedxy commented on PR #9662: URL: https://github.com/apache/iceberg/pull/9662#issuecomment-1928681862 @szehon-ho Thanks for taking #9661 over and sorry for the late response. I was busy finishing a big internal feature last week. I extracted the API/Core part from previous POC P

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
szehon-ho commented on PR #9661: URL: https://github.com/apache/iceberg/pull/9661#issuecomment-1928666370 @manuzhang I believe #8579 is not published yet, hence wanted to get this change in before the 1.5 release, if we want to add the clarification.

Re: [PR] Hive: Refactor hive-table commit operation to be used for other operations like view [iceberg]

2024-02-05 Thread via GitHub
szehon-ho commented on code in PR #9461: URL: https://github.com/apache/iceberg/pull/9461#discussion_r1479132215 ## core/src/main/java/org/apache/iceberg/BaseMetastoreOperations.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
manuzhang commented on PR #9661: URL: https://github.com/apache/iceberg/pull/9661#issuecomment-1928640445 Do we also need to update https://github.com/apache/iceberg/blob/main/site/docs/spec.md?

Re: [PR] Reorganize identifier field requirement [iceberg-docs]

2024-02-05 Thread via GitHub
manuzhang closed pull request #306: Reorganize identifier field requirement URL: https://github.com/apache/iceberg-docs/pull/306

Re: [PR] Spark 3.4: Support executor cache locality [iceberg]

2024-02-05 Thread via GitHub
amogh-jahagirdar merged PR #9658: URL: https://github.com/apache/iceberg/pull/9658

Re: [PR] Spark 3.4: Support executor cache locality [iceberg]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on PR #9658: URL: https://github.com/apache/iceberg/pull/9658#issuecomment-1928592224 I'll go ahead and merge since this was a straightforward cherry-pick, thanks @aokolnychyi

Re: [I] Updating a property map in a iceberg table [iceberg]

2024-02-05 Thread via GitHub
namrathamyske commented on issue #9659: URL: https://github.com/apache/iceberg/issues/9659#issuecomment-1928560017 @amogh-jahagirdar There might be concurrency issues if we directly serialize it to json string. Ex: Current state of property-val key: property-val : { "a1":1 }

Re: [I] Updating a property map in a iceberg table [iceberg]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on issue #9659: URL: https://github.com/apache/iceberg/issues/9659#issuecomment-1928553290 IMO I'd prefer to keep the properties API in its current simple state. To handle the case where the value is a map, a user could always just serialize the value into JSON

Re: [I] [InputFormat Followup] Add residual evaluation for Iceberg generics [iceberg]

2024-02-05 Thread via GitHub
github-actions[bot] commented on issue #866: URL: https://github.com/apache/iceberg/issues/866#issuecomment-1928542833 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Clean orphan data files [iceberg]

2024-02-05 Thread via GitHub
github-actions[bot] commented on issue #873: URL: https://github.com/apache/iceberg/issues/873#issuecomment-1928542850 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] [Proposal] An iceberg-unstructured module [iceberg]

2024-02-05 Thread via GitHub
github-actions[bot] commented on issue #859: URL: https://github.com/apache/iceberg/issues/859#issuecomment-1928542810 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

[PR] Refactor pyarrow to look like fsspec [iceberg-python]

2024-02-05 Thread via GitHub
kevinjqliu opened a new pull request, #374: URL: https://github.com/apache/iceberg-python/pull/374 (no comment)

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
szehon-ho commented on PR #9661: URL: https://github.com/apache/iceberg/pull/9661#issuecomment-1928524876 @rdblue @advancedxy @aokolnychyi @emkornfield, I wanted to get the conversation started on this proposal to clarify V1-V3 behaviors for multi-arg transforms as discussed, let me kn

Re: [PR] Spec: Clarify multi-arg transform behavior for different versions [iceberg]

2024-02-05 Thread via GitHub
szehon-ho commented on code in PR #9661: URL: https://github.com/apache/iceberg/pull/9661#discussion_r1479063142 ## format/spec.md: ## @@ -1130,14 +1140,11 @@ Each partition field in the fields list is stored as an object. See the table fo |**`hour`**|`JSON string: "hour"`|`"h

[PR] Spec: Clarify iceberg-spec.md [iceberg]

2024-02-05 Thread via GitHub
szehon-ho opened a new pull request, #9661: URL: https://github.com/apache/iceberg/pull/9661 This PR clarifies multi-arg transform behavior in relation to different Iceberg versions. It proposes to make the behavior default in V3 but enabled in V1/V2 with a new table config. It also cleans

Re: [PR] Add section on how to use snapshots [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #162: URL: https://github.com/apache/iceberg-docs/pull/162#issuecomment-1928404168 @Fokko, would you mind moving this over to https://github.com/apache/iceberg/pulls please?

Re: [PR] Retypeset the Flink document [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #207: URL: https://github.com/apache/iceberg-docs/pull/207#issuecomment-1928401321 @amogh-jahagirdar would you mind closing this one as well please?

Re: [PR] Update how to release instructions with more details [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #210: URL: https://github.com/apache/iceberg-docs/pull/210#issuecomment-1928397077 @jackye1995, I'll be updating how to release. Once I'm done with that, would you mind moving this over to https://github.com/apache/iceberg/pulls please?

Re: [PR] Fix branching and tagging images on 1.2.1 branch [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #231: URL: https://github.com/apache/iceberg-docs/pull/231#issuecomment-1928389000 @amogh-jahagirdar, would you mind moving this over to https://github.com/apache/iceberg/pulls please? Unless this should just be closed.

Re: [PR] add Wayang / DataBloom to vendors supporting Iceberg [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #283: URL: https://github.com/apache/iceberg-docs/pull/283#issuecomment-1928378162 Hey @2pk03, would you mind moving this over to https://github.com/apache/iceberg/pulls please?

Re: [PR] Add release schedule on the releases page [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #298: URL: https://github.com/apache/iceberg-docs/pull/298#issuecomment-1928373285 Hey @jbonofre, would you mind moving this over to https://github.com/apache/iceberg/pulls please?

Re: [PR] Add a Chinese version Documentation & Fix bug while building website. [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #307: URL: https://github.com/apache/iceberg-docs/pull/307#issuecomment-1928369140 @Waterkin, would you mind moving this over to https://github.com/apache/iceberg/pulls please? This repository is being moved to the main repository.

Re: [PR] Reorganize identifier field requirement [iceberg-docs]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #306: URL: https://github.com/apache/iceberg-docs/pull/306#issuecomment-1928354012 @manuzhang, would you mind moving this over to https://github.com/apache/iceberg/pulls please?

Re: [PR] Update README.md tp reflect it is archived [iceberg-docs]

2024-02-05 Thread via GitHub
danielcweeks merged PR #310: URL: https://github.com/apache/iceberg-docs/pull/310

Re: [PR] Update README.md tp reflect it is archived [iceberg-docs]

2024-02-05 Thread via GitHub
danielcweeks commented on code in PR #310: URL: https://github.com/apache/iceberg-docs/pull/310#discussion_r1479000645 ## README.md: ## @@ -17,170 +17,11 @@ - under the License. --> -# Apache Iceberg Documentation Site +# Apache Iceberg Documentation Site (Archived) -T

[PR] Build: Bump adlfs from 2024.1.0 to 2024.2.0 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] opened a new pull request, #372: URL: https://github.com/apache/iceberg-python/pull/372 Bumps [adlfs](https://github.com/fsspec/adlfs) from 2024.1.0 to 2024.2.0. Release notes Sourced from [adlfs's releases](https://github.com/fsspec/adlfs/releases). 2024.2.0

[PR] Build: Bump mypy-boto3-glue from 1.34.32 to 1.34.35 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] opened a new pull request, #371: URL: https://github.com/apache/iceberg-python/pull/371 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.34.32 to 1.34.35. Commits See full diff in https://github.com/youtype/mypy_boto3_builder/commits

Re: [PR] Build: Bump moto from 4.2.13 to 5.0.0 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] commented on PR #321: URL: https://github.com/apache/iceberg-python/pull/321#issuecomment-1928203293 Superseded by #370.

Re: [PR] Build: Bump moto from 4.2.13 to 5.0.0 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] closed pull request #321: Build: Bump moto from 4.2.13 to 5.0.0 URL: https://github.com/apache/iceberg-python/pull/321

[PR] Build: Bump moto from 4.2.13 to 5.0.1 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] opened a new pull request, #370: URL: https://github.com/apache/iceberg-python/pull/370 Bumps [moto](https://github.com/getmoto/moto) from 4.2.13 to 5.0.1. Changelog Sourced from [moto's changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md). 5.0

[PR] Build: Bump mkdocs-material from 9.5.6 to 9.5.7 [iceberg-python]

2024-02-05 Thread via GitHub
dependabot[bot] opened a new pull request, #369: URL: https://github.com/apache/iceberg-python/pull/369 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.6 to 9.5.7. Release notes Sourced from https://github.com/squidfunk/mkdocs-material/releases mkdo

Re: [PR] Spark: Move the Writer to a visitor [iceberg]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on code in PR #9440: URL: https://github.com/apache/iceberg/pull/9440#discussion_r1478907380 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/SparkParquetWriters.java: ## @@ -136,46 +135,126 @@ private ParquetValueWriter newOption(Type f

Re: [PR] Add Daft examples and code into PyIceberg docs and Table [iceberg-python]

2024-02-05 Thread via GitHub
jaychia commented on PR #355: URL: https://github.com/apache/iceberg-python/pull/355#issuecomment-1928068519 > Should we also have some sanity checks, for example: > > https://github.com/apache/iceberg-python/blob/a4856bc2eadf90ac85dec96d4502ca3517bb1bb5/tests/integration/test_reads.p

Re: [I] Cannot load a binary column of many rows via the `to_arrow` method. [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #344: URL: https://github.com/apache/iceberg-python/issues/344#issuecomment-1928067916 @castedice Also feel free to open up a draft if you want some early feedback.

Re: [I] `pyiceberg.io.pyarrow.write_file` does not take into account compression settings [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #345: URL: https://github.com/apache/iceberg-python/issues/345#issuecomment-1928067044 Fixed in https://github.com/apache/iceberg-python/pull/358

Re: [I] `pyiceberg.io.pyarrow.write_file` does not take into account compression settings [iceberg-python]

2024-02-05 Thread via GitHub
Fokko closed issue #345: `pyiceberg.io.pyarrow.write_file` does not take into account compression settings URL: https://github.com/apache/iceberg-python/issues/345

Re: [PR] feat(catalog): add initial rest catalog impl [iceberg-go]

2024-02-05 Thread via GitHub
zeroshade commented on code in PR #58: URL: https://github.com/apache/iceberg-go/pull/58#discussion_r1478892511 ## catalog/catalog.go: ## @@ -47,19 +52,136 @@ func WithAwsConfig(cfg aws.Config) Option { } } +func WithCredential(cred string) Option { Review Comment:

Re: [I] Use latest Parquet version for writing [iceberg-python]

2024-02-05 Thread via GitHub
jonashaag commented on issue #359: URL: https://github.com/apache/iceberg-python/issues/359#issuecomment-1928060787 Shall we then just stick to the PyArrow default?

Re: [I] Distributed writes in the same iceberg transaction [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #357: URL: https://github.com/apache/iceberg-python/issues/357#issuecomment-1928041180 Hey @rahij This is something that we're planning on supporting. I know that the folks at Daft are already working on this. Out of curiosity, how much data are we talking about, an

Re: [I] Use latest Parquet version for writing [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #359: URL: https://github.com/apache/iceberg-python/issues/359#issuecomment-1928038141 Hey @jonashaag Thanks for raising this. I think I made a mistake here. I thought it was referring to the data-page-version, but that one is set to `1.0`. I think we should bump th

Re: [I] Support reading and writing snapshot properties [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #367: URL: https://github.com/apache/iceberg-python/issues/367#issuecomment-1928025807 Hey @brianfromoregon, I agree. For writing, I left a comment in https://github.com/apache/iceberg-python/issues/368 For reading, I think we're missing some documentation since it

Re: [I] Implement Centralized Management of Table Properties [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #365: URL: https://github.com/apache/iceberg-python/issues/365#issuecomment-1928022429 I think this is a great idea @HonahX. I try to avoid creating a lot of new files, since imports in Python are slow. What do you think of adding a class `TableProperties` to `table

Re: [I] Support setting a snapshot property in same commit as spark.sql [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on issue #368: URL: https://github.com/apache/iceberg-python/issues/368#issuecomment-1928020308 Thanks for raising this @brianfromoregon! I think it would be a great addition. We need to extend the `.append` and `.overwrite` API and allow passing in a map. And then it

Re: [PR] Support merge manifests on writes [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on code in PR #363: URL: https://github.com/apache/iceberg-python/pull/363#discussion_r1478849196 ## pyiceberg/table/__init__.py: ## @@ -2411,11 +2428,29 @@ def _fetch_existing_manifests() -> List[ManifestFile]: executor = ExecutorFactory.get_or_create

[PR] Add pagination to open api spec for listing of namespaces, tables, views [iceberg]

2024-02-05 Thread via GitHub
rahil-c opened a new pull request, #9660: URL: https://github.com/apache/iceberg/pull/9660 Dev List discussion thread around adding support for pagination in list namespaces, tables, and views: https://lists.apache.org/thread/lql05h02qtp8mgq74ovhb0ndd76ck4f3 Credit to @emkornfield f

Re: [PR] Improve error message in case of a mismatch [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on PR #352: URL: https://github.com/apache/iceberg-python/pull/352#issuecomment-1927962033 @HonahX Thanks 🙏

Re: [PR] Retry with new Access Token on 419 response [iceberg-python]

2024-02-05 Thread via GitHub
Fokko commented on code in PR #340: URL: https://github.com/apache/iceberg-python/pull/340#discussion_r1478822071 ## pyiceberg/catalog/rest.py: ## @@ -118,6 +119,19 @@ class Endpoints: NAMESPACE_SEPARATOR = b"\x1F".decode(UTF8) +def _retry_hook(retry_state: RetryCallState)

[I] Use self. default_spec_id in else clause [iceberg-rust]

2024-02-05 Thread via GitHub
odysa opened a new issue, #189: URL: https://github.com/apache/iceberg-rust/issues/189 Should it be `self.partition_spec_by_id(self.default_spec_id)` in the else clause? https://github.com/apache/iceberg-rust/blob/09765db611a65a21b88e839d781780c75924e560/crates/iceberg/src/spec/table_met

Re: [PR] Spark 3.4: Support executor cache locality [iceberg]

2024-02-05 Thread via GitHub
aokolnychyi commented on PR #9658: URL: https://github.com/apache/iceberg/pull/9658#issuecomment-1927844378 @nastra @amogh-jahagirdar @szehon-ho @ajantha-bhat, could you take a look at this cherry-pick?

Re: [PR] partitioned write support [iceberg-python]

2024-02-05 Thread via GitHub
jqin61 commented on code in PR #353: URL: https://github.com/apache/iceberg-python/pull/353#discussion_r1478509127 ## pyiceberg/manifest.py: ## @@ -308,6 +308,7 @@ def data_file_with_partition(partition_type: StructType, format_version: Literal field_id=field.field

Re: [PR] Core: Fix retry behavior for Jdbc Client [iceberg]

2024-02-05 Thread via GitHub
cccs-br commented on PR #7561: URL: https://github.com/apache/iceberg/pull/7561#issuecomment-1927761923 Since the JdbcCatalog provides the means to specify your own JdbcClientPool by providing a [client pool builder](https://github.com/apache/iceberg/blob/c4cb0fb9993d6743d81a232def6801ea7db

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-05 Thread via GitHub
aokolnychyi merged PR #9563: URL: https://github.com/apache/iceberg/pull/9563

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-05 Thread via GitHub
aokolnychyi commented on PR #9563: URL: https://github.com/apache/iceberg/pull/9563#issuecomment-1927759989 Thanks, @advancedxy @rdblue! I am going to test this with our RC on a cluster. I can't cover everything locally. I tested the initial prototype on a cluster and it worked well.

Re: [PR] Iceberg site fixes [iceberg]

2024-02-05 Thread via GitHub
amogh-jahagirdar commented on PR #9642: URL: https://github.com/apache/iceberg/pull/9642#issuecomment-1927652917 Thanks @bitsondatadev for fixing these, and @Fokko @nastra for the reviews!

Re: [PR] Iceberg site fixes [iceberg]

2024-02-05 Thread via GitHub
amogh-jahagirdar merged PR #9642: URL: https://github.com/apache/iceberg/pull/9642

Re: [PR] Add Daft examples and code into PyIceberg docs and Table [iceberg-python]

2024-02-05 Thread via GitHub
jaychia commented on code in PR #355: URL: https://github.com/apache/iceberg-python/pull/355#discussion_r1478653195 ## pyproject.toml: ## @@ -105,6 +105,7 @@ pyarrow = ["pyarrow"] pandas = ["pandas", "pyarrow"] duckdb = ["duckdb", "pyarrow"] ray = ["ray", "pyarrow", "pandas"]

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-05 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1478650850 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -25,31 +25,36 @@ import java.util.Map; import java.util.Properties; import java.util.Set; +import

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-02-05 Thread via GitHub
jbonofre commented on PR #9008: URL: https://github.com/apache/iceberg/pull/9008#issuecomment-1927543271 I wonder why we are not using something similar to what we have for `decimal` with `(P,S)` for `timestamp`? If we want to have "open precision" for `timestamp` we could imagine to have seco

Re: [PR] Core: only trim slash when warehouse location is not root path [iceberg]

2024-02-05 Thread via GitHub
abmo-x commented on code in PR #9619: URL: https://github.com/apache/iceberg/pull/9619#discussion_r1478622304 ## core/src/test/java/org/apache/iceberg/util/TestLocationUtil.java: ## @@ -46,6 +46,21 @@ public void testStripTrailingSlash() { assertThat(LocationUtil.stripTrail

Re: [I] Consolidate FileIO [iceberg-python]

2024-02-05 Thread via GitHub
kevinjqliu commented on issue #310: URL: https://github.com/apache/iceberg-python/issues/310#issuecomment-1927505635 I see. I was under the assumption that PyArrow could completely replace fsspec. But it seems like there are a few use cases where we would prefer fsspec. > fsspec is

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-05 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1478542258 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -81,85 +90,103 @@ final class JdbcUtil { + TABLE_NAME + ")" + ")";

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-05 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1478540570 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcUtil.java: ## @@ -18,14 +18,116 @@ */ package org.apache.iceberg.jdbc; +import static org.assertj.core.api.A

Re: [PR] AWS: Add S3 Access Grants Documentation [iceberg]

2024-02-05 Thread via GitHub
jackye1995 merged PR #9590: URL: https://github.com/apache/iceberg/pull/9590

Re: [PR] AWS: Add S3 Access Grants Documentation [iceberg]

2024-02-05 Thread via GitHub
jackye1995 commented on PR #9590: URL: https://github.com/apache/iceberg/pull/9590#issuecomment-1927400479 Thanks for the work!

Re: [PR] Flink: backport #9547 to 1.17 and 1.16 for Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]

2024-02-05 Thread via GitHub
pvary merged PR #9627: URL: https://github.com/apache/iceberg/pull/9627

Re: [PR] partitioned write support [iceberg-python]

2024-02-05 Thread via GitHub
jqin61 commented on code in PR #353: URL: https://github.com/apache/iceberg-python/pull/353#discussion_r1478509127 ## pyiceberg/manifest.py: ## @@ -308,6 +308,7 @@ def data_file_with_partition(partition_type: StructType, format_version: Literal field_id=field.field

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-05 Thread via GitHub
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1478448616 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java: ## @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] Docs: Add newline so that subsection is correctly rendered [iceberg]

2024-02-05 Thread via GitHub
bitsondatadev commented on PR #9656: URL: https://github.com/apache/iceberg/pull/9656#issuecomment-1927265427 Agreed, I'll make a pass at this some time this week!

Re: [PR] docs: Document Parquet write options [iceberg-python]

2024-02-05 Thread via GitHub
Fokko merged PR #364: URL: https://github.com/apache/iceberg-python/pull/364
