Re: [PR] Pushed filters to Parquet file on best effort basis in Vectorized Reader [iceberg]

2024-01-17 Thread via GitHub
gvramana commented on PR #9479: URL: https://github.com/apache/iceberg/pull/9479#issuecomment-1897971862 @ajantha-bhat can you check and help on this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1457044287 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -157,8 +159,10 @@ private void initializeCatalogTables() throws InterruptedException, SQLExc

Re: [PR] Core: Close the MetricsReporter when the Catalog is closed. [iceberg]

2024-01-17 Thread via GitHub
nastra merged PR #9353: URL: https://github.com/apache/iceberg/pull/9353

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1897944857 Tagging @amogh-jahagirdar, as I feel this PR can help in reducing the changes at Trino side for REST catalog view support (https://github.com/trinodb/trino/pull/19818#discussion_r14

Re: [PR] Core: Close the MetricsReporter when the Catalog is closed. [iceberg]

2024-01-17 Thread via GitHub
nastra commented on code in PR #9353: URL: https://github.com/apache/iceberg/pull/9353#discussion_r1457034301 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -482,7 +489,13 @@ public boolean removeProperties(Namespace namespace, Set properties) @Overr

Re: [PR] feat: init file writer interface [iceberg-rust]

2024-01-17 Thread via GitHub
Xuanwo commented on code in PR #168: URL: https://github.com/apache/iceberg-rust/pull/168#discussion_r1456983545 ## crates/iceberg/src/writer/file_writer/mod.rs: ## @@ -0,0 +1,38 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Spark: Ensure that partition stats files are considered for GC procedures [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on PR #9284: URL: https://github.com/apache/iceberg/pull/9284#issuecomment-1897865498 > Oops, I probably forgot to post the comment. I was thinking about statisticsFileLocations to match the naming of other methods like metadataFileLocations (Files -> File). It is opt

Re: [PR] Core: Fix setting updated parquet compression property [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #9503: URL: https://github.com/apache/iceberg/pull/9503#discussion_r1456949302 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -90,10 +90,19 @@ private static Map unreservedProperties(Map rawP private static Map per

Re: [PR] Add small test on concurrent changes [iceberg-python]

2024-01-17 Thread via GitHub
HonahX merged PR #273: URL: https://github.com/apache/iceberg-python/pull/273

[PR] Core: Fix setting updated parquet compression property [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar opened a new pull request, #9503: URL: https://github.com/apache/iceberg/pull/9503 Fixes #9490 . As part of 1.4.0, we changed the default Parquet compression to zstd. However, this property is being set for every new table regardless of the underlying file formats (e

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-17 Thread via GitHub
HonahX commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1456947775 ## pyiceberg/io/pyarrow.py: ## @@ -733,42 +854,178 @@ def _get_field_id(field: pa.Field) -> Optional[int]: ) -class _ConvertToIceberg(PyArrowSchemaVisitor[

Re: [PR] [Bug Fix] TruncateTransform for falsey values [iceberg-python]

2024-01-17 Thread via GitHub
HonahX merged PR #276: URL: https://github.com/apache/iceberg-python/pull/276

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-17 Thread via GitHub
HonahX commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1456922888 ## pyiceberg/io/pyarrow.py: ## @@ -733,42 +854,178 @@ def _get_field_id(field: pa.Field) -> Optional[int]: ) -class _ConvertToIceberg(PyArrowSchemaVisitor[

Re: [PR] Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable [iceberg]

2024-01-17 Thread via GitHub
hsiang-c commented on PR #9335: URL: https://github.com/apache/iceberg/pull/9335#issuecomment-1897829796 Hello @RussellSpitzer, @szehon-ho and I had a discussion about adopting the `reachableManifests` method. If I understand #8856 correctly, once we associate `as_of_snapshot`

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1456707875 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Founda

Re: [I] `write.parquet.compression-codec` being set even if file-format is not parquet [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar commented on issue #9490: URL: https://github.com/apache/iceberg/issues/9490#issuecomment-1897791241 Yeah looks like we should conditionally persist those properties based on the format properties.
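A rough illustration of what "conditionally persist" could mean here, written in Python rather than the Java of the actual code base. This is only a sketch under stated assumptions: `persisted_properties` is a hypothetical helper, not code from PR #9503, and it assumes the table's format is read from `write.format.default` with Parquet as the fallback.

```python
from typing import Dict

# Property names as used by Iceberg table properties.
PARQUET_COMPRESSION = "write.parquet.compression-codec"
DEFAULT_FILE_FORMAT = "write.format.default"


def persisted_properties(raw: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: add the zstd default only for Parquet tables."""
    props = dict(raw)  # copy so the caller's map is not mutated
    # Assumption: a missing format property means the table uses Parquet.
    file_format = props.get(DEFAULT_FILE_FORMAT, "parquet").lower()
    if file_format == "parquet":
        # Keep an explicit user choice; otherwise fall back to zstd.
        props.setdefault(PARQUET_COMPRESSION, "zstd")
    return props
```

With this shape, a table created with `write.format.default=orc` would carry no Parquet compression property unless the user set one explicitly.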

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1897787045 I'll also take a look at this tomorrow morning as well, thanks @advancedxy !

Re: [PR] feat: init file writer interface [iceberg-rust]

2024-01-17 Thread via GitHub
ZENOTME commented on PR #168: URL: https://github.com/apache/iceberg-rust/pull/168#issuecomment-1897739115 cc @liurenjie1024 @Xuanwo @Fokko @JanKaul

[PR] feat: init file writer interface [iceberg-rust]

2024-01-17 Thread via GitHub
ZENOTME opened a new pull request, #168: URL: https://github.com/apache/iceberg-rust/pull/168 Related issue: https://github.com/apache/iceberg-rust/issues/34 This is part of #135. The file writer interface covers writers for data file formats, e.g. Parquet, ORC.

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-17 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1456811454 ## format/spec.md: ## @@ -1119,21 +1156,30 @@ Partition specs are serialized as a JSON object with the following fields: Each partition field in the fields list

Re: [PR] Spec: add multi-arg transform support [iceberg]

2024-01-17 Thread via GitHub
advancedxy commented on code in PR #8579: URL: https://github.com/apache/iceberg/pull/8579#discussion_r1456810557 ## format/spec.md: ## @@ -1060,6 +1076,27 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | **`float`**| `hashLong(doub

Re: [PR] Spark: Ensure that partition stats files are considered for GC procedures [iceberg]

2024-01-17 Thread via GitHub
aokolnychyi commented on PR #9284: URL: https://github.com/apache/iceberg/pull/9284#issuecomment-1897734810 Oops, I probably forgot to post the comment. I was thinking about `statisticsFileLocations` to match the naming of other methods like `metadataFileLocations` (`Files` -> `File`). It i

Re: [PR] Core: Close the MetricsReporter when the Catalog is closed. [iceberg]

2024-01-17 Thread via GitHub
huyuanfeng2018 commented on code in PR #9353: URL: https://github.com/apache/iceberg/pull/9353#discussion_r1456729227 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -482,7 +489,13 @@ public boolean removeProperties(Namespace namespace, Set properties)

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1456707875 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1456529072 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Add formatting for toml files [iceberg-rust]

2024-01-17 Thread via GitHub
Tyler-Sch commented on code in PR #167: URL: https://github.com/apache/iceberg-rust/pull/167#discussion_r1456660556 ## Makefile: ## @@ -32,7 +32,11 @@ cargo-sort: cargo install cargo-sort cargo sort -c -w -check: check-fmt check-clippy cargo-sort +fmt-toml: +

Re: [I] ORC does not support Iceberg generics [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #127: URL: https://github.com/apache/iceberg/issues/127#issuecomment-1897528977 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Support plaintext Data (CSV, TSV, etc.) in Iceberg Tables [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #118: URL: https://github.com/apache/iceberg/issues/118#issuecomment-1897528898 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] Spark based functional test-cases [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #116: URL: https://github.com/apache/iceberg/issues/116#issuecomment-1897528804 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. T

Re: [I] ORC does not use InputFile and OutputFile abstractions [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #96: URL: https://github.com/apache/iceberg/issues/96#issuecomment-1897528719 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To

Re: [I] Encryption KeyManager implementation that is backed by KMS [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #81: URL: https://github.com/apache/iceberg/issues/81#issuecomment-1897528650 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To

Re: [I] Additional Metrics and Statistics [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #76: URL: https://github.com/apache/iceberg/issues/76#issuecomment-1897528610 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To

Re: [I] Support cryptographic integrity [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #44: URL: https://github.com/apache/iceberg/issues/44#issuecomment-1897528582 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To

Re: [I] Add an API to maintain external schema mappings [iceberg]

2024-01-17 Thread via GitHub
github-actions[bot] commented on issue #41: URL: https://github.com/apache/iceberg/issues/41#issuecomment-1897528557 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To

[PR] [Reference PR] [API + Avro] Add default value APIs and Avro implementation [iceberg]

2024-01-17 Thread via GitHub
wmoustafa opened a new pull request, #9502: URL: https://github.com/apache/iceberg/pull/9502 This PR adds default value APIs according to the default value spec, and implements it in the `GenericAvroReader` case. It uses a `ConstantReader` to fill in the default values of fields from their

Re: [PR] Fix community link [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar merged PR #9500: URL: https://github.com/apache/iceberg/pull/9500

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1456586630 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala: ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache S

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1456527386 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: Added error handling and default logic for Flink version detection [iceberg]

2024-01-17 Thread via GitHub
gjacoby126 commented on code in PR #9452: URL: https://github.com/apache/iceberg/pull/9452#discussion_r1456547407 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/util/FlinkPackage.java: ## @@ -19,15 +19,31 @@ package org.apache.iceberg.flink.util; import org.apac

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1456546379 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache S

[PR] [Bug Fix] TruncateTransform for falsey values [iceberg-python]

2024-01-17 Thread via GitHub
syun64 opened a new pull request, #276: URL: https://github.com/apache/iceberg-python/pull/276 Currently, any falsey values will return None for their **TruncateTransform**. This results in **fill_parquet_file_metadata** throwing an exception whenever there is a falsey lower bound as the mi
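The class of bug described above can be sketched in a few lines of Python. These are hypothetical stand-alone helpers for an integer truncate of width `width`, not PyIceberg's actual `TruncateTransform` code; they only show how a truthiness check differs from an explicit `None` check.

```python
from typing import Optional


def truncate_buggy(value: Optional[int], width: int) -> Optional[int]:
    # Truthiness check: 0 is falsey, so it is wrongly treated like None.
    return value - (value % width) if value else None


def truncate_fixed(value: Optional[int], width: int) -> Optional[int]:
    # Explicit None check: only a missing value maps to None.
    return value - (value % width) if value is not None else None
```

Here `truncate_buggy(0, 10)` returns `None` while `truncate_fixed(0, 10)` returns `0`; for non-falsey inputs the two agree.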

Re: [PR] Add 1.4.3 docs [iceberg]

2024-01-17 Thread via GitHub
bitsondatadev commented on code in PR #9499: URL: https://github.com/apache/iceberg/pull/9499#discussion_r1456528989 ## 1.4.3/mkdocs.yml: ## @@ -0,0 +1,70 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE

[PR] Fix community link [iceberg]

2024-01-17 Thread via GitHub
bitsondatadev opened a new pull request, #9500: URL: https://github.com/apache/iceberg/pull/9500 The community link works only on top-level site links. Link this to the static site for now; eventually we need to consider a site-wide variable solution, but that's not important for now.

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1456529072 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1456527386 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Add 1.4.3 docs [iceberg]

2024-01-17 Thread via GitHub
dramaticlly commented on code in PR #9499: URL: https://github.com/apache/iceberg/pull/9499#discussion_r1456526671 ## 1.4.3/mkdocs.yml: ## @@ -0,0 +1,70 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE fi

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on PR #41: URL: https://github.com/apache/iceberg-python/pull/41#issuecomment-1896944638 @Fokko, this works great and I don't see any blockers so I've approved it. I think there are a few things to consider in terms of how we want to do this moving forward (whether to

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456523045 ## mkdocs/docs/api.md: ## @@ -175,6 +175,104 @@ static_table = StaticTable.from_metadata( The static-table is considered read-only. +## Write support + +With PyI

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456502771 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

[PR] Add 1.4.3 docs [iceberg]

2024-01-17 Thread via GitHub
bitsondatadev opened a new pull request, #9499: URL: https://github.com/apache/iceberg/pull/9499 Add 1.4.3 docs

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456498171 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456495179 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456493214 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456491428 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456489014 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456487418 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456486632 ## pyiceberg/table/__init__.py: ## @@ -1935,3 +2043,184 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -

Re: [PR] Core: Fix lock acquisition logic in HadoopTableOperations rename [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar merged PR #9498: URL: https://github.com/apache/iceberg/pull/9498

Re: [I] [HadoopCatalog]: [HadoopTableOperations]: Commit flow, renameToFinal does not actually check if lock acquired [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar closed issue #9485: [HadoopCatalog]: [HadoopTableOperations]: Commit flow, renameToFinal does not actually check if lock acquired URL: https://github.com/apache/iceberg/issues/9485

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456478299 ## pyiceberg/table/__init__.py: ## @@ -831,6 +887,46 @@ def history(self) -> List[SnapshotLogEntry]: def update_schema(self, allow_incompatible_changes: bool = F

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456469457 ## pyiceberg/table/__init__.py: ## @@ -856,6 +909,61 @@ def history(self) -> List[SnapshotLogEntry]: def update_schema(self, allow_incompatible_changes: bool = F

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456467981 ## pyiceberg/table/__init__.py: ## @@ -856,6 +909,61 @@ def history(self) -> List[SnapshotLogEntry]: def update_schema(self, allow_incompatible_changes: bool = F

Re: [PR] Apply Name mapping, new_schema_for_table [iceberg-python]

2024-01-17 Thread via GitHub
syun64 commented on code in PR #219: URL: https://github.com/apache/iceberg-python/pull/219#discussion_r1456437583 ## pyiceberg/io/pyarrow.py: ## @@ -733,42 +854,178 @@ def _get_field_id(field: pa.Field) -> Optional[int]: ) -class _ConvertToIceberg(PyArrowSchemaVisitor[

Re: [I] Purge support for Iceberg view [iceberg]

2024-01-17 Thread via GitHub
nk1506 commented on issue #9433: URL: https://github.com/apache/iceberg/issues/9433#issuecomment-1896529369 With purge enablement similar like [dropTable](https://github.com/apache/iceberg/blob/66b1aa662761606d4d68d99371c62505e7ac2f1e/api/src/main/java/org/apache/iceberg/catalog/Catalog.java

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456327592 ## mkdocs/docs/api.md: ## @@ -175,6 +175,104 @@ static_table = StaticTable.from_metadata( The static-table is considered read-only. +## Write support + +With PyIc

Re: [PR] Write support [iceberg-python]

2024-01-17 Thread via GitHub
rdblue commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1456300895 ## mkdocs/docs/api.md: ## @@ -175,6 +175,104 @@ static_table = StaticTable.from_metadata( The static-table is considered read-only. +## Write support + +With PyI

Re: [PR] API, Core, Spark: Add fastForwardOrCreate API and integrate that with Spark fast forward procedure [iceberg]

2024-01-17 Thread via GitHub
rdblue commented on PR #9196: URL: https://github.com/apache/iceberg/pull/9196#issuecomment-1896389067 @amogh-jahagirdar, I think I would prefer the second alternative, to change the behavior of fast-forward. I doubt that anyone relies on fast-forward _not_ creating a branch and failing ins

Re: [PR] Core: Fix lock acquisition logic in HadoopTableOperations rename [iceberg]

2024-01-17 Thread via GitHub
N-o-Z commented on PR #9498: URL: https://github.com/apache/iceberg/pull/9498#issuecomment-1896383730 @amogh-jahagirdar Done!

Re: [I] Two-level parquet read EOF error: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [a, array] repeated int32 array = 2 at value 4 out of 4 in current page. repetition

2024-01-17 Thread via GitHub
gaoshihang commented on issue #9497: URL: https://github.com/apache/iceberg/issues/9497#issuecomment-1896369837 And here is the parquet file we used to add_files. (need to change the .log to .parquet) [user_error_parquet.log](https://github.com/apache/iceberg/files/13967114/user_error_

Re: [I] Two-level parquet read EOF error: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [a, array] repeated int32 array = 2 at value 4 out of 4 in current page. repetition

2024-01-17 Thread via GitHub
gaoshihang commented on issue #9497: URL: https://github.com/apache/iceberg/issues/9497#issuecomment-1896362482 And here is the iceberg schema [v8.metadata.json](https://github.com/apache/iceberg/files/13967089/v8.metadata.json)

Re: [I] Cannot write nullable values to non-null column in the Iceberg Table [iceberg]

2024-01-17 Thread via GitHub
abharath9 commented on issue #9488: URL: https://github.com/apache/iceberg/issues/9488#issuecomment-1896331390 @nastra Yes, I am aware of that. How do I write optional-field data to mandatory fields? It is mentioned in this issue that it is possible by setting "spark.sql.iceberg.ch

Re: [PR] Core: Fix lock acquisition logic in HadoopTableOperations rename [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar commented on PR #9498: URL: https://github.com/apache/iceberg/pull/9498#issuecomment-1896295071 looks like spotless checks are failing: if you could run ``` ./gradlew spotlessApply ``` before pushing your next changes that would fix it!

Re: [PR] Core: Hadoop: Fix: HadoopTableOperations renameToFinal [iceberg]

2024-01-17 Thread via GitHub
amogh-jahagirdar commented on code in PR #9498: URL: https://github.com/apache/iceberg/pull/9498#discussion_r1456088754 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -360,7 +360,10 @@ int findVersion() { */ private void renameToFinal(Fi

Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-01-17 Thread via GitHub
stevenzwu commented on PR #9464: URL: https://github.com/apache/iceberg/pull/9464#issuecomment-1896232286 > If I understand correctly, the FileScanTask json will contain the Schema. The Schema has a doc field for comments. Do we have restrictions defined for the doc field? @pvary yo

Re: [I] Purge support for Iceberg view [iceberg]

2024-01-17 Thread via GitHub
rdblue commented on issue #9433: URL: https://github.com/apache/iceberg/issues/9433#issuecomment-1896190645 What is the proposed behavior for a purge operation? How does this apply to views?

Re: [PR] Core: Hadoop: Fix: HadoopTableOperations renameToFinal [iceberg]

2024-01-17 Thread via GitHub
N-o-Z commented on PR #9498: URL: https://github.com/apache/iceberg/pull/9498#issuecomment-1896152644 @amogh-jahagirdar, FYI

[PR] Core: Hadoop: Fix: HadoopTableOperations renameToFinal [iceberg]

2024-01-17 Thread via GitHub
N-o-Z opened a new pull request, #9498: URL: https://github.com/apache/iceberg/pull/9498 Closes #9485

Re: [PR] Add Hive integration tests [iceberg-python]

2024-01-17 Thread via GitHub
Fokko merged PR #207: URL: https://github.com/apache/iceberg-python/pull/207

Re: [PR] Add Hive integration tests [iceberg-python]

2024-01-17 Thread via GitHub
Fokko commented on PR #207: URL: https://github.com/apache/iceberg-python/pull/207#issuecomment-1896052914 Thanks @HonahX for the review 👍

[I] Two-level parquet read EOF error: org.apache.parquet.io.ParquetDecodingException: Can't read value in column [a, array] repeated int32 array = 2 at value 4 out of 4 in current page. repetition lev

2024-01-17 Thread via GitHub
gaoshihang opened a new issue, #9497: URL: https://github.com/apache/iceberg/issues/9497 ### Apache Iceberg version 1.4.3 (latest release) ### Query engine Spark ### Please describe the bug 🐞 We have a two-level parquet list, the schema like below: ![ima

[I] Hive Catalog: Implement `_commit_table` [iceberg-python]

2024-01-17 Thread via GitHub
Fokko opened a new issue, #275: URL: https://github.com/apache/iceberg-python/issues/275 ### Feature Request / Improvement Probably very similar to the Glue/Sql one :)

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on code in PR #9437: URL: https://github.com/apache/iceberg/pull/9437#discussion_r1455766754 ## api/src/main/java/org/apache/iceberg/actions/ComputePartitionStats.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on code in PR #9437: URL: https://github.com/apache/iceberg/pull/9437#discussion_r1455763573 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java: ## @@ -150,6 +154,21 @@ protected Dataset contentFileDS(Table table, Set

Re: [PR] Add Hive integration tests [iceberg-python]

2024-01-17 Thread via GitHub
Fokko commented on code in PR #207: URL: https://github.com/apache/iceberg-python/pull/207#discussion_r1455755578 ## tests/integration/test_hive.py: ## @@ -0,0 +1,409 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] Spark 3.5: Spark action to compute the partition stats [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on code in PR #9437: URL: https://github.com/apache/iceberg/pull/9437#discussion_r1454359159 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java: ## @@ -150,6 +154,21 @@ protected Dataset contentFileDS(Table table, Set

Re: [PR] Build: Upgrade to Apache RAT 0.16, scanning hidden directories and adding missing ASF header [iceberg]

2024-01-17 Thread via GitHub
ajantha-bhat commented on code in PR #9495: URL: https://github.com/apache/iceberg/pull/9495#discussion_r1455712834 ## dev/check-license: ## @@ -68,7 +68,7 @@ mkdir -p "$FWDIR"/lib } mkdir -p build -$java_cmd -jar "$rat_jar" -E "$FWDIR"/dev/.rat-excludes -d "$FWDIR" > build

Re: [PR] Spark: Support creating views via SQL [iceberg]

2024-01-17 Thread via GitHub
nastra commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1455627217 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewCheck.scala: ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] Set `ghp_path` to `/` [iceberg]

2024-01-17 Thread via GitHub
Fokko merged PR #9493: URL: https://github.com/apache/iceberg/pull/9493

Re: [PR] Docs: Fix typo in tag reading example [iceberg]

2024-01-17 Thread via GitHub
nastra merged PR #9496: URL: https://github.com/apache/iceberg/pull/9496

[PR] Add small test on duplicate changes [iceberg-python]

2024-01-17 Thread via GitHub
Fokko opened a new pull request, #273: URL: https://github.com/apache/iceberg-python/pull/273 (no comment)

[PR] Docs: Fix typo in tag reading example [iceberg]

2024-01-17 Thread via GitHub
pvary opened a new pull request, #9496: URL: https://github.com/apache/iceberg/pull/9496 Small fix in the docs

Re: [PR] Build: Upgrade to Apache RAT 0.16, scanning hidden directories and adding missing ASF header [iceberg]

2024-01-17 Thread via GitHub
jbonofre commented on PR #9495: URL: https://github.com/apache/iceberg/pull/9495#issuecomment-1895818508 @Fokko can you please take a look? Thanks!

[PR] Build: Upgrade to Apache RAT 0.16, scanning hidden directories and adding missing ASF header [iceberg]

2024-01-17 Thread via GitHub
jbonofre opened a new pull request, #9495: URL: https://github.com/apache/iceberg/pull/9495 This PR does: - upgrade to Apache RAT 0.16 - add `--scan-hidden-directories` option - add ASF header where missing - add new excluded file from RAT check

[I] Build/Release: Upgrade to Apache RAT 0.16 and scan hidden directories [iceberg]

2024-01-17 Thread via GitHub
jbonofre opened a new issue, #9494: URL: https://github.com/apache/iceberg/issues/9494 ### Feature Request / Improvement As identified on a previous Iceberg release, apache-rat 0.15 doesn't scan hidden directories. It's not good as the hidden directories are part of the released Iceb

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-17 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1455474276 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -139,7 +134,22 @@ final class JdbcUtil { + " LIKE ? ESCAPE '\\' " + " ) "

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-17 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1455473193 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -53,21 +58,19 @@ final class JdbcUtil { + " WHERE " + CATALOG_NAME

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-01-17 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1455472169 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -157,8 +159,10 @@ private void initializeCatalogTables() throws InterruptedException, SQLExcepti
