Re: [PR] Spark: Fix issue when partitioning by UUID [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8250: URL: https://github.com/apache/iceberg/pull/8250#discussion_r1361595378 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/InternalRowWrapper.java: ## @@ -40,9 +43,12 @@ class InternalRowWrapper implements StructLike { priv

Re: [I] [Spark] Cannot append to Glue table - StorageDescriptor#InputFormat cannot be null for table [iceberg]

2023-10-16 Thread via GitHub
AlbertoSoto25 commented on issue #5565: URL: https://github.com/apache/iceberg/issues/5565#issuecomment-1765768056 > Hello again @singhpk234, > > I was able to fix the issue by rewriting all the code based on the official AWS docs, and the append function started working again. I was not able

Re: [PR] Deliver key metadata to parquet encryption [iceberg]

2023-10-16 Thread via GitHub
ggershinsky commented on code in PR #6762: URL: https://github.com/apache/iceberg/pull/6762#discussion_r1361569389 ## core/src/main/java/org/apache/iceberg/encryption/BaseEncryptedOutputFile.java: ## @@ -24,10 +24,19 @@ class BaseEncryptedOutputFile implements EncryptedOutputFi

[PR] add Wayang / DataBloom to vendors supporting Iceberg [iceberg-docs]

2023-10-16 Thread via GitHub
2pk03 opened a new pull request, #283: URL: https://github.com/apache/iceberg-docs/pull/283 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-m

[PR] Doc: Fix "Verifying Checksums" script in verify-release.md [iceberg-python]

2023-10-16 Thread via GitHub
HonahX opened a new pull request, #82: URL: https://github.com/apache/iceberg-python/pull/82 This PR removes the additional `.asc` in the script for verifying checksums -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [I] fast_forward does not work for the first commit in Spark [iceberg]

2023-10-16 Thread via GitHub
ajantha-bhat commented on issue #8849: URL: https://github.com/apache/iceberg/issues/8849#issuecomment-1765668208 > long currentRef = table.currentSnapshot().snapshotId(); Since the `audit-branch` is empty we have `table.currentSnapshot()` as null and it leads to NPE, this should be f

Re: [I] Flink: revert the automatic application of custom partitioner for bucketing column with hash distribution [iceberg]

2023-10-16 Thread via GitHub
stevenzwu commented on issue #8847: URL: https://github.com/apache/iceberg/issues/8847#issuecomment-1765662039 > We should be careful about default behavior changes agree. This is on me, with the wrong assumption that the bucketing column is the only thing that needs to be distributed. -- Th

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
lirui-apache commented on PR #8844: URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1765657323 Thanks @zhangminglei for working on this. I guess we also need to update the doc in `CatalogProperties` to indicate the change. -- This is an automated message from the Apache Git

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
lirui-apache commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1361499834 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software

Re: [I] v1 table data file spec id is None [iceberg-python]

2023-10-16 Thread via GitHub
puchengy closed issue #46: v1 table data file spec id is None URL: https://github.com/apache/iceberg-python/issues/46 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
zhangminglei commented on PR #8844: URL: https://github.com/apache/iceberg/pull/8844#issuecomment-1765624174 @nastra Thanks for your review, the code has been updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-16 Thread via GitHub
Zhanxiao-Ma closed issue #8802: How to read data in the order in which files are commited? URL: https://github.com/apache/iceberg/issues/8802 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-16 Thread via GitHub
liurenjie1024 commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1361435611 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1025,108 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-16 Thread via GitHub
barronw commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1361422612 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1025,108 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe08b-81

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-16 Thread via GitHub
barronw commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1361425336 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1025,108 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe08b-81

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-16 Thread via GitHub
chenwyi2 commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1765530967 Under normal conditions, only the data of the current minute will be written. However, if the data is delayed, for example, at 11:50, the data has not been written until 11:55, then at 11:56

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-16 Thread via GitHub
liurenjie1024 commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1361411731 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1025,108 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe

Re: [I] DeleteOrphanFiles or ExpireSnapshots outofmemory [iceberg]

2023-10-16 Thread via GitHub
RLashofRegas commented on issue #3703: URL: https://github.com/apache/iceberg/issues/3703#issuecomment-1765512670 @dchristle What was your solution that fixed the `Cannot broadcast the table that is larger than 8GB` issue? I just ran into the same problem on expire snapshots. I am using `ma

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-16 Thread via GitHub
fengjiajie commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1765499071 > > Some systems like older versions of Impala do not annotate String type as UTF-8 columns in Parquet files. When importing these Parquet files into Iceberg, reading these Binary colu

Re: [PR] Support timestamp type in partition string when importing files [iceberg]

2023-10-16 Thread via GitHub
camper42 commented on PR #7291: URL: https://github.com/apache/iceberg/pull/7291#issuecomment-1765491318 Any progress on this PR? We're having the same problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[I] [BUG] to_arrow conversion does not support iceberg table column name containing slash [iceberg-python]

2023-10-16 Thread via GitHub
puchengy opened a new issue, #81: URL: https://github.com/apache/iceberg-python/issues/81 ### Apache Iceberg version main (development) ### Please describe the bug 🐞 PR to reproduce https://github.com/puchengy/iceberg-python/commit/68081491641b0d7bada13a18b98ded3e08e127a

Re: [I] Enable Partition Transforms and/or Spark SQL In Spark `rewrite_data_files` Procedure [iceberg]

2023-10-16 Thread via GitHub
RussellSpitzer commented on issue #8846: URL: https://github.com/apache/iceberg/issues/8846#issuecomment-1765475158 This is supported in Iceberg 1.4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] Flink: revert the automatic application of custom partitioner for bucketing column with hash distribution [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on issue #8847: URL: https://github.com/apache/iceberg/issues/8847#issuecomment-1765460065 Thanks, @stevenzwu! I agree that reverting the behavior change makes the most sense. We should be careful about default behavior changes and rolling back the change (but not the featu

Re: [I] Error when add custom Spark logicalPlan using injectResolutionRule [iceberg]

2023-10-16 Thread via GitHub
github-actions[bot] commented on issue #7271: URL: https://github.com/apache/iceberg/issues/7271#issuecomment-1765457375 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

Re: [I] DeleteOrphanFilesSparkAction doesn't use the Catalog's FileIO [iceberg]

2023-10-16 Thread via GitHub
github-actions[bot] closed issue #7280: DeleteOrphanFilesSparkAction doesn't use the Catalog's FileIO URL: https://github.com/apache/iceberg/issues/7280 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [I] Error when add custom Spark logicalPlan using injectResolutionRule [iceberg]

2023-10-16 Thread via GitHub
github-actions[bot] closed issue #7271: Error when add custom Spark logicalPlan using injectResolutionRule URL: https://github.com/apache/iceberg/issues/7271 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [I] DeleteOrphanFilesSparkAction doesn't use the Catalog's FileIO [iceberg]

2023-10-16 Thread via GitHub
github-actions[bot] commented on issue #7280: URL: https://github.com/apache/iceberg/issues/7280#issuecomment-1765457345 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

Re: [I] Flink: revert the automatic application of custom partitioner for bucketing column with hash distribution [iceberg]

2023-10-16 Thread via GitHub
stevenzwu commented on issue #8847: URL: https://github.com/apache/iceberg/issues/8847#issuecomment-1765429213 @rdblue here is the recap from the discussions on PR #7161. https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778 PR #7161 automatically applies the custom bu

Re: [I] Flink: revert the automatic custom partitioner for bucketing column with hash distribution [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on issue #8847: URL: https://github.com/apache/iceberg/issues/8847#issuecomment-1765411414 @stevenzwu, can you help us understand what the problem with this is and why it should be removed from the 1.4.1 release? -- This is an automated message from the Apache Git Service.

Re: [PR] Python: Add support for Python 3.12 [iceberg-python]

2023-10-16 Thread via GitHub
jayceslesar commented on PR #35: URL: https://github.com/apache/iceberg-python/pull/35#issuecomment-1765396170 @steinsgateted looks like there are no 3.12 wheels yet; see the discussion on https://github.com/aio-libs/aiohttp/issues/7639 -- This is an automated message from the Apache G

[I] Enable Partition Transforms and/or Spark SQL In Spark `rewrite_data_files` Procedure [iceberg]

2023-10-16 Thread via GitHub
RLashofRegas opened a new issue, #8846: URL: https://github.com/apache/iceberg/issues/8846 ### Feature Request / Improvement I am using iceberg v0.14.0 w/ Spark 3.3.0 on Amazon EMR 6.8.0. We are trying to implement regular table maintenance on a table that uses partition transf

Re: [I] De-Duping Rows While Compacting [iceberg]

2023-10-16 Thread via GitHub
W-I-D-EE commented on issue #8702: URL: https://github.com/apache/iceberg/issues/8702#issuecomment-1765308407 Further to this, I have actually had a lot of trouble getting delete from or merge into working with removing duplicate rows. Today the only way I have been able to remove duplicat

Re: [PR] Remove `example` since it is deprecated [iceberg-python]

2023-10-16 Thread via GitHub
Fokko merged PR #79: URL: https://github.com/apache/iceberg-python/pull/79 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] Remove `example` since it is deprecated [iceberg-python]

2023-10-16 Thread via GitHub
Fokko commented on PR #79: URL: https://github.com/apache/iceberg-python/pull/79#issuecomment-1765243284 Thanks @rdblue ! 🙌 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Don't fail on warning when releasing [iceberg-python]

2023-10-16 Thread via GitHub
Fokko merged PR #80: URL: https://github.com/apache/iceberg-python/pull/80 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] Don't fail on warning when releasing [iceberg-python]

2023-10-16 Thread via GitHub
Fokko commented on PR #80: URL: https://github.com/apache/iceberg-python/pull/80#issuecomment-1765242399 @rdblue Yes, this effort is going on at https://github.com/apache/iceberg-python/pull/33. It is tricky because it also catches warnings from external libraries (Ray threw some warnings),

Re: [PR] Don't fail on warning when releasing [iceberg-python]

2023-10-16 Thread via GitHub
rdblue commented on PR #80: URL: https://github.com/apache/iceberg-python/pull/80#issuecomment-1765238294 Should we make -Werror part of CI? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] JDBC catalog fix namespaceExists check [iceberg]

2023-10-16 Thread via GitHub
ismailsimsek commented on code in PR #8340: URL: https://github.com/apache/iceberg/pull/8340#discussion_r1361153332 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -135,7 +135,7 @@ final class JdbcUtil { + CATALOG_NAME + " = ? AND "

[PR] Don't fail on warning when releasing [iceberg-python]

2023-10-16 Thread via GitHub
Fokko opened a new pull request, #80: URL: https://github.com/apache/iceberg-python/pull/80 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-m

Re: [I] [bug] Spark SQL phase optimization failed on concurrent write attempt [iceberg]

2023-10-16 Thread via GitHub
kangyang-wang commented on issue #7800: URL: https://github.com/apache/iceberg/issues/7800#issuecomment-1765093921 Got the same issue here while trying to write to s3... Any solutions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1361135225 ## format/spec.md: ## @@ -971,8 +985,10 @@ The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with | **`decimal(P,S)`** | `hashBytes(minBi

[PR] Remove `example` since it is deprecated [iceberg-python]

2023-10-16 Thread via GitHub
Fokko opened a new pull request, #79: URL: https://github.com/apache/iceberg-python/pull/79 ``` E pydantic.warnings.PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'example'). Depreca

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1361006657 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`arr

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1361004482 ## format/spec.md: ## @@ -948,6 +961,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo Notes: 1. ORC's [TimestampColumnVec

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1361003504 ## format/spec.md: ## @@ -177,8 +177,10 @@ A **`map`** is a collection of key-value pairs with a key type and a value type. | **`decimal(P,S)`** | Fixed-point dec

Re: [PR] JDBC catalog fix namespaceExists check [iceberg]

2023-10-16 Thread via GitHub
dramaticlly commented on code in PR #8340: URL: https://github.com/apache/iceberg/pull/8340#discussion_r1361000799 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -135,7 +135,7 @@ final class JdbcUtil { + CATALOG_NAME + " = ? AND "

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-16 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764912976 @huaxingao So finally it is working, but without the `between` and `<=` operators. Yes, I have to tweak my query to adjust the timezone so that the entire partition is picked by the query. ```

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360997998 ## format/spec.md: ## @@ -971,8 +985,10 @@ The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with | **`decimal(P,S)`** | `hashBytes(minBi

Re: [PR] JDBC catalog fix namespaceExists check [iceberg]

2023-10-16 Thread via GitHub
amogh-jahagirdar commented on code in PR #8340: URL: https://github.com/apache/iceberg/pull/8340#discussion_r1360982717 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -135,7 +135,7 @@ final class JdbcUtil { + CATALOG_NAME + " = ? AND "

Re: [PR] Remove python working directory [iceberg-python]

2023-10-16 Thread via GitHub
rdblue merged PR #71: URL: https://github.com/apache/iceberg-python/pull/71 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.a

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
Fokko commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360945526 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`array` of

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-16 Thread via GitHub
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764861269 @atifiu Based on the log, only `IsNotNull(initial_page_view_dtm)` is completely evaluated on iceberg side. Both `(initial_page_view_dtm#3 >= 2023-06-02 06:00:00)` and `initial_page_view

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360938772 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`arr

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
Fokko commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360934237 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`array` of

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360930771 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`arr

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
Fokko commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360913216 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`array` of

[PR] Core: Do not use a lazy split offset list in manifests (#8834) [iceberg]

2023-10-16 Thread via GitHub
nastra opened a new pull request, #8845: URL: https://github.com/apache/iceberg/pull/8845 This backports #8834 to 1.4.1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Core: fix reading of split offsets in manifests [iceberg]

2023-10-16 Thread via GitHub
rdblue merged PR #8834: URL: https://github.com/apache/iceberg/pull/8834 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

Re: [PR] Core: fix reading of split offsets in manifests [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on code in PR #8834: URL: https://github.com/apache/iceberg/pull/8834#discussion_r1360880736 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -463,11 +460,7 @@ public ByteBuffer keyMetadata() { @Override public List splitOffsets() { -if

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360873696 ## format/spec.md: ## @@ -177,8 +177,10 @@ A **`map`** is a collection of key-value pairs with a key type and a value type. | **`decimal(P,S)`** | Fixed-point decimal;

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360871361 ## format/spec.md: ## @@ -971,8 +985,10 @@ The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with | **`decimal(P,S)`** | `hashBytes(minBigEndi

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360870420 ## format/spec.md: ## @@ -948,6 +961,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo Notes: 1. ORC's [TimestampColumnVector](

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360866229 ## format/spec.md: ## @@ -874,6 +878,11 @@ Maps with non-string keys must use an array representation with the `map` logica |**`list`**|`array`|| |**`map`**|`array` o

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-16 Thread via GitHub
rdblue commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1360863357 ## format/spec.md: ## @@ -862,10 +864,12 @@ Maps with non-string keys must use an array representation with the `map` logica |**`float`**|`float`|| |**`double`**|`dou

Re: [PR] feat: Implement Iceberg values [iceberg-rust]

2023-10-16 Thread via GitHub
ZENOTME commented on code in PR #20: URL: https://github.com/apache/iceberg-rust/pull/20#discussion_r1360827510 ## crates/iceberg/src/spec/values.rs: ## @@ -0,0 +1,964 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] Spark 3.4: Fix issue when partitioning by UUID [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8250: URL: https://github.com/apache/iceberg/pull/8250#discussion_r1360824697 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/InternalRowWrapper.java: ## @@ -71,8 +77,12 @@ public void set(int pos, T value) { row.update(pos

Re: [PR] Core: fix reading of split offsets in manifests [iceberg]

2023-10-16 Thread via GitHub
bryanck commented on code in PR #8834: URL: https://github.com/apache/iceberg/pull/8834#discussion_r1360817555 ## core/src/test/java/org/apache/iceberg/TestManifestReader.java: ## @@ -61,16 +70,14 @@ public void testReaderWithFilterWithoutSelect() throws IOException { Mani

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-16 Thread via GitHub
nastra commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1764681891 > Some systems like older versions of Impala do not annotate String type as UTF-8 columns in Parquet files. When importing these Parquet files into Iceberg, reading these Binary columns wi

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
zhangminglei commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360798631 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
zhangminglei commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360727462 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -110,6 +110,8 @@ private Cache createTableCache(Ticker ticker) { .removalListener(n

[I] Bug: PostgreSql integration [iceberg-python]

2023-10-16 Thread via GitHub
mobley-trent opened a new issue, #78: URL: https://github.com/apache/iceberg-python/issues/78 ### Apache Iceberg version 0.5.0 (latest release) ### Please describe the bug 🐞 Python = 3.11 PostgreSql = v16 I'm having issues setting up the initial connection to po

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-16 Thread via GitHub
Fokko commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1360636839 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1025,108 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe08b-810f

Re: [PR] Nessie: Adapt to Nessie 0.71.1 release [iceberg]

2023-10-16 Thread via GitHub
dimas-b commented on code in PR #8798: URL: https://github.com/apache/iceberg/pull/8798#discussion_r1360635935 ## nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java: ## @@ -78,30 +77,11 @@ public void testNonExistentCustomClient() {

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360600506 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360589867 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360589513 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360588962 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestCachingCatalogExpirationAfterWrite.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the

Re: [PR] Core: fix reading of split offsets in manifests [iceberg]

2023-10-16 Thread via GitHub
advancedxy commented on PR #8834: URL: https://github.com/apache/iceberg/pull/8834#issuecomment-1764363047 > > Other than to revert the optimize in #8336, is it better to invalidate the cached `splitOffsetList`? The proposed change is in the `org.apache.iceberg.BaseFile#put` function: >

Re: [PR] Core: fix reading of split offsets in manifests [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8834: URL: https://github.com/apache/iceberg/pull/8834#discussion_r1360559226 ## core/src/test/java/org/apache/iceberg/TestManifestReader.java: ## @@ -61,16 +70,14 @@ public void testReaderWithFilterWithoutSelect() throws IOException { Manif

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-16 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1764246045 @huaxingao Based on your suggestion, I have narrowed the filter criteria so that even considering the timezone problem, we don't filter on more than two partitions so that filter can be pus

Re: [PR] Add an expireAfterWrite cache eviction policy to CachingCatalog [iceberg]

2023-10-16 Thread via GitHub
nastra commented on code in PR #8844: URL: https://github.com/apache/iceberg/pull/8844#discussion_r1360480900 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -110,6 +110,8 @@ private Cache createTableCache(Ticker ticker) { .removalListener(new Met

Re: [PR] Core: fix reading of split offsets in manifests [iceberg]

2023-10-16 Thread via GitHub
bryanck commented on PR #8834: URL: https://github.com/apache/iceberg/pull/8834#issuecomment-1764216831 > Other than to revert the optimize in #8336, is it better to invalidate the cached `splitOffsetList`? The proposed change is in the `org.apache.iceberg.BaseFile#put` function: > `

Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-16 Thread via GitHub
ZENOTME commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1360421087 ## crates/iceberg/src/spec/manifest.rs: ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] JDBC: JDBC Catalog should do exact namespace search for get namespace queries [iceberg]

2023-10-16 Thread via GitHub
amogh-jahagirdar commented on PR #8833: URL: https://github.com/apache/iceberg/pull/8833#issuecomment-1764118823 Looks like a similar PR was already put up in the past https://github.com/apache/iceberg/pull/8340, we can just review that. -- This is an automated message from the Apache Git

Re: [I] [JdbcCatalog] Issue with Namespace Exists [iceberg]

2023-10-16 Thread via GitHub
amogh-jahagirdar commented on issue #8832: URL: https://github.com/apache/iceberg/issues/8832#issuecomment-1764106743 > Sounds like a similar issue to #8321. @amogh-jahagirdar should we get #8321 reviewed first and then we can address any leftovers in this PR? Ooh missed #8321, yes le

Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-16 Thread via GitHub
liurenjie1024 commented on code in PR #78: URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1360370023 ## crates/iceberg-rest/Cargo.toml: ## @@ -0,0 +1,41 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-16 Thread via GitHub
Xuanwo commented on code in PR #78: URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1360351583 ## crates/iceberg-rest/Cargo.toml: ## @@ -0,0 +1,41 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-16 Thread via GitHub
liurenjie1024 commented on code in PR #78: URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1360337146 ## crates/iceberg-rest/Cargo.toml: ## @@ -0,0 +1,41 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-16 Thread via GitHub
Xuanwo commented on code in PR #78: URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1360330437 ## crates/iceberg-rest/Cargo.toml: ## @@ -0,0 +1,41 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [I] Manage dependencies using workspace. [iceberg-rust]

2023-10-16 Thread via GitHub
liurenjie1024 commented on issue #24: URL: https://github.com/apache/iceberg-rust/issues/24#issuecomment-1763942829 > > Following #15 , we should provide workspace wide dependency management. > > From my daily routine, I find it inconvenient when a library has workspace-wide dependenc

Re: [PR] Build: add gradle configuration to enforce reproducible build [iceberg]

2023-10-16 Thread via GitHub
nastra merged PR #8826: URL: https://github.com/apache/iceberg/pull/8826 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

Re: [I] Build: enforce reproducible build [iceberg]

2023-10-16 Thread via GitHub
nastra closed issue #8825: Build: enforce reproducible build URL: https://github.com/apache/iceberg/issues/8825 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,
