Re: [PR] Spec: Clarify which columns can be used for equality delete files. [iceberg]

2023-11-08 Thread via GitHub
gaborkaszab commented on code in PR #8981: URL: https://github.com/apache/iceberg/pull/8981#discussion_r1386184942 ## format/spec.md: ## @@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize Equality delete files identify delete
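Equality delete files mark rows as deleted by value rather than by position. The sketch below is illustrative only and shows that matching rule in Python. It assumes rows are plain dicts and that the equality columns hold hashable primitive values. `apply_equality_deletes` is a hypothetical helper, not an Iceberg API. It does not model sequence-number scoping or the column restrictions this PR is clarifying.

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def apply_equality_deletes(
    data_rows: Iterable[Row],
    delete_rows: Iterable[Row],
    equality_columns: Sequence[str],
) -> List[Row]:
    """Return the data rows that survive a set of equality deletes.

    A data row is dropped when its values in every equality column equal the
    values of at least one delete row. Sequence-number scoping and the spec's
    rules on which column types are allowed are deliberately not modeled.
    """
    # One key per delete row: the tuple of its equality-column values.
    deleted_keys = {tuple(d[c] for c in equality_columns) for d in delete_rows}
    # Keep only rows whose key is not among the deleted keys.
    return [
        row
        for row in data_rows
        if tuple(row[c] for c in equality_columns) not in deleted_keys
    ]


data = [
    {"id": 1, "category": "a"},
    {"id": 2, "category": "b"},
    {"id": 3, "category": "a"},
]
# Equality delete on column `id`: removes the row whose id is 3.
print(apply_equality_deletes(data, [{"id": 3}], ["id"]))
```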

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
Fokko commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386152391 ## site/dev/common.sh: ## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreement

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386194229 ## site/.gitignore: ## @@ -1,13 +1,3 @@ -## Temp remove for first phase Review Comment: Yeah, fair enough...I added a few specifics from my previous one

Re: [I] [Feature Request] Implement `equals` for `RESTMessage` [iceberg]

2023-11-08 Thread via GitHub
Fokko commented on issue #9003: URL: https://github.com/apache/iceberg/issues/9003#issuecomment-1801334447 @liurenjie1024 No problem and I think that makes sense to be able to check those for equality. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386208436 ## site/dev/common.sh: ## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license a

Re: [I] [Feature Request] Implement `equals` for `RESTMessage` [iceberg]

2023-11-08 Thread via GitHub
liurenjie1024 commented on issue #9003: URL: https://github.com/apache/iceberg/issues/9003#issuecomment-1801337442 Cool, I will take this.

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386210561 ## site/Makefile: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE fil

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386213158 ## site/README.md: ## @@ -83,59 +99,27 @@ All previously versioned docs will be committed in `docs-` branches and       └── ... ``` -### Install - -1. (Opt

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
Fokko commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386219088 ## site/dev/common.sh: ## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreement

Re: [I] [Feature Request] Implement `equals` for `RESTMessage` [iceberg]

2023-11-08 Thread via GitHub
liurenjie1024 commented on issue #9003: URL: https://github.com/apache/iceberg/issues/9003#issuecomment-1801347467 cc @Fokko Please help to assign it to me.

Re: [PR] Spec: Clarify which columns can be used for equality delete files. [iceberg]

2023-11-08 Thread via GitHub
liurenjie1024 commented on code in PR #8981: URL: https://github.com/apache/iceberg/pull/8981#discussion_r138616 ## format/spec.md: ## @@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize Equality delete files identify dele

Re: [PR] feat: Implement load table api. [iceberg-rust]

2023-11-08 Thread via GitHub
Fokko commented on code in PR #89: URL: https://github.com/apache/iceberg-rust/pull/89#discussion_r1386222676 ## crates/catalog/rest/src/catalog.rs: ## @@ -312,11 +316,43 @@ impl Catalog for RestCatalog { } /// Load table from the catalog. -async fn load_table(&s

Re: [PR] feat: Implement load table api. [iceberg-rust]

2023-11-08 Thread via GitHub
Fokko merged PR #89: URL: https://github.com/apache/iceberg-rust/pull/89

Re: [PR] feat: Implement load table api. [iceberg-rust]

2023-11-08 Thread via GitHub
Fokko commented on code in PR #89: URL: https://github.com/apache/iceberg-rust/pull/89#discussion_r1386224255 ## crates/iceberg/src/table.rs: ## @@ -17,10 +17,33 @@ //! Table API for Apache Iceberg +use crate::io::FileIO; use crate::spec::TableMetadata; +use crate::TableId

[PR] Deploy 1.4.2 to docs branch / fix 1.4.x docs for search [iceberg]

2023-11-08 Thread via GitHub
bitsondatadev opened a new pull request, #9005: URL: https://github.com/apache/iceberg/pull/9005 (no comment)

Re: [PR] Core: Add a constructor to StaticTableOperations [iceberg]

2023-11-08 Thread via GitHub
nastra merged PR #8996: URL: https://github.com/apache/iceberg/pull/8996

Re: [I] DELETE fails with "java.lang.IllegalArgumentException: info must be ExtendedLogicalWriteInfo" [iceberg]

2023-11-08 Thread via GitHub
bknbkn commented on issue #8926: URL: https://github.com/apache/iceberg/issues/8926#issuecomment-1801484253 Maybe you can add `.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")` and try again.
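The same setting can be supplied either on the session builder (`.config(key, value)`) or as `--conf key=value` on the command line, as a later comment on this issue does. The helper below keeps one dictionary of settings and renders it as command-line flags. `to_conf_flags` is hypothetical and is not part of Spark or Iceberg. As another comment on this thread points out, the extensions module also has to be on the classpath for the class to load.

```python
from typing import Dict, List

# Setting suggested in the comment above.
ICEBERG_SQL_EXTENSIONS: Dict[str, str] = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
}


def to_conf_flags(conf: Dict[str, str]) -> List[str]:
    """Render settings as `--conf key=value` pairs for spark-submit / spark-sql."""
    flags: List[str] = []
    for key, value in conf.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


print(" ".join(to_conf_flags(ICEBERG_SQL_EXTENSIONS)))
```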

Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2023-11-08 Thread via GitHub
nk1506 commented on PR #8907: URL: https://github.com/apache/iceberg/pull/8907#issuecomment-1801545652 @pvary, my bad, I didn't understand the initial comments on loading all the table data from HMS. As you mentioned, the initial implementation was not filtering to construct the `tableIdentifier

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386414038 ## site/dev/common.sh: ## @@ -0,0 +1,125 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license a

Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]

2023-11-08 Thread via GitHub
nk1506 commented on code in PR #8918: URL: https://github.com/apache/iceberg/pull/8918#discussion_r1386428086 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -261,6 +261,12 @@ public void renameTable(TableIdentifier from, TableIdentifier original

[I] table created by pyiceberg could not interoperate well with trino [iceberg-python]

2023-11-08 Thread via GitHub
zeddit opened a new issue, #134: URL: https://github.com/apache/iceberg-python/issues/134 ### Apache Iceberg version None ### Please describe the bug 🐞 I am using the Hive metastore. When accessing a table created with pyiceberg using the statements below ``` schema

Re: [PR] Deploy 1.4.2 to docs branch / fix 1.4.x docs for search [iceberg]

2023-11-08 Thread via GitHub
Fokko merged PR #9005: URL: https://github.com/apache/iceberg/pull/9005

Re: [I] hive integration iceberg related problems [iceberg]

2023-11-08 Thread via GitHub
pvary commented on issue #8993: URL: https://github.com/apache/iceberg/issues/8993#issuecomment-1801696625 > @pvary However, writing data does have problems. The data directory is generated normally, and the data file is also generated normally, but the metadata directory only has the mated

Re: [PR] feat: support ser/deser of value [iceberg-rust]

2023-11-08 Thread via GitHub
liurenjie1024 commented on PR #82: URL: https://github.com/apache/iceberg-rust/pull/82#issuecomment-1801698341 cc @Xuanwo @Fokko Any other comments?

Re: [I] Flink: Add support for Flink 1.18 [iceberg]

2023-11-08 Thread via GitHub
pvary commented on issue #8930: URL: https://github.com/apache/iceberg/issues/8930#issuecomment-1801714514 I have 2 PRs in progress, which I would like to merge first (#8803 and #8553). If nobody starts working on this before they are merged, then I will move forward with this one. I plan to

Re: [PR] Docs: Add note that snapshot expiration and cleanup orphan files could corrupt Flink job state [iceberg]

2023-11-08 Thread via GitHub
pvary merged PR #9002: URL: https://github.com/apache/iceberg/pull/9002

Re: [PR] Docs: Add note that snapshot expiration and cleanup orphan files could corrupt Flink job state [iceberg]

2023-11-08 Thread via GitHub
pvary commented on PR #9002: URL: https://github.com/apache/iceberg/pull/9002#issuecomment-1801718060 Thanks @lirui-apache! Merged

Re: [PR] Hive: Refactor TestHiveCatalog tests to use the core CatalogTests [iceberg]

2023-11-08 Thread via GitHub
pvary commented on code in PR #8918: URL: https://github.com/apache/iceberg/pull/8918#discussion_r1386492127 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java: ## @@ -261,6 +261,12 @@ public void renameTable(TableIdentifier from, TableIdentifier originalT

Re: [PR] Spec: Clarify which columns can be used for equality delete files. [iceberg]

2023-11-08 Thread via GitHub
gaborkaszab commented on code in PR #8981: URL: https://github.com/apache/iceberg/pull/8981#discussion_r1386511343 ## format/spec.md: ## @@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize Equality delete files identify delete

Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]

2023-11-08 Thread via GitHub
Fokko commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1386524118 ## site/dev/common.sh: ## @@ -0,0 +1,122 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreement

Re: [PR] Docs: Add note that snapshot expiration and cleanup orphan files could corrupt Flink job state [iceberg]

2023-11-08 Thread via GitHub
lirui-apache commented on PR #9002: URL: https://github.com/apache/iceberg/pull/9002#issuecomment-1801779507 Thanks @pvary !

Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2023-11-08 Thread via GitHub
cccs-jc commented on code in PR #8980: URL: https://github.com/apache/iceberg/pull/8980#discussion_r1386580961 ## core/src/main/java/org/apache/iceberg/MicroBatches.java: ## @@ -92,7 +92,7 @@ private static List> indexManifests( for (ManifestFile manifest : manifestFiles

Re: [PR] Spec: Clarify which columns can be used for equality delete files. [iceberg]

2023-11-08 Thread via GitHub
liurenjie1024 commented on code in PR #8981: URL: https://github.com/apache/iceberg/pull/8981#discussion_r1386629666 ## format/spec.md: ## @@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize Equality delete files identify dele

[I] Spark does not support time [iceberg]

2023-11-08 Thread via GitHub
tundraraj opened a new issue, #9006: URL: https://github.com/apache/iceberg/issues/9006 ### Feature Request / Improvement Spark does not support time per https://iceberg.apache.org/docs/latest/spark-writes/#iceberg-type-to-spark-type. see thread https://apache-iceberg.slack.com/ar

Re: [PR] Spark 3.5: Fix Migrate procedure renaming issue for custom catalog [iceberg]

2023-11-08 Thread via GitHub
tomtongue commented on code in PR #8931: URL: https://github.com/apache/iceberg/pull/8931#discussion_r1386791377 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java: ## @@ -108,6 +109,23 @@ public MigrateTableSparkAction backupTableNam

Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-11-08 Thread via GitHub
nastra commented on code in PR #7913: URL: https://github.com/apache/iceberg/pull/7913#discussion_r1386793375 ## open-api/rest-catalog-open-api.yaml: ## @@ -1630,6 +1990,102 @@ components: metadata-log: $ref: '#/components/schemas/MetadataLog' +SQLViewR

Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-11-08 Thread via GitHub
nastra commented on code in PR #7913: URL: https://github.com/apache/iceberg/pull/7913#discussion_r1386806741 ## core/src/main/java/org/apache/iceberg/view/ViewMetadata.java: ## @@ -257,8 +258,8 @@ public Builder addVersion(ViewVersion version) { return this; } -

Re: [I] Support adding an additional `opType` column when creating a table [iceberg]

2023-11-08 Thread via GitHub
klion26 commented on issue #8973: URL: https://github.com/apache/iceberg/issues/8973#issuecomment-1802132754 @nastra thanks for the reply. Yes, it looks the same, but it seems the `change-data-capture` feature was a view created on an existing Iceberg table? Our customer wants some Iceberg ta

Re: [PR] Spark 3.5: Fix Migrate procedure renaming issue for custom catalog [iceberg]

2023-11-08 Thread via GitHub
singhpk234 commented on code in PR #8931: URL: https://github.com/apache/iceberg/pull/8931#discussion_r1386866648 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java: ## @@ -108,6 +109,23 @@ public MigrateTableSparkAction backupTableNa

Re: [I] Default table properties not respected when using Spark DataFrame API [iceberg]

2023-11-08 Thread via GitHub
boushphong commented on issue #8265: URL: https://github.com/apache/iceberg/issues/8265#issuecomment-1802311411 Still reproducible on 1.4.2. Can I take this one? :bow: @nastra

[I] using pyiceberg with kerberized hive metastore [iceberg-python]

2023-11-08 Thread via GitHub
saidixith002 opened a new issue, #135: URL: https://github.com/apache/iceberg-python/issues/135 ### Question Hi, can anyone share examples of using pyiceberg with a kerberized Hive metastore? ``` raise TTransportException(type=TTransportException.END_OF_FILE, thrift.tr

Re: [I] Default table properties not respected when using Spark DataFrame API [iceberg]

2023-11-08 Thread via GitHub
boushphong commented on issue #8265: URL: https://github.com/apache/iceberg/issues/8265#issuecomment-1802356534 Actually, I don't think this is a bug. This would produce DDL like the Spark SQL API:

```scala
df.write.partitionBy("vendor_id").saveAsTable("local.nyc.taxis_df")
```

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1802370780 @rdblue it's great to find that `pyiceberg` loads data in a **deterministic** way; as Fokko said, the manifests are read sequentially. Thus we only need to control the way how d

Re: [I] DELETE fails with "java.lang.IllegalArgumentException: info must be ExtendedLogicalWriteInfo" [iceberg]

2023-11-08 Thread via GitHub
rafoid commented on issue #8926: URL: https://github.com/apache/iceberg/issues/8926#issuecomment-1802375058 When adding ``` --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ ``` in addition to the set of configs listed above, I can't

Re: [PR] minor: Provide Debug impl for pub structs #73 [iceberg-rust]

2023-11-08 Thread via GitHub
DeaconDesperado commented on PR #92: URL: https://github.com/apache/iceberg-rust/pull/92#issuecomment-1802389582 Rebased and resolved conflicts.

Re: [PR] Spec: Clarify which columns can be used for equality delete files. [iceberg]

2023-11-08 Thread via GitHub
emkornfield commented on code in PR #8981: URL: https://github.com/apache/iceberg/pull/8981#discussion_r1387018427 ## format/spec.md: ## @@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` then `pos` to optimize Equality delete files identify delete

Re: [PR] API, Core: implement types timestamp_ns and timestamptz_ns [iceberg]

2023-11-08 Thread via GitHub
jacobmarble commented on PR #8971: URL: https://github.com/apache/iceberg/pull/8971#issuecomment-1802450224 > @jacobmarble can you break this into smaller commits? There are a ton of files changed here and I'm concerned about catching problems with such a large PR. > I agree with @rd

Re: [PR] API, Core: implement types timestamp_ns and timestamptz_ns [iceberg]

2023-11-08 Thread via GitHub
jacobmarble closed pull request #8971: API, Core: implement types timestamp_ns and timestamptz_ns URL: https://github.com/apache/iceberg/pull/8971

[PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2023-11-08 Thread via GitHub
jacobmarble opened a new pull request, #9008: URL: https://github.com/apache/iceberg/pull/9008 Helps #8657 This change adds field `ChronoUnit unit` to `TimestampType`, such that `TimestampType` now represents four specified types: - `timestamp` (existing) - `timestamptz` (existi
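The PR's idea is that a single timestamp type, parameterized by a time unit and by whether values are adjusted to UTC, covers all four type names. The following is a Python model of that mapping and is not the PR's Java code. `Unit`, `TimestampType`, and `adjust_to_utc` are names chosen here for illustration. The model assumes that microseconds (the existing types) and nanoseconds (the new `_ns` types) are the only valid units.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    MICROS = "micros"  # precision of the existing timestamp / timestamptz types
    NANOS = "nanos"    # precision of the new *_ns types


@dataclass(frozen=True)
class TimestampType:
    unit: Unit
    adjust_to_utc: bool  # True for the "tz" variants

    def type_name(self) -> str:
        base = "timestamptz" if self.adjust_to_utc else "timestamp"
        return base + ("_ns" if self.unit is Unit.NANOS else "")


# Enumerate all unit / zone combinations.
for unit in Unit:
    for tz in (False, True):
        print(TimestampType(unit, tz).type_name())
```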

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1802508681 @Fokko In conclusion, even though pyiceberg loads data in a deterministic way, which means the results are preserved between runs, the results are far from ordered, which means w

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2023-11-08 Thread via GitHub
jacobmarble commented on PR #9008: URL: https://github.com/apache/iceberg/pull/9008#issuecomment-1802578562 Do these need to be addressed in this PR? ```console TestSpark3Util > testDescribeSortOrder FAILED org.junit.ComparisonFailure: Sort order isn't correct. expected:<[ho

Re: [PR] Spec: clarify ns timestamps for ORC deserialization [iceberg]

2023-11-08 Thread via GitHub
rdblue commented on PR #9007: URL: https://github.com/apache/iceberg/pull/9007#issuecomment-1802656478 Thanks, @jacobmarble!

Re: [PR] Spec: clarify ns timestamps for ORC deserialization [iceberg]

2023-11-08 Thread via GitHub
rdblue merged PR #9007: URL: https://github.com/apache/iceberg/pull/9007

Re: [PR] Core: Support replacing delete manifests [iceberg]

2023-11-08 Thread via GitHub
RussellSpitzer commented on code in PR #9000: URL: https://github.com/apache/iceberg/pull/9000#discussion_r1387196142 ## core/src/test/java/org/apache/iceberg/TestRewriteManifests.java: ## @@ -1105,6 +1108,499 @@ public void testRewriteManifestsOnBranchUnsupported() {

Re: [PR] Core: Support replacing delete manifests [iceberg]

2023-11-08 Thread via GitHub
aokolnychyi commented on code in PR #9000: URL: https://github.com/apache/iceberg/pull/9000#discussion_r1387199381 ## core/src/test/java/org/apache/iceberg/TestRewriteManifests.java: ## @@ -1105,6 +1108,499 @@ public void testRewriteManifestsOnBranchUnsupported() {

Re: [PR] Core: Support replacing delete manifests [iceberg]

2023-11-08 Thread via GitHub
RussellSpitzer commented on code in PR #9000: URL: https://github.com/apache/iceberg/pull/9000#discussion_r1387199271 ## core/src/test/java/org/apache/iceberg/TestRewriteManifests.java: ## @@ -1105,6 +1108,499 @@ public void testRewriteManifestsOnBranchUnsupported() {

Re: [PR] Core: Support replacing delete manifests [iceberg]

2023-11-08 Thread via GitHub
aokolnychyi commented on code in PR #9000: URL: https://github.com/apache/iceberg/pull/9000#discussion_r1387200273 ## core/src/test/java/org/apache/iceberg/TestRewriteManifests.java: ## @@ -1105,6 +1108,499 @@ public void testRewriteManifestsOnBranchUnsupported() {

[PR] Spark 3.4: Display more read metrics on Spark SQL UI [iceberg]

2023-11-08 Thread via GitHub
karuppayya opened a new pull request, #9009: URL: https://github.com/apache/iceberg/pull/9009 This is a cherry-pick of the following commit in 3.5:

```
a44592501 Spark 3.5: Display more read metrics on Spark SQL UI (#8717)
```

Re: [PR] Core: Support replacing delete manifests [iceberg]

2023-11-08 Thread via GitHub
aokolnychyi commented on code in PR #9000: URL: https://github.com/apache/iceberg/pull/9000#discussion_r1387249948 ## core/src/test/java/org/apache/iceberg/TestRewriteManifests.java: ## @@ -1105,6 +1108,499 @@ public void testRewriteManifestsOnBranchUnsupported() {

[PR] Build: Bump pyarrow from 14.0.0 to 14.0.1 [iceberg-python]

2023-11-08 Thread via GitHub
dependabot[bot] opened a new pull request, #136: URL: https://github.com/apache/iceberg-python/pull/136 Bumps [pyarrow](https://github.com/apache/arrow) from 14.0.0 to 14.0.1. Commits https://github.com/apache/arrow/commit/ba537483618196f50c67a90a473039e4d5dc35e0 (ba53748) MINO

Re: [PR] Core: Support replacing delete manifests [iceberg]

2023-11-08 Thread via GitHub
aokolnychyi closed pull request #9000: Core: Support replacing delete manifests URL: https://github.com/apache/iceberg/pull/9000

Re: [I] planFiles with ParallelIterator OOM(Out of memory) [iceberg]

2023-11-08 Thread via GitHub
github-actions[bot] commented on issue #7594: URL: https://github.com/apache/iceberg/issues/7594#issuecomment-1802943749 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] How to ensure the data is not repeated when using spark to write to the iceberg table [iceberg]

2023-11-08 Thread via GitHub
github-actions[bot] commented on issue #7554: URL: https://github.com/apache/iceberg/issues/7554#issuecomment-1802943769 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Spark 3.4: Display more read metrics on Spark SQL UI [iceberg]

2023-11-08 Thread via GitHub
aokolnychyi merged PR #9009: URL: https://github.com/apache/iceberg/pull/9009

Re: [I] DELETE fails with "java.lang.IllegalArgumentException: info must be ExtendedLogicalWriteInfo" [iceberg]

2023-11-08 Thread via GitHub
bknbkn commented on issue #8926: URL: https://github.com/apache/iceberg/issues/8926#issuecomment-1803040845 Check that the spark-extensions module dependency is set up correctly.

Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]

2023-11-08 Thread via GitHub
manuzhang commented on PR #8988: URL: https://github.com/apache/iceberg/pull/8988#issuecomment-1803050048 Any more comments?

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803117179 Here are my experiments and main findings. ### 1. Checking for consistent ordering of pyiceberg. First, I create an empty table with no partition and sorted_by properties,

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803131783 ### 2. Adding sorted_by properties (sort_order, in Iceberg terms). Because we want a sorted Iceberg table to store and return time series data, we create a sorted_by

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803136799 ### 3. How about partitioned tables? It is common for people to use partitions to manage tables, and in Iceberg, partitioning leads to a great performance gain by skipping

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803138817 ### 4. Limitations of the experiments. I have not tested the following cases: - 1. when data is large enough to get split within a partition, i.e. multiple data files, and they

Re: [I] to_pandas() API which converts iceberg table scan to a pd.DataFrame will lost datetime data type and row order [iceberg-python]

2023-11-08 Thread via GitHub
zeddit commented on issue #132: URL: https://github.com/apache/iceberg-python/issues/132#issuecomment-1803140124 As a first step, my idea is that we could add a parse module which inspects the manifests and determines an order in which to concat pa.tables to assemble the final output. I don't kno
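The ordering step proposed above might look roughly like the sketch below. It depends on three assumptions: each data file is already sorted internally, the files' value ranges on the sort column do not overlap, and each file's lower bound for that column is available from its manifest entry. `FileChunk` and `assemble_in_order` are hypothetical names. Plain lists stand in for the `pa.Table` objects that a real implementation would join with `pyarrow.concat_tables`.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class FileChunk:
    # Hypothetical stand-in for the rows read from one data file, paired with
    # that file's lower bound for the sort column (as recorded in the manifest).
    lower_bound: Any
    rows: List[Any]


def assemble_in_order(chunks: Sequence[FileChunk]) -> List[Any]:
    """Concatenate per-file results in ascending order of their lower bounds."""
    result: List[Any] = []
    for chunk in sorted(chunks, key=lambda c: c.lower_bound):
        result.extend(chunk.rows)
    return result


chunks = [FileChunk(10, [10, 11]), FileChunk(0, [0, 1]), FileChunk(5, [5, 6])]
print(assemble_in_order(chunks))
```

When file ranges overlap, sorting by lower bound alone does not produce a globally sorted result. In that case, a k-way merge on the sort column would be needed.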

[PR] Refactor HiveTableOperations with common code for View. [iceberg]

2023-11-08 Thread via GitHub
nk1506 opened a new pull request, #9011: URL: https://github.com/apache/iceberg/pull/9011 As part of issue #8698, all the common pieces between table and view have been moved from `HiveTableOperations` to a new helper class.

Re: [I] manifest exception [iceberg]

2023-11-08 Thread via GitHub
innocent123 commented on issue #8994: URL: https://github.com/apache/iceberg/issues/8994#issuecomment-1803186989 > @innocent123: I do not really understand your question, but I think your problem might be similar to #5846. When I use the Spark API rewriteDataFiles, it reports "org.apache

Re: [I] manifest exception [iceberg]

2023-11-08 Thread via GitHub
innocent123 commented on issue #8994: URL: https://github.com/apache/iceberg/issues/8994#issuecomment-1803189818 > > @innocent123: I do not really understand your question, but I think your problem might be similar to #5846. > > when i use spark api rewriteDataFiles is reported "org.

Re: [I] manifest exception [iceberg]

2023-11-08 Thread via GitHub
innocent123 commented on issue #8994: URL: https://github.com/apache/iceberg/issues/8994#issuecomment-1803192032 Spark version 3.0.2

Re: [I] manifest exception [iceberg]

2023-11-08 Thread via GitHub
innocent123 commented on issue #8994: URL: https://github.com/apache/iceberg/issues/8994#issuecomment-1803192419 Iceberg version 1.0.0

Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2023-11-08 Thread via GitHub
nk1506 commented on code in PR #8907: URL: https://github.com/apache/iceberg/pull/8907#discussion_r1387522573 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveViewOperations.java: ## @@ -0,0 +1,389 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]

2023-11-08 Thread via GitHub
nastra commented on PR #8988: URL: https://github.com/apache/iceberg/pull/8988#issuecomment-1803297752 I think it would be good to also get the opinion of @RussellSpitzer or @aokolnychyi on this one

Re: [I] Default table properties not respected when using Spark DataFrame API [iceberg]

2023-11-08 Thread via GitHub
nastra commented on issue #8265: URL: https://github.com/apache/iceberg/issues/8265#issuecomment-1803299179 @boushphong yes feel free to work on this in case you're interested

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-08 Thread via GitHub
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1387607715 ## core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java: ## @@ -400,8 +400,15 @@ public void replaceTableViaTransactionThatAlreadyExistsAsView() {