Re: [PR] Build: Add note about running tests/itests on MacOS [iceberg]

2023-10-11 Thread via GitHub
nastra commented on PR #8766: URL: https://github.com/apache/iceberg/pull/8766#issuecomment-1758972459 @Fokko could you review and confirm this one please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Nessie: Remove deprecated usage of Operation.Put.of() [iceberg]

2023-10-11 Thread via GitHub
nastra merged PR #8796: URL: https://github.com/apache/iceberg/pull/8796

Re: [PR] rewrite v2 tables by skip deletes planning and join deletes data tables [iceberg]

2023-10-11 Thread via GitHub
zinking commented on PR #8807: URL: https://github.com/apache/iceberg/pull/8807#issuecomment-1758951782 https://github.com/apache/iceberg/assets/118241/c7590c53-f4cd-4729-ac8f-29088feb004e here is a demo, the setup is to rewrite 1 single partition with 20K data files and 20K equali

Re: [PR] rewrite v2 tables by skip deletes planning and join deletes data tables [iceberg]

2023-10-11 Thread via GitHub
zinking commented on code in PR #8807: URL: https://github.com/apache/iceberg/pull/8807#discussion_r1356082451 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkJoinedBinPackDataRewriter.java: ## @@ -0,0 +1,349 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Add ASF DOAP rdf file [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on PR #8586: URL: https://github.com/apache/iceberg/pull/8586#issuecomment-1758946291 As soon as the DOAP file is merged, I will add to the ASF records.

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-11 Thread via GitHub
ZENOTME commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1356073854 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -90,24 +113,72 @@ impl ManifestList { ]; Schema::builder().with_fields(fields).build().unwrap()

Re: [PR] Spark: Inject `DataSourceV2Relation` when missing [iceberg]

2023-10-11 Thread via GitHub
Fokko closed pull request #7910: Spark: Inject `DataSourceV2Relation` when missing URL: https://github.com/apache/iceberg/pull/7910

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-11 Thread via GitHub
ZENOTME commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1356071291 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -69,6 +73,25 @@ impl ManifestList { &self.entries } +/// Get the v1 schema of the manifest list

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758873680 @amogh-jahagirdar format version defined is 2 and I have explicitly defined copy on write for delete, update and merge. I have deleted some partitions and have noticed that in the snapshot

Re: [PR] Prevent dropping last column. [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on PR #8523: URL: https://github.com/apache/iceberg/pull/8523#issuecomment-1758859862 My general gut feeling on most APIs is that once they exist and have a defined behavior, we should only change that behavior or remove them if we have a good reason. I'm not sure this change p

[I] manifest lost [iceberg]

2023-10-11 Thread via GitHub
chenwyi2 opened a new issue, #8806: URL: https://github.com/apache/iceberg/issues/8806 ### Apache Iceberg version 1.2.1 ### Query engine Flink ### Please describe the bug 🐞 recently i met a job failed with "Failed to open input stream for file: xxx/metadata

Re: [PR] Consider moving to ParallelIterable in Deletes::toPositionIndex [iceberg]

2023-10-11 Thread via GitHub
aokolnychyi commented on PR #6432: URL: https://github.com/apache/iceberg/pull/6432#issuecomment-1758777424 I think we should use a thread pool but I think the implementation should be changed a bit. I explain [here](https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQD

Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

2023-10-11 Thread via GitHub
aokolnychyi commented on PR #6799: URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1758776115 Will do a review by the end of this week, sorry for the delay.

Re: [PR] Spark: Inject `DataSourceV2Relation` when missing [iceberg]

2023-10-11 Thread via GitHub
aokolnychyi commented on code in PR #7910: URL: https://github.com/apache/iceberg/pull/7910#discussion_r1355941817 ## spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/SetMissingRelation.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache S

Re: [I] Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions [iceberg]

2023-10-11 Thread via GitHub
aokolnychyi commented on issue #7226: URL: https://github.com/apache/iceberg/issues/7226#issuecomment-1758774011 If I remember correctly, Spark 3.3 does not have the function catalog. Therefore, we can't resolve Iceberg transforms. I think the right approach is either to use `toTable` in Sp

Re: [PR] Spark: Fix specific field values treated as unequal while comparing rows for carry-over removal [iceberg]

2023-10-11 Thread via GitHub
mkluo8787 commented on PR #8799: URL: https://github.com/apache/iceberg/pull/8799#issuecomment-1758757200 @flyrain Would you mind having a look at it?

Re: [I] Failed to decode value as UTF-8: java.nio.HeapByteBuffer [iceberg]

2023-10-11 Thread via GitHub
github-actions[bot] commented on issue #7232: URL: https://github.com/apache/iceberg/issues/7232#issuecomment-1758719765 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Failed to decode value as UTF-8: java.nio.HeapByteBuffer [iceberg]

2023-10-11 Thread via GitHub
github-actions[bot] closed issue #7232: Failed to decode value as UTF-8: java.nio.HeapByteBuffer URL: https://github.com/apache/iceberg/issues/7232

Re: [I] UnsupportedOperationException when using `IcebergGenerics.read` to read metadata tables [iceberg]

2023-10-11 Thread via GitHub
github-actions[bot] commented on issue #7351: URL: https://github.com/apache/iceberg/issues/7351#issuecomment-1758719706 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

[PR] build(deps): bump golang.org/x/net from 0.15.0 to 0.17.0 [iceberg-go]

2023-10-11 Thread via GitHub
dependabot[bot] opened a new pull request, #18: URL: https://github.com/apache/iceberg-go/pull/18 Bumps [golang.org/x/net](https://github.com/golang/net) from 0.15.0 to 0.17.0. Commits https://github.com/golang/net/commit/b225e7ca6dde1ef5a5ae5ce922861bda011cfabd b225e7c http

Re: [I] Support create table `PRIMARY KEY` column via Spark sql? [iceberg]

2023-10-11 Thread via GitHub
W-I-D-EE commented on issue #5069: URL: https://github.com/apache/iceberg/issues/5069#issuecomment-1758634548 No, but I would love to hear one.

Re: [PR] feat(manifests): Adding implementation of manifest files [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #3: URL: https://github.com/apache/iceberg-go/pull/3#discussion_r1355788936 ## manifest.go: ## @@ -0,0 +1,655 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +//

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-11 Thread via GitHub
jacobmarble commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1355775890 ## format/spec.md: ## @@ -177,20 +177,24 @@ A **`map`** is a collection of key-value pairs with a key type and a value type. | **`decimal(P,S)`** | Fixed-point de

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-11 Thread via GitHub
Fokko commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1355723510 ## format/spec.md: ## @@ -908,10 +919,12 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo | **`float`**| `float`

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-11 Thread via GitHub
Fokko commented on code in PR #8683: URL: https://github.com/apache/iceberg/pull/8683#discussion_r1355722230 ## format/spec.md: ## @@ -177,20 +177,24 @@ A **`map`** is a collection of key-value pairs with a key type and a value type. | **`decimal(P,S)`** | Fixed-point decimal;

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355713680 ## api/src/main/java/org/apache/iceberg/Table.java: ## @@ -333,6 +333,15 @@ default UpdateStatistics updateStatistics() { */ Map refs(); + /** + *

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-11 Thread via GitHub
jacobmarble commented on PR #8683: URL: https://github.com/apache/iceberg/pull/8683#issuecomment-1758430601 > @jacobmarble Yes, this looks good to me. Thanks for joining the community sync. Could you avoid reformatting the table? History is especially important in the spec. Done.

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355680042 ## core/src/main/java/org/apache/iceberg/BaseMetadataTable.java: ## @@ -199,6 +199,11 @@ public Map refs() { return table().refs(); } + @Override +

Re: [PR] Prevent dropping last column. [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on PR #8523: URL: https://github.com/apache/iceberg/pull/8523#issuecomment-1758393552 Say I have a code that takes data from a streaming source with a changing schema. In one delta the source has removed the last column, I see always check for the delta and apply it

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
rdblue commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355662817 ## core/src/main/java/org/apache/iceberg/BaseMetadataTable.java: ## @@ -199,6 +199,11 @@ public Map refs() { return table().refs(); } + @Override + public St

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
rdblue commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355661550 ## api/src/main/java/org/apache/iceberg/Table.java: ## @@ -333,6 +333,15 @@ default UpdateStatistics updateStatistics() { */ Map refs(); + /** + * Returns th

Re: [PR] Add method and property around sequence-numbers [iceberg-python]

2023-10-11 Thread via GitHub
Fokko commented on code in PR #60: URL: https://github.com/apache/iceberg-python/pull/60#discussion_r1355660181 ## pyiceberg/table/__init__.py: ## @@ -529,6 +529,13 @@ def location(self) -> str: """Return the table's base location.""" return self.metadata.locat

Re: [PR] Fix column rename doc example to reflect correct API [iceberg-python]

2023-10-11 Thread via GitHub
rdblue commented on PR #59: URL: https://github.com/apache/iceberg-python/pull/59#issuecomment-1758386526 Thanks, @cabhishek! I'll merge when tests are green.

Re: [PR] Add method and property around sequence-numbers [iceberg-python]

2023-10-11 Thread via GitHub
rdblue commented on code in PR #60: URL: https://github.com/apache/iceberg-python/pull/60#discussion_r1355656751 ## pyiceberg/table/__init__.py: ## @@ -529,6 +529,13 @@ def location(self) -> str: """Return the table's base location.""" return self.metadata.loca

Re: [PR] Thread.sleep() method is replaced with Awaitility [iceberg]

2023-10-11 Thread via GitHub
nk1506 commented on code in PR #8725: URL: https://github.com/apache/iceberg/pull/8725#discussion_r1355627816 ## flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java: ## @@ -113,7 +114,7 @@ public void testConsumeWithoutStartSnapshotI

Re: [PR] Core: Widen exceptions ignored while deleting files in RollingFileWriter [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #8597: URL: https://github.com/apache/iceberg/pull/8597#discussion_r1355502745 ## core/src/main/java/org/apache/iceberg/io/RollingFileWriter.java: ## @@ -127,9 +130,8 @@ private void closeCurrentWriter() { if (currentFileRows == 0L

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-11 Thread via GitHub
Fokko commented on PR #8683: URL: https://github.com/apache/iceberg/pull/8683#issuecomment-1758166192 @jacobmarble Yes, this looks good to me. Thanks for joining the community sync. Could you avoid reformatting the table? History is especially important in the spec.

Re: [PR] Prevent dropping last column. [iceberg]

2023-10-11 Thread via GitHub
rafoid commented on PR #8523: URL: https://github.com/apache/iceberg/pull/8523#issuecomment-1758164003 @RussellSpitzer I'm not sure I understand your scenario with incoming schema deltas. Can you please elaborate?

Re: [PR] Core: Support view metadata compression [iceberg]

2023-10-11 Thread via GitHub
rdblue commented on PR #8552: URL: https://github.com/apache/iceberg/pull/8552#issuecomment-1758159463 Thanks, @nastra!

Re: [PR] Core: Support view metadata compression [iceberg]

2023-10-11 Thread via GitHub
rdblue merged PR #8552: URL: https://github.com/apache/iceberg/pull/8552

Re: [PR] Core: Include summary without 'operation' when comparing view versions [iceberg]

2023-10-11 Thread via GitHub
rdblue commented on PR #8678: URL: https://github.com/apache/iceberg/pull/8678#issuecomment-1758155921 I think we may want to consider an alternative, which is to get rid of `operation` entirely. What is the value of `operation`? In a table, we use the operation to make assumptions about me

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-11 Thread via GitHub
jacobmarble commented on PR #8683: URL: https://github.com/apache/iceberg/pull/8683#issuecomment-1758143965 In the Iceberg community sync today, it was decided that we will not be adding millisecond timestamps after all. Current status of this PR: - millisecond timestamps removed

Re: [PR] Spec: Add partition stats spec [iceberg]

2023-10-11 Thread via GitHub
ajantha-bhat commented on PR #7105: URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1758143237 > Sorry, we didn't get to discussing this during the sync. Shall we do a separate sync to talk about this? This is not the first time this happened. From past few community syn

Re: [PR] Spec: Add partition stats spec [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on PR #7105: URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1758136834 @aokolnychyi sure, no problem to have a specific meeting about that.

Re: [PR] Spec: Add partition stats spec [iceberg]

2023-10-11 Thread via GitHub
aokolnychyi commented on PR #7105: URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1758132685 Sorry, we didn't get to discussing this during the sync. Shall we do a separate sync to talk about this?

Re: [PR] Kafka Connect: Initial project setup and event data structures [iceberg]

2023-10-11 Thread via GitHub
ajantha-bhat commented on code in PR #8701: URL: https://github.com/apache/iceberg/pull/8701#discussion_r1355382194 ## kafka-connect/kafka-connect-events/src/main/java/org/apache/iceberg/connect/events/CommitCompletePayload.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apach

Re: [PR] Kafka Connect: Initial project setup and event data structures [iceberg]

2023-10-11 Thread via GitHub
ajantha-bhat commented on code in PR #8701: URL: https://github.com/apache/iceberg/pull/8701#discussion_r1355276522 ## kafka-connect/kafka-connect-events/src/main/java/org/apache/iceberg/connect/events/CommitCompletePayload.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apach

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355360614 ## api/src/main/java/org/apache/iceberg/Table.java: ## @@ -333,6 +333,15 @@ default UpdateStatistics updateStatistics() { */ Map refs(); + /** + *

Re: [I] v1 table data file spec id is None [iceberg-python]

2023-10-11 Thread via GitHub
Fokko commented on issue #46: URL: https://github.com/apache/iceberg-python/issues/46#issuecomment-1758086469 @puchengy Sorry for not replying. I think we can include this in the next release, it shouldn't be too hard to carry this information from the manifest-list

Re: [I] v1 table data file spec id is None [iceberg-python]

2023-10-11 Thread via GitHub
puchengy commented on issue #46: URL: https://github.com/apache/iceberg-python/issues/46#issuecomment-1758067827 @Fokko do you know?

[PR] Disable merge commit [iceberg-go]

2023-10-11 Thread via GitHub
Fokko opened a new pull request, #17: URL: https://github.com/apache/iceberg-go/pull/17 In the other Iceberg repositories, the merge commit has been disabled to avoid non-linear history. What are your thoughts on setting this on `iceberg-go` as well to maintain some level of consistency?

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758034473 Huh that is strange if you are hitting this log line and have CoW defined, I don't see how that can be possible. @atifiu do you mind creating a new issue and include these detail

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1758010815 I am pretty sure that I don't have any delete files because I have defined copy on write for update, merge, delete.

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757995661 You would need to remove all delete files from the snapshot. I think this currently requires a rewrite data files + rewrite delete files

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757993486 @RussellSpitzer What can be done to resolve it? Will rewriting the data file resolve it?

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1355226646 ## table/metadata.go: ## @@ -0,0 +1,401 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1355188292 ## table/table.go: ## @@ -0,0 +1,97 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1355179485 ## table/snapshots_test.go: ## @@ -0,0 +1,115 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NO

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1355178227 ## table/metadata.go: ## @@ -0,0 +1,401 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1355176342 ## table/metadata.go: ## @@ -0,0 +1,401 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-11 Thread via GitHub
zeroshade commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1355167503 ## table/metadata.go: ## @@ -0,0 +1,401 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE f

Re: [PR] Rename master branch to main [iceberg]

2023-10-11 Thread via GitHub
ajantha-bhat commented on PR #8722: URL: https://github.com/apache/iceberg/pull/8722#issuecomment-1757880953 @jbonofre: LGTM. Thanks for fixing the formatting too.

Re: [PR] Flink: Emit watermarks from the IcebergSource [iceberg]

2023-10-11 Thread via GitHub
pvary commented on PR #8553: URL: https://github.com/apache/iceberg/pull/8553#issuecomment-1757872875 Created #8803 to have the possibility to avoid keeping all of the stats when creating the ScanTasks

Re: [PR] Nessie: Remove deprecated usage of Operation.Put.of() [iceberg]

2023-10-11 Thread via GitHub
ajantha-bhat commented on PR #8796: URL: https://github.com/apache/iceberg/pull/8796#issuecomment-1757870092 > Please expand the commit message for the rationale to be available in git log (without having to access GH). Also, please reference https://github.com/projectnessie/nessie/pull/643

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757857480 @atifiu Pushdown cannot happen if there are row level deletes as indicated in that log line. Row level deletes mean the file statistics are not accurate so they cannot be used for

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-11 Thread via GitHub
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1757848584 @huaxingao Thanks for your response. Even in the case of max on non filter column, aggregate pushdown is not working. In the below explain plan partition is defined on initial_page_v

Re: [PR] Build: Add note about running tests/itests on MacOS [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on PR #8766: URL: https://github.com/apache/iceberg/pull/8766#issuecomment-1757845688 @nastra are you OK to merge this one?

Re: [I] How can I quickly insert data into an iceberg table in a Python environment? [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on issue #8801: URL: https://github.com/apache/iceberg/issues/8801#issuecomment-1757843273 I guess there are a few different topics here: > Is it too slow to import data into an iceberg table that exists in minio through presto. Slow or fast depends o

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355122364 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -770,6 +770,11 @@ public Map refs() { return current.refs(); } +@Override +p

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355115550 ## api/src/main/java/org/apache/iceberg/Table.java: ## @@ -333,6 +333,15 @@ default UpdateStatistics updateStatistics() { */ Map refs(); + /** + * Returns

Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on issue #8802: URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1757829625 Thanks @pvary, I have a maybe naive question @MarsKT > want the data read from iceberg to be in the same order every time. Can I ask what's driving this need?

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355114035 ## api/src/main/java/org/apache/iceberg/Table.java: ## @@ -333,6 +333,15 @@ default UpdateStatistics updateStatistics() { */ Map refs(); + /** + * Re

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
RussellSpitzer commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355112214 ## api/src/main/java/org/apache/iceberg/Table.java: ## @@ -333,6 +333,15 @@ default UpdateStatistics updateStatistics() { */ Map refs(); + /** + * Re

Re: [PR] Rename master branch to main [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on PR #8722: URL: https://github.com/apache/iceberg/pull/8722#issuecomment-1757827191 As soon as @ajantha-bhat votes +1 on this PR, I will create the ticket at Apache INFRA to trigger the rename.

Re: [PR] Rename master branch to main [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on PR #8722: URL: https://github.com/apache/iceberg/pull/8722#issuecomment-1757826172 @ajantha-bhat I did a rebase to remove the python related resources, and the "formatting" should be OK now.

Re: [PR] Rename master branch to main [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on code in PR #8722: URL: https://github.com/apache/iceberg/pull/8722#discussion_r1355110756 ## .github/workflows/python-release.yml: ## @@ -51,7 +51,7 @@ jobs: - name: Set version run: python -m poetry version "${{ inputs.version }}"

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on PR #8800: URL: https://github.com/apache/iceberg/pull/8800#issuecomment-1757821683 > this LGTM, but I think there should be a check in CatalogTests#testCompleteCreateTable() that makes sure the UUID on the table is set Sure thing, I can add a test for th

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355106715 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -770,6 +770,11 @@ public Map refs() { return current.refs(); } +@Overrid

Re: [PR] Nessie: Adopt to Nessie 0.71.1 release [iceberg]

2023-10-11 Thread via GitHub
ajantha-bhat commented on code in PR #8798: URL: https://github.com/apache/iceberg/pull/8798#discussion_r1355092633 ## nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java: ## @@ -78,30 +77,11 @@ public void testNonExistentCustomClient() {

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-11 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1355087396 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -533,6 +533,116 @@ public void testMaxPlanningSna

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2023-10-11 Thread via GitHub
zinking commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1355083626 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -146,13 +148,31 @@ public RewriteDataFilesSparkAction filter(

Re: [I] Python write support [iceberg-python]

2023-10-11 Thread via GitHub
Fokko commented on issue #23: URL: https://github.com/apache/iceberg-python/issues/23#issuecomment-1757790524 @mgmarino Writing is part of https://github.com/apache/iceberg-python/pull/41 👍 Writing the parquet file is actually quite trivial.

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2023-10-11 Thread via GitHub
zinking commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1355078213 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java: ## @@ -109,6 +110,8 @@ public InternalRow[] call(InternalRow args)

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-11 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1355072299 ## .palantir/revapi.yml: ## @@ -866,6 +866,15 @@ acceptedBreaks: old: "method void org.apache.iceberg.encryption.Ciphers::()" new: "method void org.apache.i

Re: [I] Python write support [iceberg-python]

2023-10-11 Thread via GitHub
mgmarino commented on issue #23: URL: https://github.com/apache/iceberg-python/issues/23#issuecomment-1757763297 It looks like this is partially done (🎉), but writing to parquet files is not yet supported even though the associated ticket is closed? Am I reading this correctly? Thanks! -

Re: [PR] Nessie: From Nessie - 0.71.0, use custom client builder name instead of custom client builder class name. [iceberg]

2023-10-11 Thread via GitHub
nk1506 commented on code in PR #8798: URL: https://github.com/apache/iceberg/pull/8798#discussion_r1355060956 ## nessie/src/test/java/org/apache/iceberg/nessie/TestCustomNessieClient.java: ## @@ -78,30 +77,11 @@ public void testNonExistentCustomClient() {

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
jbonofre commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1355045037 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -770,6 +770,11 @@ public Map refs() { return current.refs(); } +@Override +p

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-11 Thread via GitHub
nastra commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1355022285 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkDataFile.java: ## @@ -191,6 +192,11 @@ public DataFile copy() { throw new UnsupportedOperationExcept

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-11 Thread via GitHub
nastra commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1355020627 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -533,6 +533,116 @@ public void testMaxPlanningSn

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-11 Thread via GitHub
nastra commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1355016941 ## .palantir/revapi.yml: ## @@ -866,6 +866,15 @@ acceptedBreaks: old: "method void org.apache.iceberg.encryption.Ciphers::()" new: "method void org.apache.

Re: [PR] WIP: Write support [iceberg-python]

2023-10-11 Thread via GitHub
samplec0de commented on PR #41: URL: https://github.com/apache/iceberg-python/pull/41#issuecomment-1757687913 Very relevant! I'm looking forward to it, thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] API, Core: Add UUID API to Table [iceberg]

2023-10-11 Thread via GitHub
nastra commented on code in PR #8800: URL: https://github.com/apache/iceberg/pull/8800#discussion_r1354909112 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -770,6 +770,11 @@ public Map refs() { return current.refs(); } +@Override +pub

Re: [I] How to read data in the order in which files are commited? [iceberg]

2023-10-11 Thread via GitHub
pvary commented on issue #8802: URL: https://github.com/apache/iceberg/issues/8802#issuecomment-1757603105 Currently there is no way to order the scan tasks. The planning side specifically makes sure that even the planning can be done by parallel threads (reading manifest files in parallel)

[PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-11 Thread via GitHub
pvary opened a new pull request, #8803: URL: https://github.com/apache/iceberg/pull/8803 Based on our discussion on the dev list, I have created the PR which makes it possible to narrow down the retained column statistics in the `ScanTask` returned from planning. For reference the discu

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2023-10-11 Thread via GitHub
rakesh-das08 commented on code in PR #8797: URL: https://github.com/apache/iceberg/pull/8797#discussion_r1354860039 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java: ## @@ -109,6 +110,8 @@ public InternalRow[] call(InternalRow

Re: [PR] Add method and property around sequence-numbers [iceberg-python]

2023-10-11 Thread via GitHub
Fokko merged PR #60: URL: https://github.com/apache/iceberg-python/pull/60 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] Add method and property around sequence-numbers [iceberg-python]

2023-10-11 Thread via GitHub
Fokko commented on PR #60: URL: https://github.com/apache/iceberg-python/pull/60#issuecomment-1757531574 @amogh-jahagirdar thanks for the review, appreciate it. I'm adding them once we start using them, but if you see anything missing that would be useful on its own, feel free to rais
