Re: [I] Kafka Connect: Record projection Index out of bounds error [iceberg]

2024-09-08 Thread via GitHub


ismailsimsek commented on issue #11099:
URL: https://github.com/apache/iceberg/issues/11099#issuecomment-2336596778

   Just found the issue! It happened when creating the `GenericAppenderFactory`: the full table schema was passed in instead of the key schema.
   
   
https://github.com/tabular-io/iceberg-kafka-connect/blob/595f835f5d9174e57660b12f407dabc84781e500/kafka-connect/src/test/java/io/tabular/iceberg/connect/data2/IcebergUtil.java#L96-L103





[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.27.30 to 1.27.33 [iceberg-go]

2024-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #140:
URL: https://github.com/apache/iceberg-go/pull/140

   Bumps 
[github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) 
from 1.27.30 to 1.27.33.
   
   Commits
   - f1d71c5 Release 2024-09-04 (https://github.com/aws/aws-sdk-go-v2/commit/f1d71c59a1499981873f70db8c92d07242779670)
   - e4813e1 Regenerated Clients (https://github.com/aws/aws-sdk-go-v2/commit/e4813e114f3c2ce8db07624f656073440adc1140)
   - 0e8bb90 Update partitions file (https://github.com/aws/aws-sdk-go-v2/commit/0e8bb90d8c0d03824fc8bf8656bd2de6346e48f8)
   - 6a5875b Update endpoints model (https://github.com/aws/aws-sdk-go-v2/commit/6a5875b13d01e8270c1cbdfae0c63ee479386917)
   - f7030de Update API model (https://github.com/aws/aws-sdk-go-v2/commit/f7030de42b2a92ad4f66b153ae4d88ad28a6b2fe)
   - a2b751d Release 2024-09-03 (https://github.com/aws/aws-sdk-go-v2/commit/a2b751d1ba71f59175a41f9cae5f159f1044360f)
   - e22c249 Regenerated Clients (https://github.com/aws/aws-sdk-go-v2/commit/e22c2495bd38232c640776ef3c1a84fb3145a8b9)
   - ff0cf6f Update API model (https://github.com/aws/aws-sdk-go-v2/commit/ff0cf6fbfa7c7a12388606d5568135f2beae6599)
   - 3120376 refactoring of buildQuery to accept a list of maintained headers to l… (#2773) (https://github.com/aws/aws-sdk-go-v2/commit/3120376762853b3098fda7e9217fb39fe1bf5105)
   - 4ed838e Merge pull request #2768 from bhavya2109sharma/presignedurl-requestpayer-change (https://github.com/aws/aws-sdk-go-v2/commit/4ed838eab2a963cb16301501c8b8c3e29dac4c20)
   Additional commits viewable in the compare view: https://github.com/aws/aws-sdk-go-v2/compare/config/v1.27.30...config/v1.27.33
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2/config&package-manager=go_modules&previous-version=1.27.30&new-version=1.27.33)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[PR] build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.30.4 to 1.30.5 [iceberg-go]

2024-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #141:
URL: https://github.com/apache/iceberg-go/pull/141

   Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) 
from 1.30.4 to 1.30.5.
   
   Commits
   - a2b751d Release 2024-09-03 (https://github.com/aws/aws-sdk-go-v2/commit/a2b751d1ba71f59175a41f9cae5f159f1044360f)
   - e22c249 Regenerated Clients (https://github.com/aws/aws-sdk-go-v2/commit/e22c2495bd38232c640776ef3c1a84fb3145a8b9)
   - ff0cf6f Update API model (https://github.com/aws/aws-sdk-go-v2/commit/ff0cf6fbfa7c7a12388606d5568135f2beae6599)
   - 3120376 refactoring of buildQuery to accept a list of maintained headers to l… (#2773) (https://github.com/aws/aws-sdk-go-v2/commit/3120376762853b3098fda7e9217fb39fe1bf5105)
   - 4ed838e Merge pull request #2768 from bhavya2109sharma/presignedurl-requestpayer-change (https://github.com/aws/aws-sdk-go-v2/commit/4ed838eab2a963cb16301501c8b8c3e29dac4c20)
   - d4bd42f Merge branch 'main' into presignedurl-requestpayer-change (https://github.com/aws/aws-sdk-go-v2/commit/d4bd42fdc82c2954f429bd9eeac3096f5590eaac)
   - 0353706 Added Changelog (https://github.com/aws/aws-sdk-go-v2/commit/0353706229a89749afa93324432ece53da37822b)
   - 97e2d3f Release 2024-08-30 (https://github.com/aws/aws-sdk-go-v2/commit/97e2d3fec04bc8bfd69eaf22ce476b12f8673810)
   - 4cca52b Regenerated Clients (https://github.com/aws/aws-sdk-go-v2/commit/4cca52b4a81df57e91b1e5d0a65fd6df89606d02)
   - c8a5146 Update endpoints model (https://github.com/aws/aws-sdk-go-v2/commit/c8a5146734eb1e6d59acba18b5967c78dc7a5e42)
   Additional commits viewable in the compare view: https://github.com/aws/aws-sdk-go-v2/compare/v1.30.4...v1.30.5
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2&package-manager=go_modules&previous-version=1.30.4&new-version=1.30.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.29 to 1.17.32 [iceberg-go]

2024-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #143:
URL: https://github.com/apache/iceberg-go/pull/143

   Bumps 
[github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2)
 from 1.17.29 to 1.17.32.
   
   Commits
   - f1d71c5 Release 2024-09-04 (https://github.com/aws/aws-sdk-go-v2/commit/f1d71c59a1499981873f70db8c92d07242779670)
   - e4813e1 Regenerated Clients (https://github.com/aws/aws-sdk-go-v2/commit/e4813e114f3c2ce8db07624f656073440adc1140)
   - 0e8bb90 Update partitions file (https://github.com/aws/aws-sdk-go-v2/commit/0e8bb90d8c0d03824fc8bf8656bd2de6346e48f8)
   - 6a5875b Update endpoints model (https://github.com/aws/aws-sdk-go-v2/commit/6a5875b13d01e8270c1cbdfae0c63ee479386917)
   - f7030de Update API model (https://github.com/aws/aws-sdk-go-v2/commit/f7030de42b2a92ad4f66b153ae4d88ad28a6b2fe)
   - a2b751d Release 2024-09-03 (https://github.com/aws/aws-sdk-go-v2/commit/a2b751d1ba71f59175a41f9cae5f159f1044360f)
   - e22c249 Regenerated Clients (https://github.com/aws/aws-sdk-go-v2/commit/e22c2495bd38232c640776ef3c1a84fb3145a8b9)
   - ff0cf6f Update API model (https://github.com/aws/aws-sdk-go-v2/commit/ff0cf6fbfa7c7a12388606d5568135f2beae6599)
   - 3120376 refactoring of buildQuery to accept a list of maintained headers to l… (#2773) (https://github.com/aws/aws-sdk-go-v2/commit/3120376762853b3098fda7e9217fb39fe1bf5105)
   - 4ed838e Merge pull request #2768 from bhavya2109sharma/presignedurl-requestpayer-change (https://github.com/aws/aws-sdk-go-v2/commit/4ed838eab2a963cb16301501c8b8c3e29dac4c20)
   Additional commits viewable in the compare view: https://github.com/aws/aws-sdk-go-v2/compare/credentials/v1.17.29...credentials/v1.17.32
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2/credentials&package-manager=go_modules&previous-version=1.17.29&new-version=1.17.32)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.29 to 1.17.30 [iceberg-go]

2024-09-08 Thread via GitHub


dependabot[bot] closed pull request #138: build(deps): bump 
github.com/aws/aws-sdk-go-v2/credentials from 1.17.29 to 1.17.30
URL: https://github.com/apache/iceberg-go/pull/138





Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.29 to 1.17.30 [iceberg-go]

2024-09-08 Thread via GitHub


dependabot[bot] commented on PR #138:
URL: https://github.com/apache/iceberg-go/pull/138#issuecomment-2336599806

   Superseded by #143.





[PR] feat: SortOrder methods should take schema ref if possible [iceberg-rust]

2024-09-08 Thread via GitHub


c-thiel opened a new pull request, #613:
URL: https://github.com/apache/iceberg-rust/pull/613

   Reduce clones
   
   I also added two tests that didn't fail but were missing IMO.
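   For context, a minimal sketch of the pattern (hypothetical names, not the actual iceberg-rust API): taking an `Arc`-wrapped schema means "cloning" is just a reference-count bump, so `SortOrder` construction can share the schema instead of deep-copying it.
   
   ```
   use std::sync::Arc;
   
   // Hypothetical stand-ins for the real iceberg-rust types.
   struct Schema { /* fields, ids, ... */ }
   type SchemaRef = Arc<Schema>;
   
   struct SortOrderBuilder {
       schema: SchemaRef,
   }
   
   impl SortOrderBuilder {
       // Accepting a &SchemaRef lets every caller keep its handle;
       // Arc::clone bumps a refcount instead of copying the Schema.
       fn for_schema(schema: &SchemaRef) -> Self {
           Self { schema: Arc::clone(schema) }
       }
   }
   
   fn main() {
       let schema: SchemaRef = Arc::new(Schema {});
       let _a = SortOrderBuilder::for_schema(&schema);
       let _b = SortOrderBuilder::for_schema(&schema); // no deep copy either time
   }
   ```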





[PR] Feat: Reassign field ids for schema [iceberg-rust]

2024-09-08 Thread via GitHub


c-thiel opened a new pull request, #615:
URL: https://github.com/apache/iceberg-rust/pull/615

   Required for `TableMetadataBuilder` 
(https://github.com/apache/iceberg-rust/pull/587).
   
   When new TableMetadata is created, field ids should be reassigned to ensure that field numbering is normalized and starts from 0.
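   A minimal sketch of the reassignment idea (illustrative only; a real implementation also has to recurse into nested struct, list, and map types):
   
   ```
   // Hypothetical flat-schema version: give every field a fresh sequential
   // id, ignoring whatever ids the incoming schema carried.
   struct NestedField {
       id: i32,
       name: String,
   }
   
   fn reassign_field_ids(fields: &mut [NestedField]) {
       for (next_id, field) in fields.iter_mut().enumerate() {
           field.id = next_id as i32;
       }
   }
   
   fn main() {
       let mut fields = vec![
           NestedField { id: 7, name: "a".to_string() },
           NestedField { id: 42, name: "b".to_string() },
       ];
       reassign_field_ids(&mut fields);
       assert_eq!(fields.iter().map(|f| f.id).collect::<Vec<_>>(), vec![0, 1]);
   }
   ```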





Re: [PR] TableMetadataBuilder [iceberg-rust]

2024-09-08 Thread via GitHub


c-thiel commented on PR #587:
URL: https://github.com/apache/iceberg-rust/pull/587#issuecomment-2336618257

   @liurenjie1024 I tried to cut a few things out - but not along the lines of `TableUpdate`. I hope that's OK?
   
   1. https://github.com/apache/iceberg-rust/pull/611
   2. https://github.com/apache/iceberg-rust/pull/612
   3. https://github.com/apache/iceberg-rust/pull/613
   4. https://github.com/apache/iceberg-rust/pull/614
   5. https://github.com/apache/iceberg-rust/pull/615
   
   After they are all merged, I'll rebase this PR for the actual builder.





Re: [PR] feat (datafusion integration): convert datafusion expr filters to Iceberg Predicate [iceberg-rust]

2024-09-08 Thread via GitHub


a-agmon commented on code in PR #588:
URL: https://github.com/apache/iceberg-rust/pull/588#discussion_r1749162764


##
crates/integrations/datafusion/src/physical_plan/expr_to_predicate.rs:
##
@@ -0,0 +1,312 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::collections::VecDeque;
+
+use datafusion::common::tree_node::{TreeNodeRecursion, TreeNodeVisitor};
+use datafusion::common::Column;
+use datafusion::error::DataFusionError;
+use datafusion::logical_expr::{Expr, Operator};
+use datafusion::scalar::ScalarValue;
+use iceberg::expr::{Predicate, Reference};
+use iceberg::spec::Datum;
+
+pub struct ExprToPredicateVisitor {
+    stack: VecDeque<Option<Predicate>>,
+}
+impl ExprToPredicateVisitor {
+    /// Create a new predicate conversion visitor.
+    pub fn new() -> Self {
+        Self {
+            stack: VecDeque::new(),
+        }
+    }
+    /// Get the predicate from the stack.
+    pub fn get_predicate(&self) -> Option<Predicate> {
+        self.stack
+            .iter()
+            .filter_map(|opt| opt.clone())
+            .reduce(Predicate::and)
+    }
+
+    /// Convert a column expression to an iceberg predicate.
+    fn convert_column_expr(
+        &self,
+        col: &Column,
+        op: &Operator,
+        lit: &ScalarValue,
+    ) -> Option<Predicate> {
+        let reference = Reference::new(col.name.clone());
+        let datum = scalar_value_to_datum(lit)?;
+        Some(binary_op_to_predicate(reference, op, datum))
+    }
+
+    /// Convert a compound expression to an iceberg predicate.
+    ///
+    /// The strategy is to support the following cases:
+    /// - if it's an AND expression then the result will be the valid predicates, whether there are 2 or just 1
+    /// - if it's an OR expression then a predicate will be returned only if there are 2 valid predicates on both sides
+    fn convert_compound_expr(&self, valid_preds: &[Predicate], op: &Operator) -> Option<Predicate> {
+        let valid_preds_count = valid_preds.len();
+        match (op, valid_preds_count) {
+            (Operator::And, 1) => valid_preds.first().cloned(),
+            (Operator::And, 2) => Some(Predicate::and(
+                valid_preds[0].clone(),
+                valid_preds[1].clone(),
+            )),
+            (Operator::Or, 2) => Some(Predicate::or(
+                valid_preds[0].clone(),
+                valid_preds[1].clone(),
+            )),
+            _ => None,
+        }
+    }
+}
+
+// Implement TreeNodeVisitor for ExprToPredicateVisitor
+impl<'n> TreeNodeVisitor<'n> for ExprToPredicateVisitor {
+    type Node = Expr;
+
+    fn f_down(&mut self, _node: &'n Self::Node) -> Result<TreeNodeRecursion, DataFusionError> {
+        Ok(TreeNodeRecursion::Continue)
+    }
+
+    fn f_up(&mut self, node: &'n Self::Node) -> Result<TreeNodeRecursion, DataFusionError> {
+        if let Expr::BinaryExpr(binary) = node {
+            match (&*binary.left, &binary.op, &*binary.right) {
+                // process simple expressions (involving a column, operator and literal)
+                (Expr::Column(col), op, Expr::Literal(lit)) => {

Review Comment:
   Thanks for the comment @liurenjie1024
   I have added support for both "directions" of a binary statement: `X > 1` and `1 < X`.
   But I think there is something more to be clarified:
   Both `x>1` and `1<x` produce the same tree. When `x > 1` is processed there are 3 main leaves to handle:
   
   ```
   1 - Column(Column { relation: Some(Bare { table: "my_table" }), name: "foo" })
   ```
   This ^ will be ignored as it's not a binary expr (it's a column expr)
   
   ```
   2 - Literal(Int64(1))
   ```
   This ^ will also be ignored as it's not a binary expr (it's a literal expr)
   
   ```
   3 - BinaryExpr(BinaryExpr { left: Column(Column { relation: Some(Bare { table: "my_table" }), name: "foo" }), op: Gt, right: Literal(Int64(1)) })
   ```
   This binary expr will be captured, deconstructed, and processed by this match arm:
   ```
   match (&*binary.left, &binary.op, &*binary.right) {
       (Expr::Column(col), op, Expr::Literal(lit)) => {
           ...
       }
   }
   ```
   
   Therefore, in order to support `a>6` we don't need to capture `Expr::Column`, only the binary statement that includes it.
   On the other hand, I don't think we want to be too flexible because
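   To make the "directions" point concrete, one way to handle `1 < X` as `X > 1` is to mirror the operator when the literal is on the left (a sketch, not necessarily the PR's exact helper; recent DataFusion versions also expose `Operator::swap()` for this):
   
   ```
   use datafusion::logical_expr::Operator;
   
   // Mirror a comparison operator so `lit op col` can be rewritten as
   // `col op' lit` before building the column-centric predicate.
   fn mirror_operator(op: Operator) -> Option<Operator> {
       match op {
           Operator::Lt => Some(Operator::Gt),
           Operator::LtEq => Some(Operator::GtEq),
           Operator::Gt => Some(Operator::Lt),
           Operator::GtEq => Some(Operator::LtEq),
           Operator::Eq | Operator::NotEq => Some(op),
           _ => None,
       }
   }
   ```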



Re: [PR] feat (datafusion integration): convert datafusion expr filters to Iceberg Predicate [iceberg-rust]

2024-09-08 Thread via GitHub


a-agmon commented on code in PR #588:
URL: https://github.com/apache/iceberg-rust/pull/588#discussion_r1749163419


##
crates/integrations/datafusion/src/physical_plan/expr_to_predicate.rs:
##
@@ -0,0 +1,312 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::collections::VecDeque;
+
+use datafusion::common::tree_node::{TreeNodeRecursion, TreeNodeVisitor};
+use datafusion::common::Column;
+use datafusion::error::DataFusionError;
+use datafusion::logical_expr::{Expr, Operator};
+use datafusion::scalar::ScalarValue;
+use iceberg::expr::{Predicate, Reference};
+use iceberg::spec::Datum;
+
+pub struct ExprToPredicateVisitor {
+    stack: VecDeque<Option<Predicate>>,
+}
+impl ExprToPredicateVisitor {
+    /// Create a new predicate conversion visitor.
+    pub fn new() -> Self {
+        Self {
+            stack: VecDeque::new(),
+        }
+    }
+    /// Get the predicate from the stack.
+    pub fn get_predicate(&self) -> Option<Predicate> {
+        self.stack
+            .iter()
+            .filter_map(|opt| opt.clone())
+            .reduce(Predicate::and)
+    }
+
+    /// Convert a column expression to an iceberg predicate.
+    fn convert_column_expr(
+        &self,
+        col: &Column,
+        op: &Operator,
+        lit: &ScalarValue,
+    ) -> Option<Predicate> {
+        let reference = Reference::new(col.name.clone());
+        let datum = scalar_value_to_datum(lit)?;
+        Some(binary_op_to_predicate(reference, op, datum))
+    }
+
+    /// Convert a compound expression to an iceberg predicate.
+    ///
+    /// The strategy is to support the following cases:
+    /// - if it's an AND expression then the result will be the valid predicates, whether there are 2 or just 1
+    /// - if it's an OR expression then a predicate will be returned only if there are 2 valid predicates on both sides
+    fn convert_compound_expr(&self, valid_preds: &[Predicate], op: &Operator) -> Option<Predicate> {
+        let valid_preds_count = valid_preds.len();
+        match (op, valid_preds_count) {
+            (Operator::And, 1) => valid_preds.first().cloned(),

Review Comment:
   I basically designed the conversion of filters to be greedy: we try to convert as many filters as we can in order to scan fewer files.
   Suppose we have a case like this:
   ```
   SELECT * FROM T WHERE A > 10 AND REGEXP_LIKE(name, '[a-z]A[a-zA-Z]{2}');
   ```
   This will translate into an expr that looks something like this:
   ```
   BinaryExpr(BinaryExpr AND RegExpExpr)
   ```
   After we have processed both leaves (`BinaryExpr` and `RegExpExpr`), we have just one valid expression on the stack, because the `RegExpExpr` is not supported.
   Then we get to the parent, `BinaryExpr(A AND B)`, and try to get the 2 child expressions from the stack, but we get only one (`A > 10`) because the second one is not supported (`REGEXP_LIKE`).
   Because it's a logical **AND** expr, we proceed with just the one valid predicate. That's the case you are highlighting: when we process a logical AND expr where just one side is valid, we proceed with that single predicate.
   If it was an **OR** expr wrapping both, we would have failed the whole predicate.
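   As a self-contained illustration of that rule (not the PR's code): a partially converted AND is still a sound, weaker filter, while a partially converted OR would wrongly narrow the result, so it must be dropped.
   
   ```
   #[derive(Clone, Debug, PartialEq)]
   enum Pred {
       Gt(&'static str, i64),
       And(Box<Pred>, Box<Pred>),
       Or(Box<Pred>, Box<Pred>),
   }
   
   // Combine whichever child predicates survived conversion.
   fn combine(is_and: bool, valid: &[Pred]) -> Option<Pred> {
       match (is_and, valid.len()) {
           // AND: one surviving side is still a correct (weaker) filter.
           (true, 1) => valid.first().cloned(),
           (true, 2) => Some(Pred::And(Box::new(valid[0].clone()), Box::new(valid[1].clone()))),
           // OR: both sides must survive, or pushdown would drop matching rows.
           (false, 2) => Some(Pred::Or(Box::new(valid[0].clone()), Box::new(valid[1].clone()))),
           _ => None,
       }
   }
   
   fn main() {
       // `A > 10 AND REGEXP_LIKE(...)`: only `A > 10` converts.
       let valid = vec![Pred::Gt("A", 10)];
       assert_eq!(combine(true, &valid), Some(Pred::Gt("A", 10))); // AND keeps it
       assert_eq!(combine(false, &valid), None);                   // OR gives up
   }
   ```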






Re: [PR] feat (datafusion integration): convert datafusion expr filters to Iceberg Predicate [iceberg-rust]

2024-09-08 Thread via GitHub


a-agmon commented on PR #588:
URL: https://github.com/apache/iceberg-rust/pull/588#issuecomment-2336631735

   Thank you very much for the review and comments, @liurenjie1024
   I think I have addressed all your comments and questions and would be happy if you could take another look.
   
   I just want to make a comment which might explain some of my design choices.
   I think the main value of converting DataFusion exprs to Iceberg predicates is that by pushing down some predicate to an Iceberg `scan` we can prune data files and have DataFusion scan less data. So if there is some Expr that we can convert to an Iceberg Predicate and then scan fewer files, we should do it. Therefore I tried to implement this as greedily as possible: converting as many relevant exprs as possible, even when they are just part of all the filters, and basically leaving DataFusion to process the rest (which can sometimes be done better there).
   
   





[PR] chore(deps): Bump crate-ci/typos from 1.24.3 to 1.24.5 [iceberg-rust]

2024-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #616:
URL: https://github.com/apache/iceberg-rust/pull/616

   Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.24.3 to 
1.24.5.
   
   Release notes
   Sourced from crate-ci/typos's releases (https://github.com/crate-ci/typos/releases).
   
   v1.24.5
   [1.24.5] - 2024-09-04
   Features
   - (action) Support windows
   
   v1.24.4
   [1.24.4] - 2024-09-03
   Fixes
   - Offer a correction for grather
   
   Changelog
   Sourced from crate-ci/typos's changelog (https://github.com/crate-ci/typos/blob/master/CHANGELOG.md).
   
   [1.24.5] - 2024-09-04
   Features
   - (action) Support windows
   
   [1.24.4] - 2024-09-03
   Fixes
   - Offer a correction for grather
   
   Commits
   - 945d407 chore: Release (https://github.com/crate-ci/typos/commit/945d407a5fc9097f020969446a16f581612ab4df)
   - e972e95 docs: Update changelog (https://github.com/crate-ci/typos/commit/e972e95d223e37b01e2c21a456b1c70e79836a5d)
   - fcfade4 Merge pull request #1095 from zyf722/actions-windows (https://github.com/crate-ci/typos/commit/fcfade456a71a13e7c3495a3d3055af176e1a55e)
   - 264f549 feat(action): Add Windows support to actions (https://github.com/crate-ci/typos/commit/264f549c13a4c604d08542635a846f750d512b6b)
   - 853bbe8 chore: Release (https://github.com/crate-ci/typos/commit/853bbe8898c2e3ce040ba6fb0b2cbb3d0ababec5)
   - b5f5660 docs: Update changelog (https://github.com/crate-ci/typos/commit/b5f56600b2fce35c214fc337a4884c4eafa6407a)
   - 39f678e Merge pull request #1093 from kachick/grather-suggest-candidates (https://github.com/crate-ci/typos/commit/39f678e520527f50a251775ee38731def724d591)
   - bb6905f Merge pull request #1092 from crate-ci/renovate/embarkstudios-cargo-deny-acti... (https://github.com/crate-ci/typos/commit/bb6905fdc963da291c50ef79072a8cbccac06e81)
   - 786c825 fix(dict): Add suggestions for the typo "grather" (https://github.com/crate-ci/typos/commit/786c825f1753f87b02d24770e3f4ec8043d9084a)
   - 3400581 chore(deps): Update EmbarkStudios/cargo-deny-action action to v2 (https://github.com/crate-ci/typos/commit/340058181bc154efb6751badb34d4e40285d654b)
   See full diff in the compare view: https://github.com/crate-ci/typos/compare/v1.24.3...v1.24.5
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=crate-ci/typos&package-manager=github_actions&previous-version=1.24.3&new-version=1.24.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[PR] chore(deps): Update arrow-array requirement from 52 to 53 [iceberg-rust]

2024-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #617:
URL: https://github.com/apache/iceberg-rust/pull/617

   Updates the requirements on 
[arrow-array](https://github.com/apache/arrow-rs) to permit the latest version.
   
   Changelog
   Sourced from arrow-array's changelog (https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md).
   
   52.2.0 (2024-07-24)
   Full Changelog: https://github.com/apache/arrow-rs/compare/52.1.0...52.2.0
   Implemented enhancements:
   - Faster min/max for string/binary view arrays #6088 [arrow]
   - Support casting to/from Utf8View #6076 [arrow]
   - Min/max support for String/BinaryViewArray #6052 [arrow]
   - Improve performance of constructing ByteViews for small strings #6034 [parquet] [arrow]
   - Fast UTF-8 validation when reading StringViewArray from Parquet #5995 [parquet]
   - Optimize StringView row decoding #5945 [arrow]
   - Implementing deduplicate / intern functionality for StringView #5910 [arrow]
   - Add FlightSqlServiceClient::new_from_inner #6003 [arrow] [arrow-flight] (lewiszlw)
   - Complete StringViewArray and BinaryViewArray parquet decoder: #6004 [parquet] (XiangpengHao)
   - Add begin/end_transaction methods in FlightSqlServiceClient #6026 [arrow] [arrow-flight] (lewiszlw)
   - Read Parquet statistics as arrow Arrays #6046 [parquet] (efredine)
   
   Fixed bugs:
   - Panic in ParquetMetadata::memory_size if no min/max set #6091 [parquet]
   - BinaryViewArray doesn't roundtrip a single Some(&[]) through parquet #6086 [parquet]
   - Parquet ColumnIndex for null columns is written even when statistics are disabled #6010 [parquet]
   
   Documentation updates:
   - Fix typo in GenericByteViewArray documentation #6054 [arrow] (progval)
   - Minor: Improve parquet PageIndex documentation #6042 [parquet] (alamb)
   
   Closed issues:
   - Potential performance improvements for reading Parquet to StringViewArray/BinaryViewArray #5904 [parquet] [arrow]
   
   Merged pull requests:
   - Faster GenericByteView construction #6102 [parquet] [arrow] (XiangpengHao)
   - Add benchmark to track byte-view construction performance #6101 [parquet] (XiangpengHao)
   - Optimize bool_or using max_boolean #6100 [arrow] (simonvandel)
   - Optimize max_boolean by operating

[PR] chore(deps): Update parquet requirement from 52 to 53 [iceberg-rust]

2024-09-08 Thread via GitHub


dependabot[bot] opened a new pull request, #619:
URL: https://github.com/apache/iceberg-rust/pull/619

   Updates the requirements on [parquet](https://github.com/apache/arrow-rs) to 
permit the latest version.
   
   Changelog
   Sourced from parquet's changelog (https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md).

   52.2.0 (2024-07-24): https://github.com/apache/arrow-rs/tree/52.2.0
   Full Changelog: https://github.com/apache/arrow-rs/compare/52.1.0...52.2.0
   Implemented enhancements:
   
   - Faster min/max for string/binary view arrays #6088 [arrow]
   - Support casting to/from Utf8View #6076 [arrow]
   - Min/max support for String/BinaryViewArray #6052 [arrow]
   - Improve performance of constructing ByteViews for small strings #6034 [parquet] [arrow]
   - Fast UTF-8 validation when reading StringViewArray from Parquet #5995 [parquet]
   - Optimize StringView row decoding #5945 [arrow]
   - Implementing deduplicate / intern functionality for StringView #5910 [arrow]
   - Add FlightSqlServiceClient::new_from_inner #6003 [arrow] [arrow-flight] (lewiszlw)
   - Complete StringViewArray and BinaryViewArray parquet decoder #6004 [parquet] (XiangpengHao)
   - Add begin/end_transaction methods in FlightSqlServiceClient #6026 [arrow] [arrow-flight] (lewiszlw)
   - Read Parquet statistics as arrow Arrays #6046 [parquet] (efredine)
   
   Fixed bugs:
   
   - Panic in ParquetMetadata::memory_size if no min/max set #6091 [parquet]
   - BinaryViewArray doesn't roundtrip a single Some(&[]) through parquet #6086 [parquet]
   - Parquet ColumnIndex for null columns is written even when statistics are disabled #6010 [parquet]
   
   Documentation updates:
   
   - Fix typo in GenericByteViewArray documentation #6054 [arrow] (progval)
   - Minor: Improve parquet PageIndex documentation #6042 [parquet] (alamb)
   
   Closed issues:
   
   - Potential performance improvements for reading Parquet to StringViewArray/BinaryViewArray #5904 [parquet] [arrow]
   
   Merged pull requests:
   
   - Faster GenericByteView construction #6102 [parquet] [arrow] (XiangpengHao)
   - Add benchmark to track byte-view construction performance #6101 [parquet] (XiangpengHao)
   - Optimize bool_or using max_boolean #6100 [arrow] (simonvandel)
   - Optimize max_boolean by operating on u64

Re: [PR] feat (datafusion integration): convert datafusion expr filters to Iceberg Predicate [iceberg-rust]

2024-09-08 Thread via GitHub


a-agmon commented on code in PR #588:
URL: https://github.com/apache/iceberg-rust/pull/588#discussion_r1749162764


##
crates/integrations/datafusion/src/physical_plan/expr_to_predicate.rs:
##
@@ -0,0 +1,312 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::collections::VecDeque;
+
+use datafusion::common::tree_node::{TreeNodeRecursion, TreeNodeVisitor};
+use datafusion::common::Column;
+use datafusion::error::DataFusionError;
+use datafusion::logical_expr::{Expr, Operator};
+use datafusion::scalar::ScalarValue;
+use iceberg::expr::{Predicate, Reference};
+use iceberg::spec::Datum;
+
+pub struct ExprToPredicateVisitor {
+    stack: VecDeque<Option<Predicate>>,
+}
+impl ExprToPredicateVisitor {
+    /// Create a new predicate conversion visitor.
+    pub fn new() -> Self {
+        Self {
+            stack: VecDeque::new(),
+        }
+    }
+    /// Get the predicate from the stack.
+    pub fn get_predicate(&self) -> Option<Predicate> {
+        self.stack
+            .iter()
+            .filter_map(|opt| opt.clone())
+            .reduce(Predicate::and)
+    }
+
+    /// Convert a column expression to an iceberg predicate.
+    fn convert_column_expr(
+        &self,
+        col: &Column,
+        op: &Operator,
+        lit: &ScalarValue,
+    ) -> Option<Predicate> {
+        let reference = Reference::new(col.name.clone());
+        let datum = scalar_value_to_datum(lit)?;
+        Some(binary_op_to_predicate(reference, op, datum))
+    }
+
+    /// Convert a compound expression to an iceberg predicate.
+    ///
+    /// The strategy is to support the following cases:
+    /// - if it's an AND expression then the result will be the valid predicates, whether there are 2 or just 1
+    /// - if it's an OR expression then a predicate will be returned only if there are 2 valid predicates on both sides
+    fn convert_compound_expr(&self, valid_preds: &[Predicate], op: &Operator) -> Option<Predicate> {
+        let valid_preds_count = valid_preds.len();
+        match (op, valid_preds_count) {
+            (Operator::And, 1) => valid_preds.first().cloned(),
+            (Operator::And, 2) => Some(Predicate::and(
+                valid_preds[0].clone(),
+                valid_preds[1].clone(),
+            )),
+            (Operator::Or, 2) => Some(Predicate::or(
+                valid_preds[0].clone(),
+                valid_preds[1].clone(),
+            )),
+            _ => None,
+        }
+    }
+}
+
+// Implement TreeNodeVisitor for ExprToPredicateVisitor
+impl<'n> TreeNodeVisitor<'n> for ExprToPredicateVisitor {
+    type Node = Expr;
+
+    fn f_down(&mut self, _node: &'n Self::Node) -> Result<TreeNodeRecursion, DataFusionError> {
+        Ok(TreeNodeRecursion::Continue)
+    }
+
+    fn f_up(&mut self, node: &'n Self::Node) -> Result<TreeNodeRecursion, DataFusionError> {
+        if let Expr::BinaryExpr(binary) = node {
+            match (&*binary.left, &binary.op, &*binary.right) {
+                // process simple expressions (involving a column, operator and literal)
+                (Expr::Column(col), op, Expr::Literal(lit)) => {
Review Comment:
   Thanks for the comment @liurenjie1024 
   I have added support, and tests, for both "directions" of a binary expression: 
   `X > 1` and `1 < X`.
   But I think there is something more to be clarified: 
   both `x>1` and `1<x` are traversed the same way. When `x > 1` is processed there 
   are 3 main leafs to handle:
   
   ```
   1 - Column(Column { relation: Some(Bare { table: "my_table" }), name: "foo" })
   ```
   This ^ will be ignored as it's not a binary expr (it's a column expr)
   
   ```
   2 - Literal(Int64(1))
   ```
   This ^ will also be ignored as it's not a binary expr (it's a literal expr)
   
   ```
   3 - BinaryExpr(BinaryExpr { left: Column(Column { relation: Some(Bare { table: "my_table" }), name: "foo" }), op: Gt, right: Literal(Int64(1)) })
   ```
   This ^ binary expr will be captured, deconstructed, and processed by this match arm 
   ```
     match (&*binary.left, &binary.op, &*binary.right) {
       (Expr::Column(col), op, Expr::Literal(lit)) => {
     ...
       }
   ```
   Therefore, in order to support both `a>6` and `6<a`, the match needs an arm for each ordering of column and literal.
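   A minimal, self-contained sketch of how both orderings can be handled by 
   mirroring the operator (hypothetical helper, not the PR's exact code):
   
   ```rust
   use datafusion::logical_expr::Operator;
   
   // Hypothetical helper: mirror a comparison operator so that a
   // `Literal op Column` expression like `6 < a` can be rewritten into
   // the equivalent `Column op Literal` form `a > 6`.
   fn mirror(op: &Operator) -> Option<Operator> {
       match op {
           Operator::Eq => Some(Operator::Eq),
           Operator::NotEq => Some(Operator::NotEq),
           Operator::Lt => Some(Operator::Gt),
           Operator::LtEq => Some(Operator::GtEq),
           Operator::Gt => Some(Operator::Lt),
           Operator::GtEq => Some(Operator::LtEq),
           _ => None, // non-comparison operators cannot be mirrored
       }
   }
   
   // Inside f_up, both orderings then map to the same conversion:
   //   (Expr::Column(col), op, Expr::Literal(lit)) => self.convert_column_expr(col, op, lit),
   //   (Expr::Literal(lit), op, Expr::Column(col)) =>
   //       mirror(op).and_then(|op| self.convert_column_expr(col, &op, lit)),
   ```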

Re: [PR] Iceberg/Comet integration POC [iceberg]

2024-09-08 Thread via GitHub


manuzhang commented on PR #9841:
URL: https://github.com/apache/iceberg/pull/9841#issuecomment-2336715251

   As a newbie to the native land, I'm wondering why we integrate with the Comet 
   Parquet reader here. Are there any alternatives? Is there an "official" Parquet reader?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: SortOrder methods should take schema ref if possible [iceberg-rust]

2024-09-08 Thread via GitHub


Xuanwo commented on code in PR #613:
URL: https://github.com/apache/iceberg-rust/pull/613#discussion_r1749260477


##
crates/iceberg/src/spec/sort.rs:
##
@@ -133,6 +133,14 @@ impl SortOrder {
     pub fn is_unsorted(&self) -> bool {
         self.fields.is_empty()
     }
+
+    /// Set the order id for the sort order
+    pub fn with_order_id(&self, order_id: i64) -> SortOrder {

Review Comment:
   An API named `with_xxx` that takes a ref might be a bit confusing. Is there 
any context for this?
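   For reference, the conventional consuming builder shape looks like this 
   (a self-contained sketch with a stand-in struct, not the crate's actual type):
   
   ```rust
   // Stand-in type with a hypothetical field; only the method shape matters.
   #[derive(Debug)]
   struct SortOrder {
       order_id: i64,
   }
   
   impl SortOrder {
       // Takes `self` by value, so no clone is hidden inside the method
       // and callers chain it on an owned value.
       fn with_order_id(mut self, order_id: i64) -> SortOrder {
           self.order_id = order_id;
           self
       }
   }
   
   fn main() {
       let order = SortOrder { order_id: 0 }.with_order_id(3);
       println!("{order:?}");
   }
   ```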



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] fix: SIGSEGV when describe empty table [iceberg-go]

2024-09-08 Thread via GitHub


alex-kar opened a new pull request, #145:
URL: https://github.com/apache/iceberg-go/pull/145

   An empty table does not have a "Current Snapshot", which caused the describe 
   command to fail.
   Added a check that falls back to an empty string `""`.
   
   Steps to reproduce
   
   1. Start env `docker-compose -f dev/docker-compose.yml up -d`
   2. Create table `t` in namespace `public`
   ```
   curl --location --request POST 'localhost:8181/v1/namespaces/public/tables' \
   --header 'Content-Type: application/json' \
   --data-raw '{
     "schema": {
       "type": "struct",
       "fields": [
         {
           "id": 1,
           "name": "col1",
           "type": "boolean",
           "required": true
         }
       ]
     },
     "name": "t"
   }'
   ```
   3. Describe table `t`: `iceberg describe public.t 
--uri=http://localhost:8181`
   
   Actual result
   ```
   Table format version | 2
   Metadata location| 
s3://warehouse/public/t/metadata/0-c2e47505-8693-4254-b335-437111d400f0.metadata.json
   Table UUID   | 552a9f6b-c96e-4816-957c-160b96d57a62
   Last updated | 1725807593801
   Sort Order   | 0: []
   Partition Spec   | []
   
   Current Schema, id=0
   └──1: col1: required boolean
   
   panic: runtime error: invalid memory address or nil pointer dereference
   [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x925504]
   
   goroutine 1 [running]:
   main.text.DescribeTable({}, 0xc0001740a0)
   	/home/alex/projects/opensource/iceberg-go/cmd/iceberg/output.go:101 +0xea4
   main.describe({0xb3dfe8, 0xf85660}, {0xb3e0c8, 0xc000362ed0}, {0x7ffe86519d7c, 0x8}, {0xa44687, 0x3})
   	/home/alex/projects/opensource/iceberg-go/cmd/iceberg/main.go:252 +0x3db
   main.main()
   	/home/alex/projects/opensource/iceberg-go/cmd/iceberg/main.go:149 +0x4ef
   ```
   
   Fixed result
   ```
   Table format version | 2
   Metadata location| 
s3://warehouse/public/t/metadata/0-c2e47505-8693-4254-b335-437111d400f0.metadata.json
   Table UUID   | 552a9f6b-c96e-4816-957c-160b96d57a62
   Last updated | 1725807593801
   Sort Order   | 0: []
   Partition Spec   | []
   
   Current Schema, id=0
   └──1: col1: required boolean
   
   Current Snapshot | 
   
   Snapshots
   
   Properties
   key | value
   ---
   write.parquet.compression-codec | zstd
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: partition compatibility [iceberg-rust]

2024-09-08 Thread via GitHub


c-thiel commented on code in PR #612:
URL: https://github.com/apache/iceberg-rust/pull/612#discussion_r1749274990


##
crates/iceberg/src/spec/partition.rs:
##
@@ -118,9 +118,63 @@ impl PartitionSpec {
     /// Turn this partition spec into an unbound partition spec.
     ///
     /// The `field_id` is retained as `partition_id` in the unbound partition spec.
-    pub fn to_unbound(self) -> UnboundPartitionSpec {
+    pub fn into_unbound(self) -> UnboundPartitionSpec {
         self.into()
     }
+
+    /// Check if this partition spec is compatible with another partition spec.
+    ///
+    /// Returns true if the partition spec is equal to the other spec with partition field ids ignored and
+    /// spec_id ignored. The following must be identical:
+    /// * The number of fields
+    /// * Field order
+    /// * Field names
+    /// * Source column ids
+    /// * Transforms
+    pub fn compatible_with(&self, other: &UnboundPartitionSpec) -> bool {

Review Comment:
   Sounds good - done in 
https://github.com/apache/iceberg-rust/pull/612/commits/64303ea21a26708b873a62f54a43b4d1e2265f94
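   The check described in the doc comment above can be sketched as a 
   field-by-field zip (stand-in types, not the merged implementation):
   
   ```rust
   // Hypothetical stand-ins modeling only the compared attributes.
   struct BoundField { name: String, source_id: i32, transform: String }
   struct UnboundField { name: String, source_id: i32, transform: String }
   
   // Equal field count, then pairwise equality of name, source column id,
   // and transform; partition field ids and spec_id are ignored.
   fn compatible_with(ours: &[BoundField], theirs: &[UnboundField]) -> bool {
       ours.len() == theirs.len()
           && ours.iter().zip(theirs).all(|(a, b)| {
               a.name == b.name
                   && a.source_id == b.source_id
                   && a.transform == b.transform
           })
   }
   ```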



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: partition compatibility [iceberg-rust]

2024-09-08 Thread via GitHub


Xuanwo merged PR #612:
URL: https://github.com/apache/iceberg-rust/pull/612


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Feat: Normalize TableMetadata [iceberg-rust]

2024-09-08 Thread via GitHub


Xuanwo commented on code in PR #611:
URL: https://github.com/apache/iceberg-rust/pull/611#discussion_r1749292893


##
crates/iceberg/src/spec/table_metadata.rs:
##
@@ -278,6 +330,229 @@ impl TableMetadata {
         self.snapshots
             .insert(snapshot.snapshot_id(), Arc::new(snapshot));
     }
+
+    /// Normalize this table metadata.
+    ///
+    /// This is an internal method
+    /// meant to be called after constructing table metadata from untrusted sources.
+    /// We run this method after json deserialization.
+    /// All constructors for `TableMetadata` which are part of `iceberg-rust`
+    /// should return normalized `TableMetadata`.
+    ///
+    /// It does:
+    /// * Validate the current schema is set and valid
+    /// * Validate that all refs are valid (snapshot exists)
+    /// * Validate logs are chronological
+    /// * Normalize location (remove trailing slash)
+    /// * Validate that for V1 Metadata the last_sequence_number is 0
+    /// * If the default partition spec is specified but the spec is not present in specs, add it
+    /// * If the default sort order is unsorted but the sort order is not present, add it
+    pub(super) fn try_normalize(&mut self) -> Result<&mut Self> {
+        self.validate_current_schema()?;
+        self.normalize_current_snapshot()?;
+        self.validate_refs()?;
+        self.validate_chronological_snapshot_logs()?;
+        self.validate_chronological_metadata_logs()?;
+        self.location = self.location.trim_end_matches('/').to_string();
+        self.validate_format_version_specifics()?;
+        self.try_normalize_partition_spec()?;
+        self.try_normalize_sort_order()?;
+        Ok(self)
+    }
+
+    fn try_normalize_partition_spec(&mut self) -> Result<()> {

Review Comment:
   Hi, I'm not sure if this makes the logic easier to read. (It works for me).
   
   ```rust
   if self.partition_spec_by_id(self.default_spec_id).is_some() {
       return Ok(())
   }
   
   if self.default_spec_id != DEFAULT_PARTITION_SPEC_ID {
       return Err(...)
   }
   
   let partition_spec = xxx;
   self.partition_specs.insert(xxx);
   Ok(())
   ```
   
   The `self.partition_specs.is_empty()` check can be avoided since that case should 
   never occur.
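   Filled in with hypothetical names, that early-return shape could read as follows 
   (a sketch only; the `unpartition_spec()` constructor and the error message are 
   stand-ins, not the crate's actual API):
   
   ```rust
   fn try_normalize_partition_spec(&mut self) -> Result<()> {
       if self.partition_spec_by_id(self.default_spec_id).is_some() {
           return Ok(());
       }
   
       if self.default_spec_id != DEFAULT_PARTITION_SPEC_ID {
           return Err(Error::new(
               ErrorKind::DataInvalid,
               "Default partition spec id set, but no matching spec found",
           ));
       }
   
       // Hypothetical constructor for the default (unpartitioned) spec.
       let spec = Arc::new(PartitionSpec::unpartition_spec());
       self.partition_specs.insert(DEFAULT_PARTITION_SPEC_ID, spec);
       Ok(())
   }
   ```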



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Spark Streaming Job with multiple queries MERGE INTO the same target table (Runtime file filtering is not possible) [iceberg]

2024-09-08 Thread via GitHub


eric-maynard commented on issue #11094:
URL: https://github.com/apache/iceberg/issues/11094#issuecomment-2336763248

   To use multiple sessions you would essentially move this part of your code 
into the `start_streaming_query` function:
   
   ```
   # Initialize Spark Session
   spark = (
       SparkSession.builder
       .appName("streaming merg into")
       .config("spark.sql.shuffle.partitions", "2")
       .config("spark.log.level", "ERROR")
       .getOrCreate()
   )
   ```
   
   You might also want to make some changes such as appending the thread ID to 
the app name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] add option to force columns to lowercase [iceberg]

2024-09-08 Thread via GitHub


yornstei opened a new pull request, #11100:
URL: https://github.com/apache/iceberg/pull/11100

   The current connector creates columns with the same casing as the incoming data, 
   even though many catalogs display column names as lowercase. This PR adds an 
   option to force column names to lowercase. 
   
   Refer to this [slack 
thread](https://apache-iceberg.slack.com/archives/C03SEK9PW0Z/p1725468165932919)
 for more details.
   
   cc @bryanck 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: Reassign field ids for schema [iceberg-rust]

2024-09-08 Thread via GitHub


Fokko commented on code in PR #615:
URL: https://github.com/apache/iceberg-rust/pull/615#discussion_r1749313279


##
crates/iceberg/src/spec/schema.rs:
##
@@ -86,6 +87,16 @@ impl SchemaBuilder {
         self
     }
 
+    /// Reassign all field-ids (nested) on build.
+    /// If `start_from` is provided, it will start reassigning from that id (inclusive).
+    /// If not provided, it will start from 0.
+    ///
+    /// All specified aliases and identifier fields will be updated to the new field-ids.
+    pub fn with_reassigned_field_ids(mut self, start_from: Option<i32>) -> Self {
+        self.reassign_field_ids_from = Some(start_from.unwrap_or(0));

Review Comment:
   Typically you would pass in `last-column-id` from the table metadata. In the 
   case of a `CREATE TABLE` this will be zero; in the case of a `REPLACE TABLE` it 
   will be `last-column-id`, so that all columns get a new ID and previously used 
   IDs are not reused.
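   As a usage sketch of that flow (the surrounding setup, `existing_metadata`, and 
   the `last_column_id()` accessor are hypothetical stand-ins):
   
   ```rust
   // CREATE TABLE starts from 0; REPLACE TABLE starts after the existing
   // metadata's last-column-id so reassigned ids never collide with old ones.
   let start_from = match existing_metadata {
       None => 0,                               // CREATE TABLE
       Some(meta) => meta.last_column_id() + 1, // REPLACE TABLE
   };
   
   let schema = Schema::builder()
       .with_fields(fields)
       .with_reassigned_field_ids(Some(start_from))
       .build()?;
   ```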



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] storage/s3: We should respect `client.region` too [iceberg-rust]

2024-09-08 Thread via GitHub


jdockerty commented on issue #603:
URL: https://github.com/apache/iceberg-rust/issues/603#issuecomment-2336810234

   I can do this 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.3,3.4: Add back log making it clearer when we are not pushing down filters [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] closed pull request #7713: Spark 3.3,3.4: Add back log 
making it clearer when we are not pushing down filters
URL: https://github.com/apache/iceberg/pull/7713


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.4: IcebergSource extends SessionConfigSupport [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] commented on PR #7732:
URL: https://github.com/apache/iceberg/pull/7732#issuecomment-2336880892

   This pull request has been closed due to lack of activity. This is not a 
judgement on the merit of the PR in any way. It is just a way of keeping the PR 
queue manageable. If you think that is incorrect, or the pull request requires 
review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.4: Correct the two-stage parsing strategy of antlr parser [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] commented on PR #7734:
URL: https://github.com/apache/iceberg/pull/7734#issuecomment-2336880932

   This pull request has been closed due to lack of activity. This is not a 
judgement on the merit of the PR in any way. It is just a way of keeping the PR 
queue manageable. If you think that is incorrect, or the pull request requires 
review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.4: Correct the two-stage parsing strategy of antlr parser [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] closed pull request #7734: Spark 3.4: Correct the two-stage 
parsing strategy of antlr parser
URL: https://github.com/apache/iceberg/pull/7734


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.4: Allow write mode (copy-on-write/merge-on-read) to be specified in SQLConf [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] closed pull request #7790: Spark 3.4: Allow write mode 
(copy-on-write/merge-on-read) to be specified in SQLConf
URL: https://github.com/apache/iceberg/pull/7790


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark 3.4: Allow write mode (copy-on-write/merge-on-read) to be specified in SQLConf [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] commented on PR #7790:
URL: https://github.com/apache/iceberg/pull/7790#issuecomment-2336880984

   This pull request has been closed due to lack of activity. This is not a 
judgement on the merit of the PR in any way. It is just a way of keeping the PR 
queue manageable. If you think that is incorrect, or the pull request requires 
review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Disable containerReuse when building delete filters [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] commented on PR #7833:
URL: https://github.com/apache/iceberg/pull/7833#issuecomment-2336881035

   This pull request has been closed due to lack of activity. This is not a 
judgement on the merit of the PR in any way. It is just a way of keeping the PR 
queue manageable. If you think that is incorrect, or the pull request requires 
review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark: support use-table-distribution-and-ordering in session conf [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] commented on PR #8164:
URL: https://github.com/apache/iceberg/pull/8164#issuecomment-2336881202

   This pull request has been marked as stale due to 30 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@iceberg.apache.org list. Thank you for your 
contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Case sensitivity is not respected when using IcebergGenerics.ScanBuilder [iceberg]

2024-09-08 Thread via GitHub


github-actions[bot] commented on issue #8178:
URL: https://github.com/apache/iceberg/issues/8178#issuecomment-2336881273

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Support metadata compaction [iceberg-python]

2024-09-08 Thread via GitHub


github-actions[bot] closed issue #270: Support metadata compaction
URL: https://github.com/apache/iceberg-python/issues/270


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Support metadata compaction [iceberg-python]

2024-09-08 Thread via GitHub


github-actions[bot] commented on issue #270:
URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-2336882412

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] chore(deps): Update arrow-array requirement from 52 to 53 [iceberg-rust]

2024-09-08 Thread via GitHub


dependabot[bot] commented on PR #617:
URL: https://github.com/apache/iceberg-rust/pull/617#issuecomment-2336926987

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] chore(deps): Update arrow-ord requirement from 52 to 53 [iceberg-rust]

2024-09-08 Thread via GitHub


dependabot[bot] commented on PR #621:
URL: https://github.com/apache/iceberg-rust/pull/621#issuecomment-2336927231

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] chore(deps): Update arrow-arith requirement from 52 to 53 [iceberg-rust]

2024-09-08 Thread via GitHub


dependabot[bot] commented on PR #620:
URL: https://github.com/apache/iceberg-rust/pull/620#issuecomment-2336927697

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] storage/s3: We should respect `client.region` too [iceberg-rust]

2024-09-08 Thread via GitHub


Xuanwo closed issue #603: storage/s3: We should respect `client.region` too
URL: https://github.com/apache/iceberg-rust/issues/603


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] fix: Correctly calculate highest_field_id in schema [iceberg-rust]

2024-09-08 Thread via GitHub


Xuanwo merged PR #590:
URL: https://github.com/apache/iceberg-rust/pull/590


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Iceberg/Comet integration POC [iceberg]

2024-09-08 Thread via GitHub


huaxingao commented on PR #9841:
URL: https://github.com/apache/iceberg/pull/9841#issuecomment-2337062384

   @manuzhang We have a native Parquet reader on the Comet side to take advantage 
   of the performance gains of native execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] TableMetadataBuilder [iceberg-rust]

2024-09-08 Thread via GitHub


liurenjie1024 commented on PR #587:
URL: https://github.com/apache/iceberg-rust/pull/587#issuecomment-2337063983

   > After we answered those questions, and we still think splitting makes 
sense, I can try to find time to build stacked-PRs. Maybe just splitting 
normalization / validation in table_metadata.rs from the actual builder would 
be a leaner option than splitting every single TableUpdate?
   
   That sounds reasonable to me. If one PR per table update is too much of a burden, 
   could we split them by component, for example sort order, partition spec, and 
   schema changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Support relative paths in Table Metadata [iceberg]

2024-09-08 Thread via GitHub


lightmelodies commented on issue #1617:
URL: https://github.com/apache/iceberg/issues/1617#issuecomment-2337073064

   For anyone who is interested: 
https://github.com/lightmelodies/iceberg-relative-io


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-08 Thread via GitHub


stevenzwu commented on code in PR #11086:
URL: https://github.com/apache/iceberg/pull/11086#discussion_r1749541033


##
core/src/test/java/org/apache/iceberg/TestFastAppend.java:
##
@@ -42,6 +42,25 @@ protected static List<Object> parameters() {
     return Arrays.asList(1, 2, 3);
   }
 
+  @TestTemplate
+  public void testAddManyFiles() {
+    assertThat(listManifestFiles()).as("Table should start empty").isEmpty();
+
+    List<DataFile> dataFiles = Lists.newArrayList();
+
+    for (int ordinal = 0; ordinal < 20_000; ordinal++) {

Review Comment:
   I am usually good with package-private; we have quite a few `@VisibleForTesting` 
   annotations for cases like that.
   
   It can be tricky to keep the tests correct if we ever change the 
   `MIN_FILE_GROUP_SIZE` default to a larger value like 20_000; then this 
   test would no longer cover the intended case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Iceberg/Comet integration POC [iceberg]

2024-09-08 Thread via GitHub


manuzhang commented on PR #9841:
URL: https://github.com/apache/iceberg/pull/9841#issuecomment-2337144632

   @huaxingao I know that. What about alternatives? Is the Comet parquet reader 
the only native parquet reader implementation?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spec: Fix rendering of partition stats file spec [iceberg]

2024-09-08 Thread via GitHub


nastra merged PR #11068:
URL: https://github.com/apache/iceberg/pull/11068


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org