Re: [PR] Tests: Unify the test catalog setting [iceberg-python]

2024-04-16 Thread via GitHub


frankliee commented on code in PR #609:
URL: https://github.com/apache/iceberg-python/pull/609#discussion_r1566818851


##
tests/conftest.py:
##
@@ -2144,3 +2144,31 @@ def arrow_table_with_only_nulls(pa_schema: "pa.Schema") -> "pa.Table":
     import pyarrow as pa
 
     return pa.Table.from_pylist([{}, {}], schema=pa_schema)
+
+
+@pytest.fixture()
+def catalog_rest() -> Catalog:
+    return load_catalog(
+        "local",
+        **{
+            "type": "rest",
+            "uri": "http://localhost:8181",
+            "s3.endpoint": "http://localhost:9000",
+            "s3.access-key-id": "admin",
+            "s3.secret-access-key": "password",
+        },
+    )
+
+
+@pytest.fixture()
+def catalog_hive() -> Catalog:

Review Comment:
   > @frankliee Thanks for working on this! We also have `session_catalog` and 
`session_hive_catalog` in `conftest`
   > 
   > 
https://github.com/apache/iceberg-python/blob/aa4654368e54bf84933279179519b299b5910493/tests/conftest.py#L1997-L2022
   > 
   > 
   > and I think they can be unified with these too (using scope `session`). 
WDYT?
   
   Nice suggestion, I will unify all of them as `catalog_rest` and 
`catalog_hive` in this PR shortly; a sketch of the session-scoped version is 
below.
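   For reference, a minimal sketch of the unified, session-scoped REST fixture 
(a sketch only, reusing the endpoints and credentials from the diff above; the 
Hive fixture would follow the same shape):
   
   ```python
   import pytest

   from pyiceberg.catalog import Catalog, load_catalog


   @pytest.fixture(scope="session")
   def catalog_rest() -> Catalog:
       # scope="session" builds the catalog once per test session instead of
       # once per test, which is the unification suggested above.
       return load_catalog(
           "local",
           **{
               "type": "rest",
               "uri": "http://localhost:8181",
               "s3.endpoint": "http://localhost:9000",
               "s3.access-key-id": "admin",
               "s3.secret-access-key": "password",
           },
       )
   ```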






Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#discussion_r1566850598


##
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java:
##
@@ -391,17 +391,16 @@ public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ig
           + "an iceberg catalog, Please create table with 'connector'='iceberg' property in a non-iceberg catalog or "
           + "create table without 'connector'='iceberg' related properties in an iceberg table.");
 }
-
-    createIcebergTable(tablePath, table, ignoreIfExists);
+    Preconditions.checkArgument(table instanceof ResolvedCatalogTable, "table should be resolved");

Review Comment:
   is this new code that's required for Flink 1.19?






Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#discussion_r1566852990


##
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java:
##
@@ -391,17 +391,16 @@ public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ig
           + "an iceberg catalog, Please create table with 'connector'='iceberg' property in a non-iceberg catalog or "
           + "create table without 'connector'='iceberg' related properties in an iceberg table.");
 }
-
-    createIcebergTable(tablePath, table, ignoreIfExists);
+    Preconditions.checkArgument(table instanceof ResolvedCatalogTable, "table should be resolved");

Review Comment:
   it's not clear to me why there's a diff on `FlinkCatalog`






Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#discussion_r1566855387


##
flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java:
##
@@ -391,17 +391,16 @@ public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ig
           + "an iceberg catalog, Please create table with 'connector'='iceberg' property in a non-iceberg catalog or "
           + "create table without 'connector'='iceberg' related properties in an iceberg table.");
 }
-
-    createIcebergTable(tablePath, table, ignoreIfExists);
+    Preconditions.checkArgument(table instanceof ResolvedCatalogTable, "table should be resolved");

Review Comment:
   I think the problem is that the diff is still from a previous attempt where 
changes were based on Flink 1.16. What about removing the Flink 1.16 build from 
the gradle files, but doing the actual removal of the Flink 1.16 folders in a 
separate PR? I think that would avoid these diffs here.






Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


nastra commented on PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#issuecomment-2058425533

   > @nastra: Any comments? I would like to merge this soon, as any merge to 
Flink code path will make this PR stale, and @rodmeneses needs to recreate the 
whole PR.
   > 
   > Thanks,
   
   I think there's still an issue: there are a bunch of files/diffs that exist 
because Flink 1.16 is being removed and git detects the removal as a move (with 
some additional changes). This can also be seen in the file paths, where a 
Flink 1.16 file is moved to a Flink 1.19 file while also adding some diffs, and 
it's not clear why those diffs are there in the first place.
   
   My suggestion would be to do the actual removal of the 1.16 directory in a 
separate, immediate follow-up PR. This would mean skipping tests 8 + 9 from the 
PR description, but it's fine to update the gradle files to not build 1.16 
anymore. Thoughts on the suggestion?





[PR] Add `BoundPredicateVisitor` (alternate version) [iceberg-rust]

2024-04-16 Thread via GitHub


sdd opened a new pull request, #334:
URL: https://github.com/apache/iceberg-rust/pull/334

   Alternative implementation for 
https://github.com/apache/iceberg-rust/pull/320, using per-operator visitor 
methods





Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-16 Thread via GitHub


nastra commented on PR #10124:
URL: https://github.com/apache/iceberg/pull/10124#issuecomment-2058435530

   @mrcnc looks like there's a merge conflict. Can you rebase please?





Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]

2024-04-16 Thread via GitHub


sdd commented on code in PR #320:
URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1566871505


##
crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:
##
@@ -0,0 +1,317 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
+use crate::spec::Datum;
+use crate::Result;
+use fnv::FnvHashSet;
+
+pub(crate) enum OpLiteral<'a> {
+    Single(&'a Datum),
+    Set(&'a FnvHashSet<Datum>),
+}
+
+/// A visitor for [`BoundPredicate`]s. Visits in post-order.
+pub trait BoundPredicateVisitor {
+    /// The return type of this visitor
+    type T;
+
+    /// Called after an `AlwaysTrue` predicate is visited
+    fn always_true(&mut self) -> Result<Self::T>;
+
+    /// Called after an `AlwaysFalse` predicate is visited
+    fn always_false(&mut self) -> Result<Self::T>;
+
+    /// Called after an `And` predicate is visited
+    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after an `Or` predicate is visited
+    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after a `Not` predicate is visited
+    fn not(&mut self, inner: Self::T) -> Result<Self::T>;
+
+    /// Called after visiting a UnaryPredicate, BinaryPredicate,
+    /// or SetPredicate. Passes the predicate's operator in all cases,
+    /// as well as the term and literals in the case of binary and set
+    /// predicates.
+    fn op(

Review Comment:
   Ok - here's another PR with the original design. I'll let you guys choose 
which one to merge :-) https://github.com/apache/iceberg-rust/pull/334. 
   
   Here's how the inclusive projection would end up looking with this design 
too: https://github.com/apache/iceberg-rust/pull/335






Re: [PR] Support partial deletes [iceberg-python]

2024-04-16 Thread via GitHub


Fokko commented on code in PR #569:
URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1566870388


##
pyiceberg/io/pyarrow.py:
##
@@ -1912,3 +1920,55 @@ def _get_parquet_writer_kwargs(table_properties: 
Properties) -> Dict[str, Any]:
 default=TableProperties.PARQUET_PAGE_ROW_LIMIT_DEFAULT,
 ),
 }
+
+
+def _dataframe_to_data_files(

Review Comment:
   I've moved this one to `pyarrow.py` since it is Arrow-specific. This also 
explains the local imports above, which avoid circular imports; a generic 
sketch of the pattern is below.
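   A generic sketch of that pattern (all module and function names here are 
hypothetical, for illustration only): the table module imports the IO module at 
module level, so the IO module defers its import of the table module into the 
function body:
   
   ```python
   # io_mod.py (hypothetical): table_mod imports this module at import time,
   # so importing table_mod at the top here would create a circular import.
   from typing import TYPE_CHECKING

   if TYPE_CHECKING:
       # Evaluated by type checkers only, never at runtime: safe for cycles.
       from table_mod import Table


   def dataframe_to_data_files(table: "Table", df) -> list:
       # Local (deferred) import: by the time this function is called, both
       # modules are fully initialized, so the cycle is broken.
       from table_mod import make_data_file  # hypothetical helper

       return [make_data_file(table, df)]
   ```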






[PR] add `InclusiveProjection` Visitor (alternate version) [iceberg-rust]

2024-04-16 Thread via GitHub


sdd opened a new pull request, #335:
URL: https://github.com/apache/iceberg-rust/pull/335

   (no comment)





Re: [I] Create iceberg table from existing parquet files with slightly different schemas (schemas merge is possible). [iceberg-python]

2024-04-16 Thread via GitHub


sergun commented on issue #601:
URL: https://github.com/apache/iceberg-python/issues/601#issuecomment-2058445375

   @kevinjqliu
   It is strange to me that PyArrow has 
[pa.unify_schemas(...)](https://arrow.apache.org/docs/python/generated/pyarrow.unify_schemas.html), 
which is able (I double-checked) to unify **nested schemas** (even with type 
promotions).
   But there is no "dual" functionality to cast, concat, or read the 
corresponding **data** into a `pa.Table` with the unified schema. None of e.g. 
[pa.Table.cast(...)](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.cast), 
[pa.concat_tables(...)](https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html), 
or 
[pa.parquet.read_table(...)](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html)
 works with such schemas; a small repro is below.
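   A minimal sketch of the asymmetry (assumes pyarrow >= 14 for 
`promote_options`; the column names are illustrative):
   
   ```python
   import pyarrow as pa

   s1 = pa.schema([("id", pa.int32())])
   s2 = pa.schema([("id", pa.int64()), ("name", pa.string())])

   # Unifying the schemas works, including the int32 -> int64 promotion...
   unified = pa.unify_schemas([s1, s2], promote_options="permissive")

   t1 = pa.table({"id": pa.array([1, 2], type=pa.int32())})

   # ...but there is no matching way to bring the data along: cast() expects
   # the same number of columns, so the missing "name" column is an error.
   try:
       t1.cast(unified)
   except (ValueError, pa.ArrowInvalid) as e:
       print(f"cast failed: {e}")
   ```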
   





Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-16 Thread via GitHub


nastra merged PR #10123:
URL: https://github.com/apache/iceberg/pull/10123





Re: [I] Delete using Merge-on-Read sets `OVERWRITE` while `DELETE` is expected [iceberg]

2024-04-16 Thread via GitHub


nastra closed issue #10122: Delete using Merge-on-Read sets `OVERWRITE` while 
`DELETE` is expected
URL: https://github.com/apache/iceberg/issues/10122





Re: [I] Create iceberg table from existing parquet files with slightly different schemas (schemas merge is possible). [iceberg-python]

2024-04-16 Thread via GitHub


sergun commented on issue #601:
URL: https://github.com/apache/iceberg-python/issues/601#issuecomment-2058456159

   > One thing I wonder is if PyIceberg can handle schema evolution of nested 
structs.
   
   Looks like it can.
   From https://py.iceberg.apache.org/api/#add-column:
   ```
   with table.update_schema() as update:
       update.add_column("retries", IntegerType(), "Number of retries to place the bid")
       # In a struct
       update.add_column("details.confirmed_by", StringType(), "Name of the exchange")
   ```
   





Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]

2024-04-16 Thread via GitHub


sdd commented on code in PR #320:
URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1566902353


##
crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:
##
@@ -0,0 +1,317 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
+use crate::spec::Datum;
+use crate::Result;
+use fnv::FnvHashSet;
+
+pub(crate) enum OpLiteral<'a> {
+    Single(&'a Datum),
+    Set(&'a FnvHashSet<Datum>),
+}
+
+/// A visitor for [`BoundPredicate`]s. Visits in post-order.
+pub trait BoundPredicateVisitor {
+    /// The return type of this visitor
+    type T;
+
+    /// Called after an `AlwaysTrue` predicate is visited
+    fn always_true(&mut self) -> Result<Self::T>;
+
+    /// Called after an `AlwaysFalse` predicate is visited
+    fn always_false(&mut self) -> Result<Self::T>;
+
+    /// Called after an `And` predicate is visited
+    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after an `Or` predicate is visited
+    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after a `Not` predicate is visited
+    fn not(&mut self, inner: Self::T) -> Result<Self::T>;
+
+    /// Called after visiting a UnaryPredicate, BinaryPredicate,
+    /// or SetPredicate. Passes the predicate's operator in all cases,
+    /// as well as the term and literals in the case of binary and set
+    /// predicates.
+    fn op(

Review Comment:
   My preference is the original design, which is (confusingly) in the 
"alternative" PRs 😅






Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-16 Thread via GitHub


nastra commented on PR #10123:
URL: https://github.com/apache/iceberg/pull/10123#issuecomment-2058468333

   thanks for the reviews @amogh-jahagirdar and @aokolnychyi 





[PR] Flink: FlinkFileIO implementation [iceberg]

2024-04-16 Thread via GitHub


pvary opened a new pull request, #10151:
URL: https://github.com/apache/iceberg/pull/10151

   A FileIO implementation which uses Flink's embedded file system implementations.
   This allows the user to use Flink's pluggable FileSystems:
   
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/filesystems/overview/
   
   Some advantages:
   - Same configuration is used for checkpoints and Iceberg tables
   - Automatic Delegation Token support





Re: [I] Create iceberg table from existing parquet files with slightly different schemas (schemas merge is possible). [iceberg-python]

2024-04-16 Thread via GitHub


sergun commented on issue #601:
URL: https://github.com/apache/iceberg-python/issues/601#issuecomment-2058470428

   BTW: I found some explanation of why merging Arrow tables with different 
schemas is not possible:
   https://github.com/apache/arrow/issues/35424
   The reason looks weird, but yes, as I remember, e.g. Spark dataframes may 
have columns with duplicated names.
   
   It is probably possible to implement a table merge in PyArrow after checking 
that there are no duplicated column names in each struct and at the root level; 
a sketch of the top-level case is below.
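   A rough sketch of that idea (assumptions: top-level columns only, no 
recursion into structs, pyarrow >= 14 for `promote_options`; `merge_tables` is 
a hypothetical helper, not a PyArrow API):
   
   ```python
   from collections import Counter

   import pyarrow as pa


   def merge_tables(tables: "list[pa.Table]") -> pa.Table:
       # The check called out above: duplicated column names make the
       # unified schema ambiguous, so refuse them up front.
       for t in tables:
           dupes = [n for n, c in Counter(t.schema.names).items() if c > 1]
           if dupes:
               raise ValueError(f"duplicated column names: {dupes}")

       unified = pa.unify_schemas(
           [t.schema for t in tables], promote_options="permissive"
       )

       aligned = []
       for t in tables:
           # Fill in missing columns with nulls, then reorder and cast each
           # table to the unified schema so they all line up.
           for field in unified:
               if field.name not in t.schema.names:
                   t = t.append_column(field.name, pa.nulls(len(t), type=field.type))
           aligned.append(t.select(unified.names).cast(unified))

       return pa.concat_tables(aligned)
   ```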
   





Re: [I] flink autoscaler: how to set write-parallelism? [iceberg]

2024-04-16 Thread via GitHub


sannaroby commented on issue #10147:
URL: https://github.com/apache/iceberg/issues/10147#issuecomment-2058495395

   Hi @pvary, thanks for your reply.
   We're using the HASH distribution mode; this is an extract from our Flink 
job:
   
   ```
   SingleOutputStreamOperator<Row> mainFunction = env.addSource(kinesisSource)
       .keyBy(inputMessage -> inputMessage.getKey())
       .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
       .process(new InputMessageToRowProcessFunction(), rowTypeInfo);
   
   FlinkSink.forRow(mainFunction, loadedTableSchema)
       .tableLoader(tableLoader)
       .table(icebergTable)
       .equalityFieldColumns(List.of("messageKey", "messageTimestamp"))
       .distributionMode(DistributionMode.HASH)
       .append();
   ```
   The autoscaler didn't change the parallelism of the upstream operator (the 
`InputMessageToRowProcessFunction`) because it judged that to be unnecessary.
   What do you mean by "rebalance step"?
   Thanks.





Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566934826


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(

Review Comment:
   ```suggestion
  Assertions.assertThatThrownBy(
   ```






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566941616


##
core/src/test/java/org/apache/iceberg/rest/responses/TestListNamespacesResponse.java:
##
@@ -83,9 +83,32 @@ public void testBuilderDoesNotCreateInvalidObjects() {
 .hasMessage("Invalid namespace: null");
   }
 
+  @Test
+  public void testWithNullPaginationToken() throws JsonProcessingException {
+    String jsonWithNullPageToken =
+        "{\"namespaces\":[[\"accounting\"],[\"tax\"]],\"next-page-token\":null}";
+    ListNamespacesResponse response =
+        ListNamespacesResponse.builder().addAll(NAMESPACES).nextPageToken(null).build();
+    assertRoundTripSerializesEquallyFrom(jsonWithNullPageToken, response);
+    Assertions.assertThat(response.nextPageToken()).isNull();
+    Assertions.assertThat(response.namespaces()).isEqualTo(NAMESPACES);
+  }
+
+  @Test
+  public void testWithPaginationToken() throws JsonProcessingException {
+    String pageToken = "token";
+    String jsonWithPageToken =
+        "{\"namespaces\":[[\"accounting\"],[\"tax\"]],\"next-page-token\":\"token\"}";
+    ListNamespacesResponse response =
+        ListNamespacesResponse.builder().addAll(NAMESPACES).nextPageToken(pageToken).build();
+    assertRoundTripSerializesEquallyFrom(jsonWithPageToken, response);
+    Assertions.assertThat(response.nextPageToken()).isNotNull();

Review Comment:
   ```suggestion
   Assertions.assertThat(response.nextPageToken()).isEqualTo("token");
   ```






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566941917


##
core/src/test/java/org/apache/iceberg/rest/responses/TestListTablesResponse.java:
##
@@ -105,9 +105,32 @@ public void testBuilderDoesNotCreateInvalidObjects() {
 .hasMessage("Invalid table identifier: null");
   }
 
+  @Test
+  public void testWithNullPaginationToken() throws JsonProcessingException {
+    String jsonWithNullPageToken =
+        "{\"identifiers\":[{\"namespace\":[\"accounting\",\"tax\"],\"name\":\"paid\"}],\"next-page-token\":null}";
+    ListTablesResponse response =
+        ListTablesResponse.builder().addAll(IDENTIFIERS).nextPageToken(null).build();
+    assertRoundTripSerializesEquallyFrom(jsonWithNullPageToken, response);
+    Assertions.assertThat(response.nextPageToken()).isNull();
+    Assertions.assertThat(response.identifiers()).isEqualTo(IDENTIFIERS);
+  }
+
+  @Test
+  public void testWithPaginationToken() throws JsonProcessingException {
+    String pageToken = "token";
+    String jsonWithPageToken =
+        "{\"identifiers\":[{\"namespace\":[\"accounting\",\"tax\"],\"name\":\"paid\"}],\"next-page-token\":\"token\"}";
+    ListTablesResponse response =
+        ListTablesResponse.builder().addAll(IDENTIFIERS).nextPageToken(pageToken).build();
+    assertRoundTripSerializesEquallyFrom(jsonWithPageToken, response);
+    Assertions.assertThat(response.nextPageToken()).isNotNull();

Review Comment:
   ```suggestion
   Assertions.assertThat(response.nextPageToken()).isEqualTo("token");
   ```






Re: [PR] Spark 3.5: Check table existence to determine which catalog for drop table [iceberg]

2024-04-16 Thread via GitHub


wForget commented on code in PR #10128:
URL: https://github.com/apache/iceberg/pull/10128#discussion_r1566943822


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSessionCatalog.java:
##
@@ -275,18 +275,20 @@ public Table alterTable(Identifier ident, TableChange... changes) throws NoSuchT
 
   @Override
   public boolean dropTable(Identifier ident) {
-    // no need to check table existence to determine which catalog to use. if a table doesn't exist

Review Comment:
   > This PR requires tests that would show the exact problem that was faced 
and how the new logic behaves now.
   
   thanks, added new unit test






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566944912


##
core/src/test/java/org/apache/iceberg/rest/TestRESTViewCatalog.java:
##
@@ -144,6 +151,36 @@ public void closeCatalog() throws Exception {
 }
   }
 
+  @Test
+  public void testPaginationForListViews() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+    String tableName = "newtable";
+    String viewName = "newview";
+
+    // create initial namespace
+    catalog().createNamespace(Namespace.of(namespaceName));
+
+    // create several views under namespace, based off a table for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      TableIdentifier viewIndentifier = TableIdentifier.of(namespaceName, viewName + i);
+      catalog
+          .buildView(viewIndentifier)
+          .withSchema(SCHEMA)
+          .withDefaultNamespace(viewIndentifier.namespace())
+          .withQuery("spark", "select * from " + namespaceName + "." + tableName)

Review Comment:
   ```suggestion
 .withQuery("spark", "select * from ns.tbl")
   ```
   
   and then you can remove the `tableName` variable






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566933302


##
core/src/test/java/org/apache/iceberg/rest/responses/TestListNamespacesResponse.java:
##
@@ -34,7 +34,7 @@ public class TestListNamespacesResponse extends RequestResponseTestBase<ListNamespacesResponse>

Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566937039


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(
+            () -> catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessage("Invalid value for pageSize, must be a positive integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+
+    // create several namespaces for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      String nameSpaceName = namespaceName + i;
+      catalog.createNamespace(Namespace.of(nameSpaceName));
+    }
+
+    List<Namespace> results = catalog.listNamespaces();
+    assertThat(results).hasSize(numberOfItems);

Review Comment:
   nit: can be inlined
   ```suggestion
   assertThat(catalog.listNamespaces()).hasSize(numberOfItems);
   ```






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566939950


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(
+            () -> catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessage("Invalid value for pageSize, must be a positive integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+
+    // create several namespaces for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      String nameSpaceName = namespaceName + i;
+      catalog.createNamespace(Namespace.of(nameSpaceName));
+    }
+
+    List<Namespace> results = catalog.listNamespaces();
+    assertThat(results).hasSize(numberOfItems);
+  }
+
+  @Test
+  public void testPaginationForListTables() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+    String tableName = "newtable";
+    catalog.createNamespace(Namespace.of(namespaceName));
+
+    // create several tables under namespace for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      TableIdentifier tableIdentifier = TableIdentifier.of(namespaceName, tableName + i);
+      catalog.createTable(tableIdentifier, SCHEMA);
+    }
+
+    List<TableIdentifier> tables = catalog.listTables(Namespace.of(namespaceName));
+    assertThat(tables).hasSize(numberOfItems);

Review Comment:
   ```suggestion
   assertThat(catalog.listTables(Namespace.of(namespaceName))).hasSize(numberOfItems);
   ```



##
core/src/test/java/org/apache/iceberg/rest/TestRESTViewCatalog.java:
##
@@ -144,6 +151,36 @@ public void closeCatalog() throws Exception {
 }
   }
 
+  @Test
+  public void testPaginationForListViews() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));

Review Comment:
   same as above






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566939429


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(
+            () -> catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessage("Invalid value for pageSize, must be a positive integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));

Review Comment:
   no need for `Mockito.spy()` here as you're not doing any verifications on 
the `adapter` in this test (and in the other tests)



##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(
+            () -> catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessage("Invalid value for pageSize, must be a positive integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+
+    // create several namespaces for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      String nameSpaceName = namespaceName + i;
+      catalog.createNamespace(Namespace.of(nameSpaceName));
+    }
+
+    List<Namespace> results = catalog.listNamespaces();
+    assertThat(results).hasSize(numberOfItems);
+  }
+
+  @Test
+  public void testPaginationForListTables() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));

Review Comment:
   same as above






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566946785


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -1796,6 +1799,48 @@ public void testCatalogExpiredBearerTokenIsRefreshedWithCredential(String oauth2
 any());
   }
 
+
+  @Test
+  public void testCatalogWithPagaintionTokenIssue() {
+    //TODO remove this test, Used to highlight issue with namespaces

Review Comment:
   I don't think we should remove this test. We actually want a test that makes 
sure we get the expected behavior






[PR] Core: Fix JDBC Catalog table commit when migrating from schema V0 to V1 (#101111) [iceberg]

2024-04-16 Thread via GitHub


jbonofre opened a new pull request, #10152:
URL: https://github.com/apache/iceberg/pull/10152

   (no comment)





Re: [PR] Core: Fix JDBC Catalog table commit when migrating from schema V0 to V1 (#101111) [iceberg]

2024-04-16 Thread via GitHub


jbonofre commented on PR #10152:
URL: https://github.com/apache/iceberg/pull/10152#issuecomment-2058580254

   @nastra this is the backport to the `1.5.x` branch (in preparation for 1.5.1 
release).





Re: [I] Discussion: Next steps / requirements to support `append` files [iceberg-rust]

2024-04-16 Thread via GitHub


Fokko commented on issue #329:
URL: https://github.com/apache/iceberg-rust/issues/329#issuecomment-2058636547

   @marvinlanhenke Sorry for being late to the party here. Appending a file is 
rather straightforward, but all the conditions must be met. This is the 
high-level way of appending a file:
   
   - Write a Parquet file with the field IDs populated.
   - Collect the metrics to populate the statistics in the manifest file. We do 
this in PyIceberg 
[here](https://github.com/apache/iceberg-python/blob/49ac3a27794fc12cfb67b29502ba92b429396201/pyiceberg/io/pyarrow.py#L1433-L1496).
   - Write the snapshot following the concept of a fast-append. A normal append 
will append the new files to an existing manifest, and a fast-append will write 
a new manifest file with the new entries. This is much easier to implement, 
since you don't have to worry about [sequence-number inheritance and 
such](https://iceberg.apache.org/spec/#sequence-number-inheritance).
   - Rewrite the manifest-list to add the newly created manifest.
   - Generate a snapshot summary
   - Update the metadata. When you are using a traditional catalog like Glue 
and Hive, this can be a bit of work. If you use the Iceberg REST catalog, this 
is much easier since it is the responsibility of the REST catalog.
   
   > calling the writer to write the DataFile
   
   > I think this is also what the python implementation does. In 
Transaction.append, it calls _dataframe_to_data_files to generate DataFiles 
based on the pa.Table.
   
   In [PyIceberg we have 
`_dataframe_to_data_files`](https://github.com/apache/iceberg-python/blob/49ac3a27794fc12cfb67b29502ba92b429396201/pyiceberg/table/__init__.py#L2683)
 that writes out the Arrow table to one or more Parquet files. Then we collect 
all the statistics and return a Datafile that can be appended to the table. I 
hope in the future that we can push this down to iceberg-rust :)
   
   > If any error happens during generating metadata relation info like 
manifest etc., as the writer already wrote DataFiles, should we go to delete 
the written DataFiles?
   
   Iceberg Java does this best effort. If it fails, it tries to clean it up, 
but it is always possible that this won't happen (Looking at you OOMs). This is 
where the maintenance tasks kick in, as @sdd already pointed out.
   
   Talking about prioritization: things can happen in parallel. For example, 
something simpler like updating table properties will make sure that the commit 
path is in place. The snapshot summary generation can be a PR of its own; the 
same goes for collecting the column metrics. A minimal end-to-end sketch of the 
user-facing append path is below.
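   A sketch (assumptions: a REST catalog at the given URI, an existing table 
`default.events` whose schema matches the Arrow table, and PyIceberg >= 0.6; 
the names are illustrative):
   
   ```python
   import pyarrow as pa

   from pyiceberg.catalog import load_catalog

   catalog = load_catalog("default", **{"type": "rest", "uri": "http://localhost:8181"})
   table = catalog.load_table("default.events")

   df = pa.table({"id": pa.array([1, 2, 3], type=pa.int64())})

   # Under the hood this is the sequence above: write Parquet with field IDs,
   # collect column metrics, write a new manifest (fast-append), rewrite the
   # manifest list, generate the snapshot summary, and commit the metadata.
   table.append(df)
   ```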
   





Re: [I] Doubts about the types supported by Iceberg, Not in line with expectations [iceberg]

2024-04-16 Thread via GitHub


Fokko commented on issue #10153:
URL: https://github.com/apache/iceberg/issues/10153#issuecomment-2058638829

   @madeirak Thanks for reaching out here. Which version of Spark are you using?





Re: [I] flink:FlinkSink support dynamically changed schema [iceberg]

2024-04-16 Thread via GitHub


Ruees commented on issue #4190:
URL: https://github.com/apache/iceberg/issues/4190#issuecomment-2058647346

   > @leichangqing You can refer to the last two commits of my branch 
https://github.com/lintingbin2009/iceberg/tree/flink-sink-dynamically-change. 
We have put this part of the code in our own production environment and 
executed it for half a year, and it seems that there is no problem so far.
   > https://user-images.githubusercontent.com/5699014/224219390-eac0d0e2-1578-44d8-a6d5-6ba97362e02e.png
   > Corresponding to 1, 2, 3 in the above picture, you need to prepare:
   > 
   > 1. There needs to be a broadcast node that can subscribe to your schema 
changes.
   > 2. The data processing node can generate RowData according to the latest 
schema when processing data.
   > 3. This is based on the above code modification so that the iceberg writer 
node can receive the latest schema and apply it.
   
   I tried to pull the FlinkSink-related modification code from the first 
commit and added a column at the end using the Java API in the map operator, 
but it did not work: even when the data was inserted successfully, the new 
column at the end was still empty.





Re: [I] Doubts about the types supported by Iceberg, Not in line with expectations [iceberg]

2024-04-16 Thread via GitHub


madeirak closed issue #10153: Doubts about the types supported by Iceberg, Not 
in line with expectations
URL: https://github.com/apache/iceberg/issues/10153





Re: [I] Can Iceberg old partition with no data be deleted? [iceberg]

2024-04-16 Thread via GitHub


ajantha-bhat commented on issue #10121:
URL: https://github.com/apache/iceberg/issues/10121#issuecomment-2058678837

   How is the table partitioned? Can you run `DESCRIBE TABLE` or `SHOW CREATE 
TABLE` to show the partition scheme? Also, was there any partition evolution? 
Are you using a V1 or a V2 Iceberg table?





Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567071915


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -1796,6 +1799,48 @@ public void testCatalogExpiredBearerTokenIsRefreshedWithCredential(String oauth2
 any());
   }
 
+
+  @Test
+  public void testCatalogWithPagaintionTokenIssue() {
+    //TODO remove this test, Used to highlight issue with namespaces
+    String token =
+        "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyLCJleHAiOjE5OTk5OTk5OTk5fQ._3k92KJi2NTyTG6V1s2mzJ__GiQtL36DnzsZSkBdYPw";
+    Map<String, String> catalogHeaders = ImmutableMap.of("Authorization", "Bearer " + token);
+
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+
+    String credential = "catalog:12345";
+    Map<String, String> contextCredentials =
+        ImmutableMap.of("token", token, "credential", credential);
+    SessionCatalog.SessionContext context =
+        new SessionCatalog.SessionContext(
+            UUID.randomUUID().toString(), "user", contextCredentials, ImmutableMap.of());
+
+    RESTCatalog catalog = new RESTCatalog(context, (config) -> adapter);
+    catalog.initialize("prod", ImmutableMap.of(CatalogProperties.URI, "ignored", "token", token));
+
+    Mockito.verify(adapter)
+        .execute(
+            eq(HTTPMethod.GET),
+            eq("v1/config"),
+            any(),
+            any(),
+            eq(ConfigResponse.class),
+            eq(catalogHeaders),
+            any());
+
+    Mockito.verify(adapter)

Review Comment:
   @rahil-c this fails because you're verifying that the namespaces endpoint 
has been called but you're actually not calling `catalog.listNamespaces(..)` in 
this test






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567077122


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -1796,6 +1799,48 @@ public void testCatalogExpiredBearerTokenIsRefreshedWithCredential(String oauth2
 any());
   }
 
+
+  @Test
+  public void testCatalogWithPagaintionTokenIssue() {
+    //TODO remove this test, Used to highlight issue with namespaces

Review Comment:
   alternatively you can also add the mockito verifications on `adapter` to the 
other tests that you added (`testPaginationForListNamespaces()` / 
`testPaginationForListTables()` / `testPaginationForListViews()`)






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566939429


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void 
multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));
+RESTCatalog catalog =
+new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), 
(config) -> adapter);
+org.assertj.core.api.Assertions.assertThatThrownBy(
+() -> catalog.initialize("test", 
ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+.isInstanceOf(IllegalArgumentException.class)
+.hasMessage("Invalid value for pageSize, must be a positive 
integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));

Review Comment:
   no need for `Mockito.spy()` here as you're not doing any verifications on 
the `adapter` in this test (and in the other tests)



##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void 
multipleDiffsAgainstMultipleTablesLastFails() {
 assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));
+RESTCatalog catalog =
+new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), 
(config) -> adapter);
+org.assertj.core.api.Assertions.assertThatThrownBy(
+() -> catalog.initialize("test", 
ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+.isInstanceOf(IllegalArgumentException.class)
+.hasMessage("Invalid value for pageSize, must be a positive 
integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));
+RESTCatalog catalog =
+new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) 
-> adapter);
+catalog.initialize("test", 
ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+int numberOfItems = 100;
+String namespaceName = "newdb";
+
+// create several namespaces for listing and verify
+for (int i = 0; i < numberOfItems; i++) {
+  String nameSpaceName = namespaceName + i;
+  catalog.createNamespace(Namespace.of(nameSpaceName));
+}
+
+List<Namespace> results = catalog.listNamespaces();
+assertThat(results).hasSize(numberOfItems);
+  }
+
+  @Test
+  public void testPaginationForListTables() {
+RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));

Review Comment:
   same as above






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567077122


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -1796,6 +1799,48 @@ public void 
testCatalogExpiredBearerTokenIsRefreshedWithCredential(String oauth2
 any());
   }
 
+
+  @Test
+  public void testCatalogWithPagaintionTokenIssue() {
+//TODO remove this test, Used to highlight issue with namespaces

Review Comment:
   alternatively you can also add the mockito verifications on `adapter` to the 
other tests that you added (`testPaginationForListNamespaces()` / 
`testPaginationForListTables()` / `testPaginationForListViews()`).
   
   For `testPaginationForListNamespaces()` this test would do some additional 
checks after creating all namespaces and listing them. Those verification 
checks would be:
   
   // verify config endpoint was called (which is e.g. what you already have in 
L1822ff)
   // verify create namespace was called X times
   // verify list namespaces was called. Here is where you would verify that 
the pagination token is properly exchanged via query params
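   A rough sketch of that last verification (the endpoint path follows the
   `v1/config` pattern above; the asserted query-param values are assumptions):
   ```java
   // Hypothetical check that a page was requested with the pagination params;
   // "next-token" is a placeholder, not the real continuation token.
   Mockito.verify(adapter)
       .execute(
           eq(HTTPMethod.GET),
           eq("v1/namespaces"),
           eq(ImmutableMap.of("pageToken", "next-token", "pageSize", "10")),
           any(),
           eq(ListNamespacesResponse.class),
           any(),
           any());
   ```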






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1566940267


##
core/src/test/java/org/apache/iceberg/rest/TestRESTViewCatalog.java:
##
@@ -144,6 +151,36 @@ public void closeCatalog() throws Exception {
 }
   }
 
+  @Test
+  public void testPaginationForListViews() {
+RESTCatalogAdapter adapter = Mockito.spy(new 
RESTCatalogAdapter(backendCatalog));

Review Comment:
   same as above






Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567097286


##
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##
@@ -278,14 +286,26 @@ public void setConf(Object newConf) {
   @Override
   public List listTables(SessionContext context, Namespace 
ns) {
 checkNamespaceIsValid(ns);
+Map queryParams = Maps.newHashMap();
+List tables = Lists.newArrayList();
+String pageToken = "";
+if (restPageSize != null) {
+  queryParams.put("pageToken", pageToken);
+  queryParams.put("pageSize", restPageSize);
+}
+do {
+  ListTablesResponse response =
+  client.get(
+  paths.tables(ns),
+  queryParams,
+  ListTablesResponse.class,
+  headers(context),
+  ErrorHandlers.namespaceErrorHandler());
+  pageToken = response.nextPageToken();

Review Comment:
   I've replied in https://github.com/apache/iceberg/pull/9782/files#r1567071915






Re: [I] Remove extraneous trailing slash in table location [iceberg-python]

2024-04-16 Thread via GitHub


Vitalii0-o commented on issue #606:
URL: https://github.com/apache/iceberg-python/issues/606#issuecomment-2058724052

   Traceback (most recent call last):
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/destinations/sql_client.py",
 line 242, in _wrap_gen
   return (yield from f(self, *args, **kwargs))
   ^^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/destinations/impl/athena/athena.py",
 line 296, in execute_query
   cursor.execute(query_line, db_args)
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/pyathena/cursor.py", 
line 108, in execute
   raise OperationalError(query_execution.state_change_reason)
   pyathena.error.OperationalError: GENERIC_INTERNAL_ERROR: 
io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: 
com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not 
exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request 
ID: 123; S3 Extended Request ID: 123=; Proxy: null), S3 Extended Request ID: 
123= (Bucket: bucket, Key: 
facebook/123/bronze_facebook_test1/_dlt_pipeline_state/metadata/123.metadata.json)
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py",
 line 699, in sync_destination
   remote_state = self._restore_state_from_destination()
  ^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/pipeline/pipeline.py",
 line 1420, in _restore_state_from_destination
   state = load_pipeline_state_from_destination(self.pipeline_name, 
job_client)
   

 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/pipeline/state_sync.py",
 line 139, in load_pipeline_state_from_destination
   state = client.get_stored_state(pipeline_name)
   ^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/destinations/job_client_impl.py",
 line 368, in get_stored_state
   with self.sql_client.execute_query(query, pipeline_name) as cur:
 File "/usr/local/lib/python3.11/contextlib.py", line 137, in __enter__
   return next(self.gen)
  ^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/destinations/sql_client.py",
 line 244, in _wrap_gen
   raise self._make_database_exception(ex)
   dlt.destinations.exceptions.DatabaseTerminalException: 
GENERIC_INTERNAL_ERROR: 
io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: 
com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not 
exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request 
ID: 123; S3 Extended Request ID: 123=; Proxy: null), S3 Extended Request ID: 
123= (Bucket: bucket, Key: 
facebook/123/bronze_facebook_test1/_dlt_pipeline_state/metadata/123.metadata.json)
   The above exception was the direct cause of the following exception:
   Traceback (most recent call last):
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
 line 433, in _execute_task
   result = execute_callable(context=context, **execute_callable_kwargs)

 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/operators/python.py",
 line 199, in execute
   return_value = self.execute_callable()
  ^^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/operators/python.py",
 line 216, in execute_callable
   return self.python_callable(*self.op_args, **self.op_kwargs)
  ^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/helpers/airflow_helper.py",
 line 273, in _run
   for attempt in self.retry_policy.copy(
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", 
line 347, in __iter__
   do = self.iter(retry_state=retry_state)
^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", 
line 314, in iter
   return fut.result()
  
 File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in 
result
   return self.__get_result()
  ^^^
 File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in 
__get_result
   raise self._exception
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/dlt/helpers/airflow_helper.py",
 line 283, in _run
   load_info = task_pipeline.run(

Re: [I] Remove extraneous trailing slash in table location [iceberg-python]

2024-04-16 Thread via GitHub


Vitalii0-o commented on issue #606:
URL: https://github.com/apache/iceberg-python/issues/606#issuecomment-2058726484

   This error started occurring about a week ago, even though there were no changes on my side.





Re: [I] Can Iceberg old partition with no data be deleted? [iceberg]

2024-04-16 Thread via GitHub


madeirak commented on issue #10121:
URL: https://github.com/apache/iceberg/issues/10121#issuecomment-2058730627

   > how is the table partitioned? Can you do describe table or show create 
table to show the partition scheme? Also, was there any partition evolution? 
Are you using V1 Iceberg table or v2?
   
   Thanks for the reply; this table has indeed changed partitions.
   Before, `show create table` on this table returned "PARTITIONED BY (old_xxx, old_yyy)", and there was no data in the table.
   Then the partition fields were changed to "PARTITIONED BY (new_partition_1, new_partition_2)".
   
   I know this layout lets data co-exist across the new and old partitions, with the old partition field = null.
   But in my case the old partition field holds no data, and I am worried that a long path will affect READ and WRITE efficiency,
   **so I wonder if there is a way to delete the old partition path with no data, or what impact a long path has on READ and WRITE efficiency**?
   





Re: [I] Can Iceberg old partition with no data be deleted? [iceberg]

2024-04-16 Thread via GitHub


ajantha-bhat commented on issue #10121:
URL: https://github.com/apache/iceberg/issues/10121#issuecomment-2058743091

   I think Iceberg v1 tables have a concept of a void transform: after partition evolution, the dropped partition field is kept as a void transform.
   
   Try changing the table's default format version property to 2 and redo the partition evolution, then rewrite the data files again; see the sketch below.
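   A minimal sketch of that upgrade via the Java API (the catalog and table
   names are placeholders):
   ```java
   // Hypothetical identifiers; setting the reserved "format-version" property
   // upgrades the table on commit.
   Table table = catalog.loadTable(TableIdentifier.of("db", "tbl"));
   table.updateProperties()
       .set(TableProperties.FORMAT_VERSION, "2")
       .commit();
   ```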





Re: [I] Can Iceberg old partition with no data be deleted? [iceberg]

2024-04-16 Thread via GitHub


ajantha-bhat commented on issue #10121:
URL: https://github.com/apache/iceberg/issues/10121#issuecomment-2058745161

   Lastly I think there is no way to delete empty folders in Iceberg. Only 
files are tracked by Iceberg metadata. So, GC operations only clean up the 
files. 





Re: [I] Iceberg Spark Extensions conflict with Paimon [iceberg]

2024-04-16 Thread via GitHub


ajantha-bhat commented on issue #10143:
URL: https://github.com/apache/iceberg/issues/10143#issuecomment-2058746453

   can you close this if it is a duplicate of 
https://github.com/apache/paimon/issues/3212 ?





Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]

2024-04-16 Thread via GitHub


Fokko commented on code in PR #320:
URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1567138774


##
crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:
##
@@ -0,0 +1,317 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
+use crate::spec::Datum;
+use crate::Result;
+use fnv::FnvHashSet;
+
+pub(crate) enum OpLiteral<'a> {
+    Single(&'a Datum),
+    Set(&'a FnvHashSet<Datum>),
+}
+
+/// A visitor for [`BoundPredicate`]s. Visits in post-order.
+pub trait BoundPredicateVisitor {
+    /// The return type of this visitor
+    type T;
+
+    /// Called after an `AlwaysTrue` predicate is visited
+    fn always_true(&mut self) -> Result<Self::T>;
+
+    /// Called after an `AlwaysFalse` predicate is visited
+    fn always_false(&mut self) -> Result<Self::T>;
+
+    /// Called after an `And` predicate is visited
+    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after an `Or` predicate is visited
+    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after a `Not` predicate is visited
+    fn not(&mut self, inner: Self::T) -> Result<Self::T>;
+
+    /// Called after visiting a UnaryPredicate, BinaryPredicate,
+    /// or SetPredicate. Passes the predicate's operator in all cases,
+    /// as well as the term and literals in the case of binary and set
+    /// predicates.
+    fn op(

Review Comment:
   I'm not sold on the iterator idea either, it looks very fancy, but iterating 
over a set feels awkward to me. 
   
   > If you're not happy with this approach @Fokko, I'll wait until the three 
of you are in agreement and fix up to whatever you all decide on. I'm happy 
with the approach as-is after switching my original design to the approach 
proposed by @liurenjie1024 and @marvinlanhenke.
   
   Well, I'm not the one to decide here, just weighing in my opinion :D I have 
a strong preference for the original approach as suggested in 
https://github.com/apache/iceberg-rust/pull/334. I think this leverages the 
visitor as much as possible, and also makes sure that all the operations are 
covered.






Re: [I] Flaky spark-3.3-scala-2.13-java-8 tests due to `No space left on device` [iceberg]

2024-04-16 Thread via GitHub


manuzhang commented on issue #10040:
URL: https://github.com/apache/iceberg/issues/10040#issuecomment-2058792129

   Can this be related to https://issues.apache.org/jira/browse/SPARK-39195? 
@aokolnychyi 





Re: [I] Can Iceberg old partition with no data be deleted? [iceberg]

2024-04-16 Thread via GitHub


madeirak commented on issue #10121:
URL: https://github.com/apache/iceberg/issues/10121#issuecomment-2058796187

   > Lastly I think there is no way to delete empty folders in Iceberg. Only 
files are tracked by Iceberg metadata. So, GC operations only clean up the 
files.
   
   Thanks! Setting the table's format version to 2 does fix my problem 😘





Re: [I] Can Iceberg old partition with no data be deleted? [iceberg]

2024-04-16 Thread via GitHub


madeirak closed issue #10121: Can Iceberg old partition with no data  be 
deleted?
URL: https://github.com/apache/iceberg/issues/10121





Re: [PR] Tests: Unify the test catalog setting [iceberg-python]

2024-04-16 Thread via GitHub


frankliee commented on code in PR #609:
URL: https://github.com/apache/iceberg-python/pull/609#discussion_r1566818851


##
tests/conftest.py:
##
@@ -2144,3 +2144,31 @@ def arrow_table_with_only_nulls(pa_schema: "pa.Schema") 
-> "pa.Table":
 import pyarrow as pa
 
 return pa.Table.from_pylist([{}, {}], schema=pa_schema)
+
+
+@pytest.fixture()
+def catalog_rest() -> Catalog:
+return load_catalog(
+"local",
+**{
+"type": "rest",
+"uri": "http://localhost:8181";,
+"s3.endpoint": "http://localhost:9000";,
+"s3.access-key-id": "admin",
+"s3.secret-access-key": "password",
+},
+)
+
+
+@pytest.fixture()
+def catalog_hive() -> Catalog:

Review Comment:
   > @frankliee Thanks for working on this! We also have `session_catalog` and 
`session_hive_catalog` in `conftest`
   > 
   > 
https://github.com/apache/iceberg-python/blob/aa4654368e54bf84933279179519b299b5910493/tests/conftest.py#L1997-L2022
   > 
   > 
   > and I think they can be unified with these too (using scope `session`). 
WDYT?
   
   Nice suggestion, I have unified all of them as `session_catalog` and 
`session_hive_catalog` in this PR.






Re: [PR] Hive: turn off the stats gathering when iceberg.hive.keep.stats is false [iceberg]

2024-04-16 Thread via GitHub


deniskuzZ commented on PR #10148:
URL: https://github.com/apache/iceberg/pull/10148#issuecomment-2058916040

   AFAIK, stats autogathering doesn't even work in Hive. After some operations like 
insert, we issue an extra stats update task that persists column stats changes 
either to the HMS or to a Puffin file ("hive.iceberg.stats.source", "iceberg").





Re: [I] spark.table() raises warn: Unclosed S3FileIO instance in HadoopTableOperations [iceberg]

2024-04-16 Thread via GitHub


KingLommel commented on issue #10145:
URL: https://github.com/apache/iceberg/issues/10145#issuecomment-2058941436

   Thanks @ajantha-bhat. 
   As soon as I have time to dive deep into this I will give it a try!





Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on code in PR #320:
URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1567303582


##
crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:
##
@@ -0,0 +1,317 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
+use crate::spec::Datum;
+use crate::Result;
+use fnv::FnvHashSet;
+
+pub(crate) enum OpLiteral<'a> {
+    Single(&'a Datum),
+    Set(&'a FnvHashSet<Datum>),
+}
+
+/// A visitor for [`BoundPredicate`]s. Visits in post-order.
+pub trait BoundPredicateVisitor {
+    /// The return type of this visitor
+    type T;
+
+    /// Called after an `AlwaysTrue` predicate is visited
+    fn always_true(&mut self) -> Result<Self::T>;
+
+    /// Called after an `AlwaysFalse` predicate is visited
+    fn always_false(&mut self) -> Result<Self::T>;
+
+    /// Called after an `And` predicate is visited
+    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after an `Or` predicate is visited
+    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
+
+    /// Called after a `Not` predicate is visited
+    fn not(&mut self, inner: Self::T) -> Result<Self::T>;
+
+    /// Called after visiting a UnaryPredicate, BinaryPredicate,
+    /// or SetPredicate. Passes the predicate's operator in all cases,
+    /// as well as the term and literals in the case of binary and set
+    /// predicates.
+    fn op(

Review Comment:
   I'm also +1 for the original approach after comparing these two approaches.






Re: [PR] Add `BoundPredicateVisitor` (alternate version) [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on PR #334:
URL: https://github.com/apache/iceberg-rust/pull/334#issuecomment-2059023496

   I'll merge this given we have reached some consensus 
[here](https://github.com/apache/iceberg-rust/pull/320#discussion_r1565780931), 
thanks @Fokko for review, and @sdd for this effort!





Re: [PR] Add `BoundPredicateVisitor` (alternate version) [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 merged PR #334:
URL: https://github.com/apache/iceberg-rust/pull/334





Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 closed pull request #320: Add `BoundPredicateVisitor` trait
URL: https://github.com/apache/iceberg-rust/pull/320





Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on PR #320:
URL: https://github.com/apache/iceberg-rust/pull/320#issuecomment-2059024512

   Close by #334 





Re: [PR] feat: Glue Catalog - table operations (3/3) [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on code in PR #314:
URL: https://github.com/apache/iceberg-rust/pull/314#discussion_r1565832014


##
crates/catalog/glue/src/utils.rs:
##
@@ -151,6 +205,65 @@ pub(crate) fn validate_namespace(namespace: 
&NamespaceIdent) -> Result {
 Ok(name)
 }
 
+/// Get default table location from `Namespace` properties
+pub(crate) fn get_default_table_location(
+    namespace: &Namespace,
+    table_name: impl AsRef<str>,
+    warehouse: impl AsRef<str>,
+) -> String {
+    let properties = namespace.properties();
+
+    let location = match properties.get(LOCATION) {

Review Comment:
   This seems incorrect to me, see [python 
implementation](https://github.com/apache/iceberg-python/blob/a892309936effa7ec575195ad3be70193e82d704/pyiceberg/catalog/__init__.py#L762).
 
   1. The `LOCATION` property is the database location; if it exists, the table location should be "{LOCATION}/{tablename}".
   2. Otherwise, we should use "{warehouse}/{database}.db/{table}".
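   A minimal sketch of that resolution order (`LOCATION` is the existing
   property key; `db_name` is assumed to be validated already):
   ```rust
   fn default_table_location(
       properties: &std::collections::HashMap<String, String>,
       db_name: &str,
       table_name: &str,
       warehouse: &str,
   ) -> String {
       match properties.get(LOCATION) {
           // 1. database location, if present
           Some(db_location) => format!("{}/{}", db_location.trim_end_matches('/'), table_name),
           // 2. otherwise fall back to the warehouse path
           None => format!("{}/{}.db/{}", warehouse.trim_end_matches('/'), db_name, table_name),
       }
   }
   ```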



##
crates/catalog/glue/src/catalog.rs:
##
@@ -310,31 +327,282 @@ impl Catalog for GlueCatalog {
 Ok(table_list)
 }
 
+/// Creates a new table within a specified namespace using the provided
+/// table creation settings.
+///
+/// # Returns
+/// A `Result` wrapping a `Table` object representing the newly created
+/// table.
+///
+/// # Errors
+/// This function may return an error in several cases, including invalid
+/// namespace identifiers, failure to determine a default storage location,
+/// issues generating or writing table metadata, and errors communicating
+/// with the Glue Catalog.
 async fn create_table(
 &self,
-_namespace: &NamespaceIdent,
-_creation: TableCreation,
+namespace: &NamespaceIdent,
+creation: TableCreation,
 ) -> Result {
-todo!()
+let db_name = validate_namespace(namespace)?;
+let table_name = creation.name.clone();
+
+let location = match &creation.location {
+Some(location) => location.clone(),
+None => {
+let ns = self.get_namespace(namespace).await?;
+get_default_table_location(&ns, &table_name, 
&self.config.warehouse)
+}
+};
+
+let metadata = 
TableMetadataBuilder::from_table_creation(creation)?.build()?;
+let metadata_location = create_metadata_location(&location, 0)?;
+
+let mut file = self
+.file_io
+.new_output(&metadata_location)?
+.writer()
+.await?;
+file.write_all(&serde_json::to_vec(&metadata)?).await?;
+file.shutdown().await?;
+
+let glue_table = convert_to_glue_table(
+&table_name,
+metadata_location.clone(),
+&metadata,
+metadata.properties(),
+None,
+)?;
+
+let builder = self
+.client
+.0
+.create_table()
+.database_name(&db_name)
+.table_input(glue_table);
+let builder = with_catalog_id!(builder, self.config);
+
+builder.send().await.map_err(from_aws_sdk_error)?;
+
+let table = Table::builder()
+.file_io(self.file_io())
+.metadata_location(metadata_location)
+.metadata(metadata)
+.identifier(TableIdent::new(NamespaceIdent::new(db_name), 
table_name))
+.build();
+
+Ok(table)
 }
 
-async fn load_table(&self, _table: &TableIdent) -> Result {
-todo!()
+/// Loads a table from the Glue Catalog and constructs a `Table` object
+/// based on its metadata.
+///
+/// # Returns
+/// A `Result` wrapping a `Table` object that represents the loaded table.
+///
+/// # Errors
+/// This function may return an error in several scenarios, including:
+/// - Failure to validate the namespace.
+/// - Failure to retrieve the table from the Glue Catalog.
+/// - Absence of metadata location information in the table's properties.
+/// - Issues reading or deserializing the table's metadata file.
+async fn load_table(&self, table: &TableIdent) -> Result {
+let db_name = validate_namespace(table.namespace())?;
+let table_name = table.name();
+
+let builder = self
+.client
+.0
+.get_table()
+.database_name(&db_name)
+.name(table_name);
+let builder = with_catalog_id!(builder, self.config);
+
+let glue_table_output = 
builder.send().await.map_err(from_aws_sdk_error)?;
+
+match glue_table_output.table() {
+None => Err(Error::new(
+ErrorKind::Unexpected,
+format!(
+"'Table' object for database: {} and table: {} does not 
exist",
+db_name, table_name
+),
+)),
+Some(table) => {
+let 

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on issue #177:
URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-2059068941

   > Is this issue still valid given #185 ? This seems addressed. Apologies for 
the noise, surfing good first issues for something to get started on.
   
   @tabmatfournier Yeah, I think it's closed, thanks for reporting this.





Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 closed issue #177: Remove `unwrap()` in 
`ManifestListWriter.close()`
URL: https://github.com/apache/iceberg-rust/issues/177





Re: [PR] Core: Lazily compute & cache hashCode in CharSequenceWrapper [iceberg]

2024-04-16 Thread via GitHub


nastra commented on code in PR #10023:
URL: https://github.com/apache/iceberg/pull/10023#discussion_r1567311373


##
api/src/main/java/org/apache/iceberg/util/CharSequenceWrapper.java:
##
@@ -29,13 +29,16 @@ public static CharSequenceWrapper wrap(CharSequence seq) {
   }
 
   private CharSequence wrapped;
+  // lazily computed & cached hashCode
+  private transient int hashCode = 0;

Review Comment:
   ah good point, thanks for finding that. Instead of making this an `Integer` 
I actually introduced a boolean flag. The `String` implementation does the same 
thing so that the hash isn't re-calculated when the hash is actually 0
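   For reference, a sketch of that pattern (mirroring `java.lang.String`;
   `computeHash` is a hypothetical stand-in for the actual hash computation):
   ```java
   private transient int hashCode = 0;
   private transient boolean hashIsZero = false;

   @Override
   public int hashCode() {
     int h = hashCode;
     if (h == 0 && !hashIsZero) {
       h = computeHash(wrapped); // hash of the wrapped CharSequence
       if (h == 0) {
         hashIsZero = true; // remember a genuine 0 hash so it isn't recomputed
       } else {
         hashCode = h;
       }
     }
     return h;
   }
   ```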






Re: [PR] feat: Glue Catalog - table operations (3/3) [iceberg-rust]

2024-04-16 Thread via GitHub


marvinlanhenke commented on code in PR #314:
URL: https://github.com/apache/iceberg-rust/pull/314#discussion_r1567401176


##
crates/catalog/glue/src/utils.rs:
##
@@ -151,6 +205,65 @@ pub(crate) fn validate_namespace(namespace: 
&NamespaceIdent) -> Result {
 Ok(name)
 }
 
+/// Get default table location from `Namespace` properties
+pub(crate) fn get_default_table_location(
+    namespace: &Namespace,
+    table_name: impl AsRef<str>,
+    warehouse: impl AsRef<str>,
+) -> String {
+    let properties = namespace.properties();
+
+    let location = match properties.get(LOCATION) {

Review Comment:
   Yes, you're right. 
   
   It seems that I missed `{database}.db` which would, in our case, equate to 
the `db_name`.
   
   Should be easy to fix. Thanks for spotting this.






Re: [PR] feat: Glue Catalog - table operations (3/3) [iceberg-rust]

2024-04-16 Thread via GitHub


marvinlanhenke commented on code in PR #314:
URL: https://github.com/apache/iceberg-rust/pull/314#discussion_r1567403891


##
crates/catalog/glue/src/catalog.rs:
##
@@ -310,31 +327,282 @@ impl Catalog for GlueCatalog {
 Ok(table_list)
 }
 
+/// Creates a new table within a specified namespace using the provided
+/// table creation settings.
+///
+/// # Returns
+/// A `Result` wrapping a `Table` object representing the newly created
+/// table.
+///
+/// # Errors
+/// This function may return an error in several cases, including invalid
+/// namespace identifiers, failure to determine a default storage location,
+/// issues generating or writing table metadata, and errors communicating
+/// with the Glue Catalog.
 async fn create_table(
 &self,
-_namespace: &NamespaceIdent,
-_creation: TableCreation,
+namespace: &NamespaceIdent,
+creation: TableCreation,
 ) -> Result {
-todo!()
+let db_name = validate_namespace(namespace)?;
+let table_name = creation.name.clone();
+
+let location = match &creation.location {
+Some(location) => location.clone(),
+None => {
+let ns = self.get_namespace(namespace).await?;
+get_default_table_location(&ns, &table_name, 
&self.config.warehouse)
+}
+};
+
+let metadata = 
TableMetadataBuilder::from_table_creation(creation)?.build()?;
+let metadata_location = create_metadata_location(&location, 0)?;
+
+let mut file = self
+.file_io
+.new_output(&metadata_location)?
+.writer()
+.await?;
+file.write_all(&serde_json::to_vec(&metadata)?).await?;
+file.shutdown().await?;
+
+let glue_table = convert_to_glue_table(
+&table_name,
+metadata_location.clone(),
+&metadata,
+metadata.properties(),
+None,
+)?;
+
+let builder = self
+.client
+.0
+.create_table()
+.database_name(&db_name)
+.table_input(glue_table);
+let builder = with_catalog_id!(builder, self.config);
+
+builder.send().await.map_err(from_aws_sdk_error)?;
+
+let table = Table::builder()
+.file_io(self.file_io())
+.metadata_location(metadata_location)
+.metadata(metadata)
+.identifier(TableIdent::new(NamespaceIdent::new(db_name), 
table_name))
+.build();
+
+Ok(table)
 }
 
-async fn load_table(&self, _table: &TableIdent) -> Result {
-todo!()
+/// Loads a table from the Glue Catalog and constructs a `Table` object
+/// based on its metadata.
+///
+/// # Returns
+/// A `Result` wrapping a `Table` object that represents the loaded table.
+///
+/// # Errors
+/// This function may return an error in several scenarios, including:
+/// - Failure to validate the namespace.
+/// - Failure to retrieve the table from the Glue Catalog.
+/// - Absence of metadata location information in the table's properties.
+/// - Issues reading or deserializing the table's metadata file.
+async fn load_table(&self, table: &TableIdent) -> Result {
+let db_name = validate_namespace(table.namespace())?;
+let table_name = table.name();
+
+let builder = self
+.client
+.0
+.get_table()
+.database_name(&db_name)
+.name(table_name);
+let builder = with_catalog_id!(builder, self.config);
+
+let glue_table_output = 
builder.send().await.map_err(from_aws_sdk_error)?;
+
+match glue_table_output.table() {
+None => Err(Error::new(
+ErrorKind::Unexpected,
+format!(
+"'Table' object for database: {} and table: {} does not 
exist",

Review Comment:
   Probably wanted to remove both single quotes here?
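   i.e. presumably:
   ```rust
   // Same message with the quotes around Table dropped:
   format!(
       "Table object for database: {} and table: {} does not exist",
       db_name, table_name
   ),
   ```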






Re: [PR] feat: Glue Catalog - table operations (3/3) [iceberg-rust]

2024-04-16 Thread via GitHub


marvinlanhenke commented on code in PR #314:
URL: https://github.com/apache/iceberg-rust/pull/314#discussion_r1567406955


##
crates/catalog/glue/src/catalog.rs:
##
@@ -310,31 +327,282 @@ impl Catalog for GlueCatalog {
 Ok(table_list)
 }
 
+/// Creates a new table within a specified namespace using the provided
+/// table creation settings.
+///
+/// # Returns
+/// A `Result` wrapping a `Table` object representing the newly created
+/// table.
+///
+/// # Errors
+/// This function may return an error in several cases, including invalid
+/// namespace identifiers, failure to determine a default storage location,
+/// issues generating or writing table metadata, and errors communicating
+/// with the Glue Catalog.
 async fn create_table(
 &self,
-_namespace: &NamespaceIdent,
-_creation: TableCreation,
+namespace: &NamespaceIdent,
+creation: TableCreation,
 ) -> Result {
-todo!()
+let db_name = validate_namespace(namespace)?;
+let table_name = creation.name.clone();
+
+let location = match &creation.location {
+Some(location) => location.clone(),
+None => {
+let ns = self.get_namespace(namespace).await?;
+get_default_table_location(&ns, &table_name, 
&self.config.warehouse)
+}
+};
+
+let metadata = 
TableMetadataBuilder::from_table_creation(creation)?.build()?;
+let metadata_location = create_metadata_location(&location, 0)?;
+
+let mut file = self
+.file_io
+.new_output(&metadata_location)?
+.writer()
+.await?;
+file.write_all(&serde_json::to_vec(&metadata)?).await?;
+file.shutdown().await?;
+
+let glue_table = convert_to_glue_table(
+&table_name,
+metadata_location.clone(),
+&metadata,
+metadata.properties(),
+None,
+)?;
+
+let builder = self
+.client
+.0
+.create_table()
+.database_name(&db_name)
+.table_input(glue_table);
+let builder = with_catalog_id!(builder, self.config);
+
+builder.send().await.map_err(from_aws_sdk_error)?;
+
+let table = Table::builder()
+.file_io(self.file_io())
+.metadata_location(metadata_location)
+.metadata(metadata)
+.identifier(TableIdent::new(NamespaceIdent::new(db_name), 
table_name))
+.build();
+
+Ok(table)
 }
 
-async fn load_table(&self, _table: &TableIdent) -> Result {
-todo!()
+/// Loads a table from the Glue Catalog and constructs a `Table` object
+/// based on its metadata.
+///
+/// # Returns
+/// A `Result` wrapping a `Table` object that represents the loaded table.
+///
+/// # Errors
+/// This function may return an error in several scenarios, including:
+/// - Failure to validate the namespace.
+/// - Failure to retrieve the table from the Glue Catalog.
+/// - Absence of metadata location information in the table's properties.
+/// - Issues reading or deserializing the table's metadata file.
+async fn load_table(&self, table: &TableIdent) -> Result {
+let db_name = validate_namespace(table.namespace())?;
+let table_name = table.name();
+
+let builder = self
+.client
+.0
+.get_table()
+.database_name(&db_name)
+.name(table_name);
+let builder = with_catalog_id!(builder, self.config);
+
+let glue_table_output = 
builder.send().await.map_err(from_aws_sdk_error)?;
+
+match glue_table_output.table() {
+None => Err(Error::new(
+ErrorKind::Unexpected,
+format!(
+"'Table' object for database: {} and table: {} does not 
exist",
+db_name, table_name
+),
+)),
+Some(table) => {
+let metadata_location = 
get_metadata_location(&table.parameters)?;

Review Comment:
   Yes, I kind of 'ignored' this on purpose - same check is missing for the 
hive catalog as well; 
   I guess we can handle those in separate PRs.






Re: [PR] feat: Glue Catalog - table operations (3/3) [iceberg-rust]

2024-04-16 Thread via GitHub


marvinlanhenke commented on PR #314:
URL: https://github.com/apache/iceberg-rust/pull/314#issuecomment-2059160129

   > Thanks for @marvinlanhenke for this great pr, it looks great! Sorry for 
late reply, I have been busy lately.
   
   @liurenjie1024 
   No worries, and thanks for the review.
   
   I'll most likely fix those remaining suggestions tomorrow.





Re: [PR] Core: Fix JDBC Catalog table commit when migrating from schema V0 to V1 (#101111) [iceberg]

2024-04-16 Thread via GitHub


nastra merged PR #10152:
URL: https://github.com/apache/iceberg/pull/10152





Re: [I] Discussion: Next steps / requirements to support `append` files [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on issue #329:
URL: https://github.com/apache/iceberg-rust/issues/329#issuecomment-2059202270

   I think that to implement appending data files, there are two main tasks:
   
   1. Implement a transaction API to append data files.
   2. Implement a file writer that writes record batches to Parquet files and generates the data file structs.
   
   Currently there is no design or plan for 1, and @ZENOTME is working on 2.
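   A purely hypothetical shape for task 1, just to make the discussion
   concrete (none of these APIs exist yet):
   ```rust
   // Illustrative only -- names and signatures are invented for discussion.
   let tx = Transaction::new(&table);
   let tx = tx.fast_append(vec![data_file])?; // data_file produced by the task-2 writer
   let table = tx.commit(&catalog).await?;
   ```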





Re: [PR] Sql catalog [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on PR #229:
URL: https://github.com/apache/iceberg-rust/pull/229#issuecomment-2059212759

   cc @JanKaul Is this PR ready for review, or do you need to make more updates?





Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-16 Thread via GitHub


liurenjie1024 commented on code in PR #295:
URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1567468384


##
crates/iceberg/src/arrow/reader.rs:
##
@@ -186,4 +216,399 @@ impl ArrowReader {
 Ok(ProjectionMask::leaves(parquet_schema, indices))
 }
 }
+
+fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
+if let Some(predicates) = &self.predicates {
+let field_id_map = self.build_field_id_map(parquet_schema)?;
+
+// Collect Parquet column indices from field ids
+let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
+visit_predicate(&mut collector, predicates).unwrap();

Review Comment:
   We don't need to do this for every file?



##
crates/iceberg/src/arrow/reader.rs:
##
@@ -186,4 +216,399 @@ impl ArrowReader {
 Ok(ProjectionMask::leaves(parquet_schema, indices))
 }
 }
+
+fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
+if let Some(predicates) = &self.predicates {
+let field_id_map = self.build_field_id_map(parquet_schema)?;
+
+// Collect Parquet column indices from field ids
+let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
+visit_predicate(&mut collector, predicates).unwrap();
+let column_indices = collector
+.field_ids
+.iter()
+.map(|field_id| {
+field_id_map.get(field_id).cloned().ok_or_else(|| {
+Error::new(ErrorKind::DataInvalid, "Field id not found 
in schema")
+})
+})
+.collect::>>()?;
+
+// Convert BoundPredicates to ArrowPredicates
+let mut converter = PredicateConverter {
+columns: &column_indices,
+projection_mask: ProjectionMask::leaves(parquet_schema, 
column_indices.clone()),
+parquet_schema,
+column_map: &field_id_map,
+};
+let arrow_predicate = visit_predicate(&mut converter, predicates)?;
+Ok(Some(RowFilter::new(vec![arrow_predicate])))
+} else {
+Ok(None)
+}
+}
+
+/// Build the map of field id to Parquet column index in the schema.
+fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result<HashMap<i32, usize>> {

Review Comment:
   ```suggestion
   fn build_field_id_map(parquet_schema: &SchemaDescriptor) -> Result<HashMap<i32, usize>> {
   ```
   I think we don't need the `self` parameter here?



##
crates/iceberg/src/arrow/reader.rs:
##
@@ -186,4 +216,399 @@ impl ArrowReader {
 Ok(ProjectionMask::leaves(parquet_schema, indices))
 }
 }
+
+fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
+if let Some(predicates) = &self.predicates {
+let field_id_map = self.build_field_id_map(parquet_schema)?;
+
+// Collect Parquet column indices from field ids
+let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
+visit_predicate(&mut collector, predicates).unwrap();
+let column_indices = collector
+.field_ids
+.iter()
+.map(|field_id| {
+field_id_map.get(field_id).cloned().ok_or_else(|| {
+Error::new(ErrorKind::DataInvalid, "Field id not found 
in schema")
+})
+})
+.collect::>>()?;
+
+// Convert BoundPredicates to ArrowPredicates
+let mut converter = PredicateConverter {
+columns: &column_indices,
+projection_mask: ProjectionMask::leaves(parquet_schema, 
column_indices.clone()),
+parquet_schema,
+column_map: &field_id_map,
+};
+let arrow_predicate = visit_predicate(&mut converter, predicates)?;
+Ok(Some(RowFilter::new(vec![arrow_predicate])))
+} else {
+Ok(None)
+}
+}
+
+/// Build the map of field id to Parquet column index in the schema.
+fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> 
Result> {
+let mut column_map = HashMap::new();
+for (idx, field) in parquet_schema.columns().iter().enumerate() {
+let field_type = field.self_type();
+match field_type {
+ParquetType::PrimitiveType { basic_info, .. } => {
+if !basic_info.has_id() {
+return Err(Error::new(
+ErrorKind::DataInvalid,
+format!(
+"Leave column {:?} in schema doesn't have 
field id",
+field_type
+),
+));
+}
+

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


rodmeneses commented on PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#issuecomment-2059306340

   > > @nastra: Any comments? I would like to merge this soon, as any merge to the Flink code path will make this PR stale, and @rodmeneses needs to recreate the whole PR.
   > > Thanks,
   > 
   > I think there's still an issue as there are a bunch of files/diffs that are because Flink 1.16 is being removed and git detects it as a move (with some additional changes). This can also be seen when looking at the file path, where a Flink 1.16 file is moved to a Flink 1.19 file, while also adding some diffs where it's not clear why the diff is there in the first place.
   > 
   > My suggestion would be to do the actual removal of the 1.16 directory as a separate PR in an immediate follow-up. This would mean to skip tests 8 + 9 from the PR description, but it's fine to update gradle files to not build 1.16 anymore. Thoughts on the suggestion?
   
   Hi @nastra, thanks for your review and comments.
   
   > I think there's still an issue as there are a bunch of files/diffs that are because Flink 1.16 is being removed and git detects it as a move (with some additional changes).
   
   This is because you are looking at the changes in the whole PR. If you look at each of the 4 commits individually, you'll find that everything makes sense.
   
   I took the approach of not deleting v1.16 and this PR was updated. But if you look at the changes as a whole PR, it's now even worse than before, because we don't see the history properly.
   
   Given this, I'd suggest moving forward with deleting v1.16 in this same PR.
   Thoughts? @nastra @pvary
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


pvary commented on PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#issuecomment-2059345426

   > Why do we remove Flink 1.16 in this PR?
   
   @manuzhang: This is how we usually do these changes. We support the last 3 versions of Flink, so when we add a new version, we remove the oldest one.
   We also do the changes this way to keep the history of the main directory (in our case 1.18->1.19).
   
   Old PRs:
   - #9211
   - #7254
   - #6092


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


pvary merged PR #10112:
URL: https://github.com/apache/iceberg/pull/10112


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Tests: Unify the test catalog setting [iceberg-python]

2024-04-16 Thread via GitHub


HonahX merged PR #609:
URL: https://github.com/apache/iceberg-python/pull/609


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-16 Thread via GitHub


pvary commented on PR #10112:
URL: https://github.com/apache/iceberg/pull/10112#issuecomment-2059391305

   Merged to main.
   Thanks for the PR @rodmeneses and @nastra for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Removes Flink version 1.16 [iceberg]

2024-04-16 Thread via GitHub


rodmeneses commented on PR #10154:
URL: https://github.com/apache/iceberg/pull/10154#issuecomment-2059418597

   cc: @pvary @nastra please take a look. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] JDBC Catalog:Change SQL like escape character from '\\' to '!' [iceberg]

2024-04-16 Thread via GitHub


nastra commented on PR #9407:
URL: https://github.com/apache/iceberg/pull/9407#issuecomment-2059442673

   fixes https://github.com/apache/iceberg/issues/10056
   
   @xuchuanqiu can you rebase this one please to fix the merge conflicts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] JDBC Catalog:Change SQL like escape character from '\\' to '!' [iceberg]

2024-04-16 Thread via GitHub


jbonofre commented on code in PR #9407:
URL: https://github.com/apache/iceberg/pull/9407#discussion_r1567618729


##
core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:
##
@@ -136,7 +136,7 @@ final class JdbcUtil {
   + TABLE_NAMESPACE
   + " = ? OR "
   + TABLE_NAMESPACE
-  + " LIKE ? ESCAPE '\\' "
+  + " LIKE ? ESCAPE '!' "

Review Comment:
   Does it work with any backend, like PostgreSQL?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] JDBC Catalog:Change SQL like escape character from '\\' to '!' [iceberg]

2024-04-16 Thread via GitHub


jbonofre commented on PR #9407:
URL: https://github.com/apache/iceberg/pull/9407#issuecomment-2059448849

   If @xuchuanqiu doesn't have the bandwidth, I can help on this one. If it works with MySQL, SQLite, and PostgreSQL, it's a nice fix to include in Iceberg 1.5.1.
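   
   For anyone wondering why '!' travels better than '\': the `ESCAPE` clause itself is standard SQL, but '\' is also MySQL's default escape inside string literals, which makes backslash-escaped LIKE patterns easy to get wrong across backends. A sketch (not the PR's code; `escapeForLike` is a made-up helper) of how catalog-side escaping would pair with the query in the diff above:
   ```java
   // Illustrative only -- not JdbcUtil's actual implementation.
   static String escapeForLike(String namespaceLevel) {
     return namespaceLevel
         .replace("!", "!!")   // escape the escape character itself
         .replace("%", "!%")   // treat '%' as a literal, not a wildcard
         .replace("_", "!_");  // treat '_' as a literal, not a wildcard
   }
   // Listing namespaces under "db1_ns" would then use the pattern
   //   ... WHERE table_namespace LIKE 'db1!_ns.%' ESCAPE '!'
   // so the '_' in the prefix no longer matches an arbitrary character.
   ```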


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Removes Flink version 1.16 [iceberg]

2024-04-16 Thread via GitHub


pvary merged PR #10154:
URL: https://github.com/apache/iceberg/pull/10154


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-16 Thread via GitHub


stevenzwu commented on code in PR #9464:
URL: https://github.com/apache/iceberg/pull/9464#discussion_r1567629192


##
flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/SerializerHelper.java:
##
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.flink.source.split;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.io.UTFDataFormatException;
+import org.apache.flink.core.memory.DataInputDeserializer;
+import org.apache.flink.core.memory.DataOutputSerializer;
+
+/**
+ * Helper class to serialize and deserialize strings longer than 65K. The inspiration is mostly
+ * taken from the class org.apache.flink.core.memory.DataInputSerializer.readUTF and
+ * org.apache.flink.core.memory.DataOutputSerializer.writeUTF.
+ */
+public class SerializerHelper implements Serializable {
+
+  /**
+   * Similar to {@link DataOutputSerializer#writeUTF(String)}. The size is only limited by the
+   * maximum java array size of the buffer.
+   *
+   * @param out the output stream to write the string to.
+   * @param str the string value to be written.
+   * @deprecated This method is deprecated because there will be a method within the {@link
+   *     DataOutputSerializer} already which does the same thing, so use that one instead once that
+   *     is released on Flink version 1.20. See
+   *     <a href="https://issues.apache.org/jira/browse/FLINK-34228">FLINK-34228</a>
+   *     <a href="https://github.com/apache/flink/pull/24191">https://github.com/apache/flink/pull/24191</a>
+   */
+  @Deprecated
+  public static void writeLongUTF(DataOutputSerializer out, String str) throws IOException {
+    int strlen = str.length();
+    long utflen = 0;
+    int c;
+
+    /* use charAt instead of copying String to char array */
+    for (int i = 0; i < strlen; i++) {
+      c = str.charAt(i);
+      utflen += getUTFBytesSize(c);
+
+      if (utflen > Integer.MAX_VALUE) {
+        throw new UTFDataFormatException("Encoded string reached maximum length: " + utflen);
+      }
+    }
+    if (utflen > Integer.MAX_VALUE - 4) {
+      throw new UTFDataFormatException("Encoded string is too long: " + utflen);
+    }
+

Review Comment:
   @pvary you are right.
   
   I guess the resize call from the [Flink PR](https://github.com/apache/flink/pull/24191/files) is not necessary then. This is where @javrasya had the question for this thread, but I guess @javrasya's original comment is not 100% inaccurate.
   
   ```
   if (utflen > Integer.MAX_VALUE - 4) {
     throw new UTFDataFormatException("Encoded string is too long: " + utflen);
   } else if (this.position > this.buffer.length - utflen - 2) {
     resize((int) utflen + 4);
   }
   ```
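   
   For context on why `utflen` is tracked as a `long`: each char can encode to up to 3 bytes in modified UTF-8, so the byte length can be up to 3x the char count. A minimal sketch of the per-character size rule the loop assumes, matching the contract of `java.io.DataOutput#writeUTF` (the PR's actual `getUTFBytesSize` may be written differently):
   ```java
   // Sketch only: per-character size in modified UTF-8, as specified for
   // java.io.DataOutput.writeUTF; not necessarily the PR's implementation.
   final class Utf8SizeSketch {
     static int getUTFBytesSize(int c) {
       if (c >= 0x0001 && c <= 0x007F) {
         return 1; // single byte for the ASCII range
       } else if (c > 0x07FF) {
         return 3; // three bytes, including each surrogate half
       } else {
         return 2; // two bytes, including U+0000
       }
     }
   }
   ```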



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Removes Flink version 1.16 [iceberg]

2024-04-16 Thread via GitHub


pvary commented on PR #10154:
URL: https://github.com/apache/iceberg/pull/10154#issuecomment-2059456318

   Thanks for the PR @rodmeneses and @nastra for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Flink: Don't fail to serialize IcebergSourceSplit when there is too many delete files [iceberg]

2024-04-16 Thread via GitHub


stevenzwu commented on code in PR #9464:
URL: https://github.com/apache/iceberg/pull/9464#discussion_r1567629965


##
flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/split/SerializerHelper.java:
##
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.flink.source.split;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.io.UTFDataFormatException;
+import org.apache.flink.core.memory.DataInputDeserializer;
+import org.apache.flink.core.memory.DataOutputSerializer;
+
+/**
+ * Helper class to serialize and deserialize strings longer than 65K. The inspiration is mostly
+ * taken from the class org.apache.flink.core.memory.DataInputSerializer.readUTF and
+ * org.apache.flink.core.memory.DataOutputSerializer.writeUTF.
+ */
+public class SerializerHelper implements Serializable {
+
+  /**
+   * Similar to {@link DataOutputSerializer#writeUTF(String)}, except this supports larger
+   * payloads, up to the max integer value.
+   *
+   * @param out the output stream to write the string to.
+   * @param str the string value to be written.
+   * @deprecated This method is deprecated because there will be a method within the {@link
+   *     DataOutputSerializer} already which does the same thing, so use that one instead once that
+   *     is released on Flink version 1.20. See
+   *     <a href="https://issues.apache.org/jira/browse/FLINK-34228">FLINK-34228</a>
+   *     <a href="https://github.com/apache/flink/pull/24191">https://github.com/apache/flink/pull/24191</a>
+   */
+  @Deprecated
+  public static void writeLongUTF(DataOutputSerializer out, String str) throws IOException {
+    int strlen = str.length();
+    long utflen = 0;
+    int c;
+
+    /* use charAt instead of copying String to char array */
+    for (int i = 0; i < strlen; i++) {
+      c = str.charAt(i);
+      utflen += getUTFBytesSize(c);
+
+      if (utflen > Integer.MAX_VALUE) {
+        throw new UTFDataFormatException("Encoded string reached maximum length: " + utflen);
+      }
+    }
+    if (utflen > Integer.MAX_VALUE - 4) {

Review Comment:
   nit: Iceberg style leaves a blank line after the close of a block (the for loop).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Removes Flink version 1.16 [iceberg]

2024-04-16 Thread via GitHub


Fokko commented on PR #10154:
URL: https://github.com/apache/iceberg/pull/10154#issuecomment-2059523299

   Thanks @rodmeneses for working on this. Can you [update the table here](https://github.com/apache/iceberg/blob/main/site/docs/multi-engine-support.md#apache-flink) as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Removes Flink version 1.16 [iceberg]

2024-04-16 Thread via GitHub


rodmeneses commented on PR #10154:
URL: https://github.com/apache/iceberg/pull/10154#issuecomment-2059543391

   @Fokko Thanks for the reminder. Here's the PR: https://github.com/apache/iceberg/pull/10155


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Updates flink versioning information in our docs [iceberg]

2024-04-16 Thread via GitHub


Fokko commented on code in PR #10155:
URL: https://github.com/apache/iceberg/pull/10155#discussion_r1567700280


##
site/docs/multi-engine-support.md:
##
@@ -83,15 +83,16 @@ Users should continuously upgrade their Flink version to stay up-to-date.
 
 
 | Version | Lifecycle Stage | Initial Iceberg Support | Latest Iceberg Support | Latest Runtime Jar |
-| --- | --- | --- | --- | --- |
+| ------- | --------------- | ----------------------- | ---------------------- | ------------------ |
 | 1.11    | End of Life     | 0.9.0                   | 0.12.1                 | [iceberg-flink-runtime](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime/0.12.1/iceberg-flink-runtime-0.12.1.jar) |
 | 1.12    | End of Life     | 0.12.0                  | 0.13.1                 | [iceberg-flink-runtime-1.12](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.12/0.13.2/iceberg-flink-runtime-1.12-0.13.2.jar) [3] |
 | 1.13    | End of Life     | 0.13.0                  | 1.0.0                  | [iceberg-flink-runtime-1.13](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.13/1.2.0/iceberg-flink-runtime-1.13-1.0.0.jar) |
 | 1.14    | End of Life     | 0.13.0                  | 1.2.0                  | [iceberg-flink-runtime-1.14](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.14/1.2.0/iceberg-flink-runtime-1.14-1.2.0.jar) |
 | 1.15    | End of Life     | 0.14.0                  | 1.4.3                  | [iceberg-flink-runtime-1.15](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.15/1.4.3/iceberg-flink-runtime-1.15-1.4.3.jar) |
-| 1.16    | Deprecated      | 1.1.0                   | {{ icebergVersion }}   | [iceberg-flink-runtime-1.16](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.16/{{ icebergVersion }}/iceberg-flink-runtime-1.16-{{ icebergVersion }}.jar) |
-| 1.17    | Maintained      | 1.3.0                   | {{ icebergVersion }}   | [iceberg-flink-runtime-1.17](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.17/{{ icebergVersion }}/iceberg-flink-runtime-1.17-{{ icebergVersion }}.jar) |
+| 1.16    | End of Life     | 1.1.0                   | {{ icebergVersion }}   | [iceberg-flink-runtime-1.16](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.16/{{ icebergVersion }}/iceberg-flink-runtime-1.16-{{ icebergVersion }}.jar) |

Review Comment:
   ```suggestion
   | 1.16    | End of Life     | 1.1.0                   | 1.5.0                  | [iceberg-flink-runtime-1.16](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.16/1.5.0/iceberg-flink-runtime-1.16-1.5.0.jar) |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Iceberg Spark streaming skips rows of data [iceberg]

2024-04-16 Thread via GitHub


cccs-jc opened a new issue, #10156:
URL: https://github.com/apache/iceberg/issues/10156

   ### Apache Iceberg version
   
   1.5.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   When using spark readStream, the option `stream-from-timestamp` is used to position the read at the specified timestamp. The query below uses the current time to read from the head of the queue.
   ```python
   ts = int(time.time() * 1000)
   df = (
       spark.readStream.format("iceberg")
       .option("streaming-skip-delete-snapshots", True)
       .option("streaming-skip-overwrite-snapshots", True)
       .option("streaming-max-files-per-micro-batch", max_files)
       .option("streaming-max-rows-per-micro-batch", max_rows)
       .option("stream-from-timestamp", ts)
       .load(source_table)
   )
   ```
   
   You can kill your streaming job and wait 10 minutes, then start it again. The readStream will load the checkpointed offset from disk and is supposed to read from that offset. However, there is a bug that causes it to skip the commits that occurred during those 10 minutes; instead, the readStream reads from the latest commit.
   
   I can work around this bug by not specifying `stream-from-timestamp` if the query uses the checkpointed offset.
   ```python
   ts = int(time.time() * 1000)
   use_ts = not checkpoint_dir_exists(checkpoint_location)
   df = (
       get_spark().readStream.format("iceberg")
       .option("streaming-skip-delete-snapshots", True)
       .option("streaming-skip-overwrite-snapshots", True)
       .option("streaming-max-files-per-micro-batch", max_files)
       .option("streaming-max-rows-per-micro-batch", max_rows)
   )
   if use_ts:
       df = df.option("stream-from-timestamp", ts)
   df = df.load(source_table)
   ```
   But this is error prone. As a user I expect the readStream to continue from the last checkpointed offset.
   
   I suspect the issue might be here: https://github.com/apache/iceberg/blob/fc5b2b336c774b0b8b032f7d87a1fb21e76b3f20/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java#L326
   
   This seems to short-circuit the saved `startOffset` by checking if the last snapshot in the table is older than the requested `stream-from-timestamp`.
   
   @singhpk234 I have not stepped through the code to be sure. Is my theory possible?
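   
   To make the theory concrete, here is a small, self-contained model of the suspected behavior. It is purely hypothetical, not the actual `SparkMicroBatchStream` code, and all names in it are made up.
   ```java
   // Hypothetical model of the reporter's theory above -- NOT Iceberg's implementation.
   public class OffsetResolutionSketch {
     // Returns the snapshot timestamp the stream would start reading from.
     static long resolveStart(Long checkpointedTs, long streamFromTs, long latestSnapshotTs) {
       if (latestSnapshotTs < streamFromTs) {
         // Suspected short-circuit: jump to the table head even though a
         // checkpointed offset exists, skipping the commits in between.
         return latestSnapshotTs;
       }
       return checkpointedTs != null ? checkpointedTs : streamFromTs;
     }
   
     public static void main(String[] args) {
       long checkpointedTs = 1_000L;   // offset saved before the job was stopped
       long latestSnapshotTs = 1_600L; // last commit made while the job was down
       long restartTs = 2_000L;        // stream-from-timestamp = "now" at restart
       // Prints 1600: the stream resumes at the head, never replaying 1000..1600.
       System.out.println(resolveStart(checkpointedTs, restartTs, latestSnapshotTs));
     }
   }
   ```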


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-16 Thread via GitHub


viirya commented on code in PR #295:
URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1567725081


##
crates/iceberg/src/arrow/reader.rs:
##
@@ -186,4 +216,399 @@ impl ArrowReader {
 Ok(ProjectionMask::leaves(parquet_schema, indices))
 }
 }
+
+    fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
+        if let Some(predicates) = &self.predicates {
+            let field_id_map = self.build_field_id_map(parquet_schema)?;
+
+            // Collect Parquet column indices from field ids
+            let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
+            visit_predicate(&mut collector, predicates).unwrap();
+            let column_indices = collector
+                .field_ids
+                .iter()
+                .map(|field_id| {
+                    field_id_map.get(field_id).cloned().ok_or_else(|| {
+                        Error::new(ErrorKind::DataInvalid, "Field id not found in schema")
+                    })
+                })
+                .collect::<Result<Vec<usize>>>()?;
+
+            // Convert BoundPredicates to ArrowPredicates
+            let mut converter = PredicateConverter {
+                columns: &column_indices,
+                projection_mask: ProjectionMask::leaves(parquet_schema, column_indices.clone()),
+                parquet_schema,
+                column_map: &field_id_map,
+            };
+            let arrow_predicate = visit_predicate(&mut converter, predicates)?;
+            Ok(Some(RowFilter::new(vec![arrow_predicate])))
+        } else {
+            Ok(None)
+        }
+    }
+
+    /// Build the map of field id to Parquet column index in the schema.
+    fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result<HashMap<i32, usize>> {
+        let mut column_map = HashMap::new();
+        for (idx, field) in parquet_schema.columns().iter().enumerate() {
+            let field_type = field.self_type();
+            match field_type {
+                ParquetType::PrimitiveType { basic_info, .. } => {
+                    if !basic_info.has_id() {
+                        return Err(Error::new(
+                            ErrorKind::DataInvalid,
+                            format!(
+                                "Leave column {:?} in schema doesn't have field id",
+                                field_type
+                            ),
+                        ));
+                    }
+                    column_map.insert(basic_info.id(), idx);
+                }
+                ParquetType::GroupType { .. } => {
+                    return Err(Error::new(
+                        ErrorKind::DataInvalid,
+                        format!(
+                            "Leave column in schema should be primitive type but got {:?}",
+                            field_type
+                        ),
+                    ));
+                }
+            };
+        }
+
+        Ok(column_map)
+    }
+}
+
+/// A visitor to collect field ids from bound predicates.
+struct CollectFieldIdVisitor {
+    field_ids: Vec<i32>,
+}
+
+impl BoundPredicateVisitor for CollectFieldIdVisitor {

Review Comment:
   Okay, let me take a look. I'm not sure it is completely suitable for the usage here, especially for `PredicateConverter`, but let me try it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


rahil-c commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567730841


##
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##
@@ -278,14 +286,26 @@ public void setConf(Object newConf) {
   @Override
   public List<TableIdentifier> listTables(SessionContext context, Namespace ns) {
     checkNamespaceIsValid(ns);
+    Map<String, String> queryParams = Maps.newHashMap();
+    List<TableIdentifier> tables = Lists.newArrayList();
+    String pageToken = "";
+    if (restPageSize != null) {
+      queryParams.put("pageToken", pageToken);
+      queryParams.put("pageSize", restPageSize);
+    }
+    do {
+      ListTablesResponse response =
+          client.get(
+              paths.tables(ns),
+              queryParams,
+              ListTablesResponse.class,
+              headers(context),
+              ErrorHandlers.namespaceErrorHandler());
+      pageToken = response.nextPageToken();

Review Comment:
   Will take a look thanks
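   
   For readers following the diff above: it is cut off before the loop condition, so here is a hedged sketch of how the do-while presumably completes, reusing the names visible in the snippet (`response.identifiers()` and the exact termination check are assumptions):
   ```java
   // Sketch of the presumed tail of the loop -- not the exact PR code.
   do {
     ListTablesResponse response =
         client.get(
             paths.tables(ns),
             queryParams,
             ListTablesResponse.class,
             headers(context),
             ErrorHandlers.namespaceErrorHandler());
     pageToken = response.nextPageToken();
     tables.addAll(response.identifiers()); // accumulate this page's identifiers
     queryParams.put("pageToken", pageToken); // advance to the next page
   } while (pageToken != null && !pageToken.isEmpty());
   
   return tables;
   ```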



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


rahil-c commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567734951


##
core/src/test/java/org/apache/iceberg/rest/responses/TestListNamespacesResponse.java:
##
@@ -34,7 +34,7 @@ public class TestListNamespacesResponse extends RequestResponseTestBase<ListNamespacesResponse> {

Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-16 Thread via GitHub


sdd closed pull request #321: add `InclusiveProjection` Visitor
URL: https://github.com/apache/iceberg-rust/pull/321


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-16 Thread via GitHub


sdd commented on PR #321:
URL: https://github.com/apache/iceberg-rust/pull/321#issuecomment-2059636629

   Closing this one in favour of the alternate PR based on the 
BoundPredicateVisitor design that got merged 
(https://github.com/apache/iceberg-rust/pull/335)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


rahil-c commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567745943


##
core/src/test/java/org/apache/iceberg/rest/responses/TestListTablesResponse.java:
##
@@ -36,7 +36,7 @@ public class TestListTablesResponse extends RequestResponseTestBase<ListTablesResponse> {

Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


rahil-c commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567748849


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
     assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(

Review Comment:
   will fix 



##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
     assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(
+            () -> catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessage("Invalid value for pageSize, must be a positive integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+
+    // create several namespaces for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      String nameSpaceName = namespaceName + i;
+      catalog.createNamespace(Namespace.of(nameSpaceName));
+    }
+
+    List<Namespace> results = catalog.listNamespaces();
+    assertThat(results).hasSize(numberOfItems);

Review Comment:
   will fix 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-16 Thread via GitHub


rahil-c commented on code in PR #9782:
URL: https://github.com/apache/iceberg/pull/9782#discussion_r1567755548


##
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java:
##
@@ -2329,6 +2374,57 @@ public void multipleDiffsAgainstMultipleTablesLastFails() {
     assertThat(schema2.columns()).hasSize(1);
   }
 
+  @Test
+  public void testInvalidRestPageSize() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    org.assertj.core.api.Assertions.assertThatThrownBy(
+            () -> catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "-1")))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessage("Invalid value for pageSize, must be a positive integer");
+  }
+
+  @Test
+  public void testPaginationForListNamespaces() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+
+    // create several namespaces for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      String nameSpaceName = namespaceName + i;
+      catalog.createNamespace(Namespace.of(nameSpaceName));
+    }
+
+    List<Namespace> results = catalog.listNamespaces();
+    assertThat(results).hasSize(numberOfItems);
+  }
+
+  @Test
+  public void testPaginationForListTables() {
+    RESTCatalogAdapter adapter = Mockito.spy(new RESTCatalogAdapter(backendCatalog));
+    RESTCatalog catalog =
+        new RESTCatalog(SessionCatalog.SessionContext.createEmpty(), (config) -> adapter);
+    catalog.initialize("test", ImmutableMap.of(RESTSessionCatalog.REST_PAGE_SIZE, "10"));
+    int numberOfItems = 100;
+    String namespaceName = "newdb";
+    String tableName = "newtable";
+    catalog.createNamespace(Namespace.of(namespaceName));
+
+    // create several tables under namespace for listing and verify
+    for (int i = 0; i < numberOfItems; i++) {
+      TableIdentifier tableIdentifier = TableIdentifier.of(namespaceName, tableName + i);
+      catalog.createTable(tableIdentifier, SCHEMA);
+    }
+
+    List<TableIdentifier> tables = catalog.listTables(Namespace.of(namespaceName));
+    assertThat(tables).hasSize(numberOfItems);

Review Comment:
   will fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org


