Re: [PR] Spark SystemFunctions are not pushed down during MERGE [iceberg]

2023-12-07 Thread via GitHub


tmnd1991 commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1844912767

   Hi @ConeyLiu, this still needs some refinement (mostly w.r.t. testing), but do 
you think the change makes sense? I would rather not put more work into it if I'm 
way off ;) 
   Thanks 🙏


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[I] Add partition evolution [iceberg-python]

2023-12-07 Thread via GitHub


nicor88 opened a new issue, #193:
URL: https://github.com/apache/iceberg-python/issues/193

   ### Feature Request / Improvement
   
   ## Context
   One of the advantages of Iceberg is that the original "partition" design can 
be changed over time. This is often the case, as a table is originally 
implemented for a specific query access pattern, and that pattern can change over time.
   
   ## Requirements
   I would like to use pyiceberg to "evolve" partitions, specifically to change 
the partition spec.
   e.g.
   ```
   new_partition_spec = PartitionSpec(
       PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="event_date"),
       PartitionField(source_id=2, field_id=1001, name="marketing_channel"),
   )

   with table.update_partitions() as table_update:
       table_update.update_partition_specs(new_partition_spec)
   ```
   or something similar
   
   ## Reference
   https://iceberg.apache.org/docs/latest/evolution/#partition-evolution
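
   For context, partition evolution is metadata-only: each spec's transforms map 
source column values to partition values. A minimal sketch of the day transform's 
behaviour (dates represented as days since the Unix epoch, per the Iceberg spec; 
`day_transform` here is an illustrative helper, not PyIceberg's implementation):

   ```python
   from datetime import date

   def day_transform(d: date) -> int:
       """Iceberg's day transform: days since the Unix epoch (1970-01-01)."""
       return (d - date(1970, 1, 1)).days

   print(day_transform(date(1970, 1, 2)))  # 1
   print(day_transform(date(2023, 12, 7)))
   ```

   Because only this mapping is stored in metadata, adding a new spec does not 
rewrite existing data files; old files keep their old partition layout.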





Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]

2023-12-07 Thread via GitHub


HonahX commented on code in PR #183:
URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1418516560


##
pyiceberg/io/pyarrow.py:
##
@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> 
Optional[T]:
 """Visit a primitive type."""
 
 
-def _get_field_id(field: pa.Field) -> Optional[int]:
-for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-if field_id_str := field.metadata.get(pyarrow_field_id_key):
-return int(field_id_str.decode())
-return None
+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+counter: count[int]
+missing_is_metadata: Optional[bool]
 
+def __init__(self) -> None:
+self.counter = count()

Review Comment:
   Shall we use `count(1)` here, since we start from 1 
in `assign_fresh_schema_ids`?
   
https://github.com/apache/iceberg-python/blob/4616d036440bf4cb3733e8d091220587cf290b75/pyiceberg/schema.py#L1221-L1230
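
For illustration, a minimal stdlib sketch of the difference the reviewer is 
pointing at (plain `itertools`, not the PyIceberg code itself):

```python
from itertools import count

# count() starts at 0; count(1) starts at 1, which is what
# assign_fresh_schema_ids uses when handing out fresh field IDs.
zero_based = count()
one_based = count(1)

print(next(zero_based))  # 0
print(next(one_based))   # 1
```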






Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]

2023-12-07 Thread via GitHub


ronkorving commented on issue #9127:
URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1844981085

   @Fokko Thanks for the engagement :)
   Would you agree that perhaps the readme is not the best place for the "what 
we're currently working on" information, and that it's better to just remove 
that part?
   I would be happy to make a PR 👍 





Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-07 Thread via GitHub


tmnd1991 commented on PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1845066093

   Can I get feedback on this one too, @ConeyLiu? Thanks a lot





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#issuecomment-1845096677

   @snazy, @nastra: 
   If this PR is OK, can I get an approval? If not, what more changes are 
expected?  
   
   I thought of waiting for https://github.com/apache/iceberg/pull/8857/ and 
https://github.com/apache/iceberg/pull/9012 to rebase, but they can rebase too 
once this PR is merged. 





Re: [PR] Update Slack Links [iceberg-docs]

2023-12-07 Thread via GitHub


nastra commented on PR #295:
URL: https://github.com/apache/iceberg-docs/pull/295#issuecomment-1845144252

   @bitsondatadev I think you'd need to update this on the `latest` branch as 
well for it to show up properly on the website





Re: [PR] Fix branching and tagging images on 1.2.1 branch [iceberg-docs]

2023-12-07 Thread via GitHub


nastra commented on PR #231:
URL: https://github.com/apache/iceberg-docs/pull/231#issuecomment-1845144982

   @amogh-jahagirdar is this still relevant?





Re: [PR] Docs: Document reading in Spark using branch and tag identifiers [iceberg]

2023-12-07 Thread via GitHub


nastra merged PR #9238:
URL: https://github.com/apache/iceberg/pull/9238





Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418802658


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -758,23 +760,19 @@ mod tests {
 {
 "action": "add-spec",
 "spec": {
-"spec-id": 1,
 "fields": [
 {
 "source-id": 4,
-"field-id": 1000,

Review Comment:
   Why remove these? They are part of the `PartitionField`: 
https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.py#L131-L135






Re: [PR] test: Add integration tests for rest catalog. [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #109:
URL: https://github.com/apache/iceberg-rust/pull/109#discussion_r1418811263


##
crates/catalog/rest/tests/rest_catalog_test.rs:
##
@@ -0,0 +1,376 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Integration tests for rest catalog.
+
+use iceberg::spec::{FormatVersion, NestedField, PrimitiveType, Schema, Type};
+use iceberg::transaction::Transaction;
+use iceberg::{Catalog, Namespace, NamespaceIdent, TableCreation, TableIdent};
+use iceberg_catalog_rest::{RestCatalog, RestCatalogConfig};
+use iceberg_test_utils::docker::DockerCompose;
+use iceberg_test_utils::{normalize_test_name, set_up};
+use port_scanner::scan_port_addr;
+use std::collections::HashMap;
+use tokio::time::sleep;
+
+const REST_CATALOG_PORT: u16 = 8181;
+
+struct TestFixture {
+_docker_compose: DockerCompose,
+rest_catalog: RestCatalog,
+}
+
+async fn set_test_fixture(func: &str) -> TestFixture {
+set_up();
+let docker_compose = DockerCompose::new(
+normalize_test_name(format!("{}_{func}", module_path!())),
+format!("{}/testdata/rest_catalog", env!("CARGO_MANIFEST_DIR")),
+);
+
+// Start docker compose
+docker_compose.run();
+
+let rest_catalog_ip = docker_compose.get_container_ip("rest");

Review Comment:
   We can always fix it in a follow up PR if we find out that it doesn't work 
well 👍 







[PR] Docs: Update default format version to 2 [iceberg]

2023-12-07 Thread via GitHub


zhongyujiang opened a new pull request, #9239:
URL: https://github.com/apache/iceberg/pull/9239

   Corrects the default format version in the documentation; we already switched 
to V2 by default in #8381.





Re: [PR] chore: Add cargo build and build guide [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko merged PR #111:
URL: https://github.com/apache/iceberg-rust/pull/111





Re: [PR] feat: Add hms catalog layout [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko merged PR #112:
URL: https://github.com/apache/iceberg-rust/pull/112





Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


liurenjie1024 commented on code in PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418814937


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -758,23 +760,19 @@ mod tests {
 {
 "action": "add-spec",
 "spec": {
-"spec-id": 1,
 "fields": [
 {
 "source-id": 4,
-"field-id": 1000,

Review Comment:
   I think this `field_id` in the request is optional?






Re: [PR] test: Add integration tests for rest catalog. [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #109:
URL: https://github.com/apache/iceberg-rust/pull/109#discussion_r1418813623


##
crates/test_utils/src/docker.rs:
##
@@ -0,0 +1,102 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::cmd::{get_cmd_output, run_command};
+use std::process::Command;
+
+/// A utility to manage lifecycle of docker compose.
+///
+/// It will start docker compose when the `run` method is called, and stop it 
when dropped.
+#[derive(Debug)]
+pub struct DockerCompose {
+project_name: String,
+docker_compose_dir: String,
+}
+
+impl DockerCompose {

Review Comment:
   In PyIceberg we do this in a Makefile, which seems to make more sense to me, 
but I'm also fine with this 👍 







Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418820203


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -758,23 +760,19 @@ mod tests {
 {
 "action": "add-spec",
 "spec": {
-"spec-id": 1,
 "fields": [
 {
 "source-id": 4,
-"field-id": 1000,

Review Comment:
   The open-api spec is a bit unclear, but it isn't optional right now:
   
   
https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/open-api/rest-catalog-open-api.py#L283-L284
   
   References:
   
   
https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/open-api/rest-catalog-open-api.py#L138-L140
   
   References:
   
   
https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/open-api/rest-catalog-open-api.py#L131-L135
   
   And there it is non-optional.






Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-12-07 Thread via GitHub


nastra merged PR #8857:
URL: https://github.com/apache/iceberg/pull/8857





Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418824322


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -758,23 +760,19 @@ mod tests {
 {
 "action": "add-spec",
 "spec": {
-"spec-id": 1,
 "fields": [
 {
 "source-id": 4,
-"field-id": 1000,

Review Comment:
   It looks like that is for historical reasons. In old specs the field IDs can 
be missing, and they'll just auto-increment.
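   
   The auto-increment fallback described here can be sketched as follows 
(assuming the spec's convention that partition field IDs start at 1000; 
`assign_field_ids` is a hypothetical helper, not code from either repo):
   
   ```python
   from itertools import count
   from typing import Optional

   PARTITION_DATA_ID_START = 1000  # first partition field ID per the Iceberg spec

   def assign_field_ids(field_ids: list[Optional[int]]) -> list[int]:
       """Fill in missing (None) partition field IDs by auto-incrementing."""
       counter = count(PARTITION_DATA_ID_START)
       return [fid if fid is not None else next(counter) for fid in field_ids]

   print(assign_field_ids([None, 1005, None]))  # [1000, 1005, 1001]
   ```
   
   (A real implementation would also have to avoid colliding with IDs already 
in use; this sketch only shows the counter idea.)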






Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418823234


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -758,23 +760,19 @@ mod tests {
 {
 "action": "add-spec",
 "spec": {
-"spec-id": 1,
 "fields": [
 {
 "source-id": 4,
-"field-id": 1000,

Review Comment:
   On Java:
   
   
https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/core/src/main/java/org/apache/iceberg/MetadataUpdateParser.java#L447-L451
   
   And there it is optional:
   
   
https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/core/src/main/java/org/apache/iceberg/PartitionSpecParser.java#L128-L145






Re: [PR] Parquet: Add a table property to control the Parquet row-group size of position delete files [iceberg]

2023-12-07 Thread via GitHub


zhongyujiang commented on PR #9177:
URL: https://github.com/apache/iceberg/pull/9177#issuecomment-1845189563

   @rdblue @RussellSpitzer would you mind taking a look at this? Thanks in 
advance!





Re: [PR] Typo? [iceberg-docs]

2023-12-07 Thread via GitHub


Fokko commented on PR #270:
URL: https://github.com/apache/iceberg-docs/pull/270#issuecomment-1845241457

   Thanks @lorenzwalthert for taking the time to fix this 👍 





Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]

2023-12-07 Thread via GitHub


Fokko commented on issue #9127:
URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1845240644

   Yes, I think that makes a lot of sense. @bitsondatadev is also working on 
updating the [roadmap](https://iceberg.apache.org/roadmap/); it might be a good 
idea to link to that so we have one single place where people can see what's 
going on 👍 





Re: [PR] Typo? [iceberg-docs]

2023-12-07 Thread via GitHub


Fokko merged PR #270:
URL: https://github.com/apache/iceberg-docs/pull/270





Re: [PR] Fix spelling [iceberg-docs]

2023-12-07 Thread via GitHub


Fokko merged PR #271:
URL: https://github.com/apache/iceberg-docs/pull/271





Re: [PR] Update vendors.md [iceberg-docs]

2023-12-07 Thread via GitHub


Fokko merged PR #275:
URL: https://github.com/apache/iceberg-docs/pull/275





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#issuecomment-1845247046

   Note: I am squashing the commits into one (since I am having a hard time 
rebasing with 8 commits). 





[PR] Open-API: Refactor updates with discriminator [iceberg]

2023-12-07 Thread via GitHub


Fokko opened a new pull request, #9240:
URL: https://github.com/apache/iceberg/pull/9240

   This generates nicer code





Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r141983


##
crates/iceberg/src/catalog/mod.rs:
##
@@ -758,23 +760,19 @@ mod tests {
 {
 "action": "add-spec",
 "spec": {
-"spec-id": 1,
 "fields": [
 {
 "source-id": 4,
-"field-id": 1000,

Review Comment:
   BTW, cleaned up the spec a bit: https://github.com/apache/iceberg/pull/9240






Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko merged PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106





Re: [I] feat: Add support for `UnboundPartitionSpec`. [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko closed issue #98: feat: Add support for `UnboundPartitionSpec`.
URL: https://github.com/apache/iceberg-rust/issues/98





Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]

2023-12-07 Thread via GitHub


Fokko commented on PR #106:
URL: https://github.com/apache/iceberg-rust/pull/106#issuecomment-1845257956

   Thanks @my-vegetable-has-exploded for working on this, and @liurenjie1024 
and @Xuanwo for reviewing this 👍 





Re: [PR] Build: Bump actions/setup-python from 4 to 5 [iceberg-python]

2023-12-07 Thread via GitHub


Fokko merged PR #191:
URL: https://github.com/apache/iceberg-python/pull/191





Re: [PR] Build: Bump fastavro from 1.9.0 to 1.9.1 [iceberg-python]

2023-12-07 Thread via GitHub


Fokko merged PR #190:
URL: https://github.com/apache/iceberg-python/pull/190





Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]

2023-12-07 Thread via GitHub


nastra commented on PR #9230:
URL: https://github.com/apache/iceberg/pull/9230#issuecomment-1845262257

   @ConeyLiu could you also please take a look at this?





Re: [PR] Build: Bump griffe from 0.38.0 to 0.38.1 [iceberg-python]

2023-12-07 Thread via GitHub


Fokko merged PR #189:
URL: https://github.com/apache/iceberg-python/pull/189





Re: [PR] maint(transforms): replace `type()` calls with `isinstance()` [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on PR #188:
URL: https://github.com/apache/iceberg-python/pull/188#issuecomment-1845273979

   @jayceslesar looks like `black` still wants to reformat some code. Can you 
run `make lint`? Thanks!





Re: [PR] Automatically create the tables for the `SqlCatalog` [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on PR #186:
URL: https://github.com/apache/iceberg-python/pull/186#issuecomment-1845276186

   This is beautiful @cosmastech! In this same PR, can you remove these lines: https://github.com/apache/iceberg-python/blob/5337504963dec7a410ba762da08064578976bd7a/mkdocs/docs/api.md?plain=1#L57-L61





Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418929287


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,199 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+

Review Comment:
   Two things that I like to avoid: complexity and trust issues!






Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418931669


##
pyiceberg/table/snapshots.py:
##
@@ -19,15 +19,40 @@
 Any,
 Dict,
 List,
+Mapping,
 Optional,
 )
 
 from pydantic import Field, PrivateAttr, model_serializer
 
 from pyiceberg.io import FileIO
-from pyiceberg.manifest import ManifestFile, read_manifest_list
+from pyiceberg.manifest import DataFile, DataFileContent, ManifestFile, read_manifest_list
 from pyiceberg.typedef import IcebergBaseModel
 
+ADDED_DATA_FILES = 'added-data-files'
+ADDED_DELETE_FILES = 'added-delete-files'
+ADDED_EQUALITY_DELETES = 'added-equality-deletes'
+ADDED_FILE_SIZE = 'added-files-size'
+ADDED_POSITION_DELETES = 'added-position-deletes'
+ADDED_POSITION_DELETE_FILES = f'{ADDED_POSITION_DELETES}-files'

Review Comment:
   Ahh, yes!






Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418932283


##
pyiceberg/table/snapshots.py:
##
@@ -19,15 +19,40 @@
 Any,
 Dict,
 List,
+Mapping,
 Optional,
 )
 
 from pydantic import Field, PrivateAttr, model_serializer
 
 from pyiceberg.io import FileIO
-from pyiceberg.manifest import ManifestFile, read_manifest_list
+from pyiceberg.manifest import DataFile, DataFileContent, ManifestFile, read_manifest_list
 from pyiceberg.typedef import IcebergBaseModel
 
+ADDED_DATA_FILES = 'added-data-files'
+ADDED_DELETE_FILES = 'added-delete-files'
+ADDED_EQUALITY_DELETES = 'added-equality-deletes'
+ADDED_FILE_SIZE = 'added-files-size'
+ADDED_POSITION_DELETES = 'added-position-deletes'
+ADDED_POSITION_DELETE_FILES = f'{ADDED_POSITION_DELETES}-files'
+ADDED_RECORDS = 'added-records'
+DELETED_DATA_FILES = 'deleted-data-files'
+DELETED_RECORDS = 'deleted-records'
+EQUALITY_DELETE_FILES = 'added-equality-delete-files'

Review Comment:
   Yes!






Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418934519


##
pyiceberg/table/snapshots.py:
##
@@ -65,6 +90,25 @@ def __init__(self, operation: Operation, **data: Any) -> None:
 super().__init__(operation=operation, **data)
 self._additional_properties = data
 
+def __getitem__(self, __key: str) -> Optional[Any]:  # type: ignore
+"""Return a key as it is a map."""
+if __key == 'operation':

Review Comment:
   It seems to be lower-case here:
   
![image](https://github.com/apache/iceberg-python/assets/1134248/706c04e2-0f2e-4824-9ddb-8485ed9fcd3a)
   
   I can make it case-insensitive
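
   A case-insensitive lookup along the lines discussed could look like the following minimal sketch. This class is a hypothetical stand-in for illustration only, not the PR's actual pydantic `Summary` model in `pyiceberg.table.snapshots`:

   ```python
   class Summary:
       """Hypothetical sketch of a case-insensitive summary mapping."""

       def __init__(self, operation: str, **properties: str) -> None:
           # Store every key lower-cased once, so lookups can ignore case.
           self._data = {"operation": operation}
           self._data.update({k.lower(): v for k, v in properties.items()})

       def __getitem__(self, key: str) -> str:
           # Normalize the requested key the same way before the lookup.
           return self._data[key.lower()]


   summary = Summary("append", **{"added-data-files": "3"})
   print(summary["Operation"])  # -> append
   ```

   Normalizing at storage time keeps `__getitem__` a plain dict access instead of scanning for a matching key on every lookup.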






Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418935983


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +168,193 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def remove_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def build(self) -> Dict[str, str]:
+def set_when_positive(properties: Dict[str, str], num: int, property_name: str) -> None:
+if num > 0:
+properties[property_name] = str(num)
+
+properties: Dict[str, str] = {}
+set_when_positive(properties, self.added_size, ADDED_FILE_SIZE)
+set_when_positive(properties, self.removed_size, REMOVED_FILE_SIZE)
+set_when_positive(properties, self.added_files, ADDED_DATA_FILES)

Review Comment:
   That's not a minor, thanks!
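
   For context, the `build()`/`set_when_positive` pattern quoted above can be sketched in a trimmed-down form (function and property names here are simplified stand-ins, not the PR's exact API):

   ```python
   from typing import Dict


   def build_summary(added_size: int, added_files: int, added_records: int) -> Dict[str, str]:
       """Mirror the collector's build(): only emit properties with positive counts."""

       def set_when_positive(props: Dict[str, str], num: int, name: str) -> None:
           if num > 0:
               props[name] = str(num)

       props: Dict[str, str] = {}
       set_when_positive(props, added_size, "added-files-size")
       set_when_positive(props, added_files, "added-data-files")
       set_when_positive(props, added_records, "added-records")
       return props


   print(build_summary(added_size=1024, added_files=2, added_records=0))
   # -> {'added-files-size': '1024', 'added-data-files': '2'}
   ```

   Zero-valued counters are dropped entirely, which is why mapping each counter to the right property name (the point of this review thread) matters: a mismatched name silently vanishes from the snapshot summary rather than failing loudly.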






Re: [PR] Automatically create the tables for the `SqlCatalog` [iceberg-python]

2023-12-07 Thread via GitHub


cosmastech commented on PR #186:
URL: https://github.com/apache/iceberg-python/pull/186#issuecomment-1845323941

   > This is beautiful @cosmastech, in this same PR, can you remove these lines:
   > 
   > 
https://github.com/apache/iceberg-python/blob/5337504963dec7a410ba762da08064578976bd7a/mkdocs/docs/api.md?plain=1#L57-L61
   > 
   > It also looks like the CI is a bit sad, can you run `make lint` as well?
   
   I believe we should be good to go. @Fokko let me know if I missed anything else.





Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #9230:
URL: https://github.com/apache/iceberg/pull/9230#discussion_r1418954500


##
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##
@@ -895,7 +895,7 @@ private void cleanUncommittedAppends(Set<ManifestFile> committed) {
 }
   }
 
-  this.cachedNewDataManifests = committedNewDataManifests;
+  this.cachedNewDataManifests = null;

Review Comment:
   Rather than setting this to `null`, I think a better approach would be to handle this exactly like it's done for `cachedNewDeleteManifests`, where `cachedNewDeleteManifests` is initialized to be a linked list and then is iterated over and files are removed.
   
   I did a quick check with 
   ```
   --- a/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
   +++ b/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
   @@ -94,7 +94,7 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {
      private PartitionSpec dataSpec;
   
      // cache new data manifests after writing
   -  private List<ManifestFile> cachedNewDataManifests = null;
   +  private final List<ManifestFile> cachedNewDataManifests = Lists.newLinkedList();
      private boolean hasNewDataFiles = false;
   
      // cache new manifests for delete files
   @@ -885,17 +885,15 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {
      }
   
      private void cleanUncommittedAppends(Set<ManifestFile> committed) {
   -    if (cachedNewDataManifests != null) {
   -      List<ManifestFile> committedNewDataManifests = Lists.newArrayList();
   -      for (ManifestFile manifest : cachedNewDataManifests) {
   -        if (committed.contains(manifest)) {
   -          committedNewDataManifests.add(manifest);
   -        } else {
   -          deleteFile(manifest.path());
   +    if (!cachedNewDataManifests.isEmpty()) {
   +      ListIterator<ManifestFile> dataManifestsIterator = cachedNewDataManifests.listIterator();
   +      while (dataManifestsIterator.hasNext()) {
   +        ManifestFile dataManifest = dataManifestsIterator.next();
   +        if (!committed.contains(dataManifest)) {
   +          deleteFile(dataManifest.path());
   +          dataManifestsIterator.remove();
            }
          }
   -
   -      this.cachedNewDataManifests = null;
        }
   
        ListIterator<ManifestFile> deleteManifestsIterator = cachedNewDeleteManifests.listIterator();
   @@ -950,12 +948,12 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {
      }
   
      private List<ManifestFile> newDataFilesAsManifests() {
   -    if (hasNewDataFiles && cachedNewDataManifests != null) {
   +    if (hasNewDataFiles && !cachedNewDataManifests.isEmpty()) {
          cachedNewDataManifests.forEach(file -> deleteFile(file.path()));
   -      cachedNewDataManifests = null;
   +      cachedNewDataManifests.clear();
        }
   
   -    if (cachedNewDataManifests == null) {
   +    if (cachedNewDataManifests.isEmpty()) {
          try {
            RollingManifestWriter<DataFile> writer = newRollingManifestWriter(dataSpec());
            try {
   @@ -968,7 +966,7 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {
              writer.close();
            }
   
   -        this.cachedNewDataManifests = writer.toManifestFiles();
   +        this.cachedNewDataManifests.addAll(writer.toManifestFiles());
   ```
   and that seemed to work. We would probably want to use the same approach in `FastAppend` (but I haven't checked that part for `FastAppend`).






Re: [PR] Docs: Update default format version to 2 [iceberg]

2023-12-07 Thread via GitHub


nastra commented on PR #9239:
URL: https://github.com/apache/iceberg/pull/9239#issuecomment-1845334573

   thanks @zhongyujiang for adding this





Re: [PR] Docs: Update default format version to 2 [iceberg]

2023-12-07 Thread via GitHub


nastra merged PR #9239:
URL: https://github.com/apache/iceberg/pull/9239





Re: [PR] API: Replace special case of deprecated RuntimeIOException with FileHandlingException [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418963810


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-if (stream != null) {
-  try {
-return stream.getPos();
-  } catch (IOException e) {
-throw new RuntimeIOException(e, "Failed to get stream length");
-  }
+Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");

Review Comment:
   I wonder if that should throw an IllegalStateException here






Re: [PR] API: Replace special case of deprecated RuntimeIOException with FileHandlingException [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418968003


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-if (stream != null) {
-  try {
-return stream.getPos();
-  } catch (IOException e) {
-throw new RuntimeIOException(e, "Failed to get stream length");
-  }
+Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+try {
+  return stream.getPos();
+} catch (IOException e) {
+  throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment:
   is this change actually related to this particular PR?






Re: [PR] API: Replace special case of deprecated RuntimeIOException with FileHandlingException [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418968329


##
core/src/main/java/org/apache/iceberg/avro/AvroIO.java:
##
@@ -178,7 +178,7 @@ static long findStartingRowPos(Supplier open, long start) {
   SYNC_READER.read(decoder, blockSync);
 
   if (!Arrays.equals(fileSync, blockSync)) {
-throw new RuntimeIOException("Invalid sync at %s", nextSyncPos);
+throw new IOException(String.format("Invalid sync at %s", nextSyncPos));

Review Comment:
   this change doesn't seem to be related to the scope of this PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org





Re: [PR] JDBC catalog fix namespaceExists check [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8340:
URL: https://github.com/apache/iceberg/pull/8340#discussion_r1418975543


##
core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java:
##
@@ -601,15 +601,76 @@ public void testDropNamespace() {
   public void testCreateNamespace() {
 Namespace testNamespace = Namespace.of("testDb", "ns1", "ns2");
 assertThat(catalog.namespaceExists(testNamespace)).isFalse();
-// Test with no metadata
+assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1"))).isFalse();
 catalog.createNamespace(testNamespace);
 assertThat(catalog.namespaceExists(testNamespace)).isTrue();
+assertThat(catalog.namespaceExists(Namespace.of("testDb"))).isTrue();
+assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1"))).isTrue();
+assertThat(catalog.namespaceExists(Namespace.of("ns1", "ns2"))).isFalse();
+assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns%"))).isFalse();
+assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns_"))).isFalse();
+assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1", "ns2"))).isTrue();
+assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1", "ns2", "ns3"))).isFalse();
+  }
+
+  @Test
+  public void testCreateNamespaceWithBackslashCharacter() {
+

Review Comment:
   nit: unnecessary whitespace






Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418996270


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-if (stream != null) {
-  try {
-return stream.getPos();
-  } catch (IOException e) {
-throw new RuntimeIOException(e, "Failed to get stream length");
-  }
+Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");

Review Comment:
   Yes, I changed this to `IllegalStateException` based on Dan's comment last time.
   https://github.com/apache/iceberg/pull/6887#discussion_r1114846404






Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418998188


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-if (stream != null) {
-  try {
-return stream.getPos();
-  } catch (IOException e) {
-throw new RuntimeIOException(e, "Failed to get stream length");
-  }
+Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+try {
+  return stream.getPos();
+} catch (IOException e) {
+  throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment:
   Yes. It was also a special case to handle (a `RuntimeIOException` without an `IOException` cause).
   
   Updated the PR title and description, so it makes sense now.








Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1419002428


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+    try {
+      return stream.getPos();
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment:
   True. This one has a cause, so it is not a special case. I will revert it.




Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1419010809


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+    try {
+      return stream.getPos();
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment:
   Sorry, I got confused.
   
   There were two `RuntimeIOException`s thrown from this `length()` method. One 
is now replaced with an `IllegalStateException` as per Dan's comment, so there 
will still be one `RuntimeIOException` (which will be replaced with 
`UncheckedIOException` in the main PR, as it has an `IOException` cause).



Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-07 Thread via GitHub


ajantha-bhat commented on PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#issuecomment-1845411543

   @nastra: Thanks for the review. I have replied to the comments.
   
   I think no new code modifications are needed. Please check again. Thanks.



Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #6887:
URL: https://github.com/apache/iceberg/pull/6887#discussion_r1419046382


##
core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:
##
@@ -78,14 +78,13 @@ public Metrics metrics() {
 
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");

Review Comment:
   Did you perhaps forget to push? I don't see 
`Preconditions.checkState(...)` here.




Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419059002


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -448,33 +473,82 @@ public void renameTable(TableIdentifier from, TableIdentifier to) {
 // behavior. So better be safe than sorry.
   }
 
+  private static void validateToContentForRename(
+      TableIdentifier from, TableIdentifier to, IcebergContent existingToContent) {
+    if (existingToContent != null) {
+      if (existingToContent.getType() == Content.Type.ICEBERG_VIEW) {
+        throw new AlreadyExistsException("Cannot rename %s to %s. View already exists", from, to);
+      } else if (existingToContent.getType() == Content.Type.ICEBERG_TABLE) {
+        throw new AlreadyExistsException("Cannot rename %s to %s. Table already exists", from, to);
+      } else {
+        throw new AlreadyExistsException(
+            "Cannot rename %s to %s. Another content with same name already exists", from, to);

Review Comment:
   nit: should this maybe mention the content type here?




Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1419062626


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +168,193 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def remove_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def build(self) -> Dict[str, str]:
+def set_when_positive(properties: Dict[str, str], num: int, property_name: str) -> None:
+if num > 0:
+properties[property_name] = str(num)
+
+properties: Dict[str, str] = {}
+set_when_positive(properties, self.added_size, ADDED_FILE_SIZE)
+set_when_positive(properties, self.removed_size, REMOVED_FILE_SIZE)
+set_when_positive(properties, self.added_files, ADDED_DATA_FILES)
+set_when_positive(properties, self.removed_files, DELETED_DATA_FILES)
+set_when_positive(properties, self.added_eq_delete_files, EQUALITY_DELETE_FILES)
+set_when_positive(properties, self.removed_eq_delete_files, REMOVED_EQUALITY_DELETE_FILES)
+set_when_positive(properties, self.added_pos_delete_files, ADDED_POSITION_DELETE_FILES)
+set_when_positive(properties, self.removed_pos_delete_files, REMOVED_POSITION_DELETE_FILES)
+set_when_positive(properties, self.added_delete_files, ADDED_DELETE_FILES)
+set_when_positive(properties, self.removed_delete_files, REMOVED_DELETE_FILES)
+set_when_positive(properties, self.added_records, ADDED_RECORDS)
+set_when_positive(properties, self.deleted_records, DELETED_RECORDS)
+set_when_positive(properties, self.added_pos_deletes, ADDED_POSITION_DELETES)
+set_when_positive(properties, self.removed_pos_deletes, REMOVED_POSITION_DELETES)
+set_when_positive(properties, self.added_eq_deletes, ADDED_EQUALITY_DELETES)
+set_when_positive(properties, self.removed_eq_deletes, REMOVED_EQUALITY_DELETES)
+
+return properties
+
+
+def _truncate_table_summary(summary: Summary, previous_summary: Mapping[str, str]) -> Summary:
+for prop in {
+TOTAL_DATA_FILES,
+TOTAL_DELETE_FILES,
+TOTAL_RECORDS,
+TOTAL_FILE_SIZE,
+TOTAL_
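A minimal way to exercise the collector logic quoted above is with a stand-in data file; the sketch below trims the class to the plain data-file counters (`FakeDataFile`, `Content`, and the property-name strings are illustrative stand-ins, not the pyiceberg types):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Content(Enum):
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass
class FakeDataFile:
    """Stand-in for pyiceberg's DataFile with the fields the collector reads."""
    content: Content
    file_size_in_bytes: int
    record_count: int


class Collector:
    """Trimmed version of the SnapshotSummaryCollector above: data files only."""

    def __init__(self) -> None:
        self.added_size = 0
        self.added_files = 0
        self.added_records = 0

    def add_file(self, data_file: FakeDataFile) -> None:
        self.added_size += data_file.file_size_in_bytes
        if data_file.content == Content.DATA:
            self.added_files += 1
            self.added_records += data_file.record_count

    def build(self) -> Dict[str, str]:
        # Like build() above: only emit properties with positive values.
        props: Dict[str, str] = {}
        for name, num in [
            ("added-files-size", self.added_size),
            ("added-data-files", self.added_files),
            ("added-records", self.added_records),
        ]:
            if num > 0:
                props[name] = str(num)
        return props
```

Adding two data files of 100 and 50 bytes with 10 and 5 records yields a summary with string-valued counts, and an empty collector builds an empty dict, mirroring the `set_when_positive` behavior.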

Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]

2023-12-07 Thread via GitHub


ConeyLiu commented on code in PR #9230:
URL: https://github.com/apache/iceberg/pull/9230#discussion_r1419062959


##
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##
@@ -895,7 +895,7 @@ private void cleanUncommittedAppends(Set<ManifestFile> committed) {
 }
   }
 
-  this.cachedNewDataManifests = committedNewDataManifests;
+  this.cachedNewDataManifests = null;

Review Comment:
   @nastra I think this solution is better and more solid. We should not set 
`cachedNewDataManifests` to null directly. Consider the following case:
   ```
   1. Transaction transaction = table.newTransaction();
   
   2. transaction.newAppend().appendFile(...).commit(); // generates a new manifest file
   
   3. // the transaction operation fails
   
   4. transaction.commitTransaction(); // fails to commit
   ```
   With the changes in this PR, the new manifest files generated in step 2 will 
not be deleted in step 4. I think the failing UTs verify this.
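The caching question discussed here can be shown with a generic sketch (Python for brevity; `SnapshotProducer` below is an illustrative stand-in, not Iceberg's `MergingSnapshotProducer`): invalidating the cache whenever a file is appended lets a commit retry pick up files added since the last attempt, while repeated attempts with no new files reuse the cached manifests.

```python
class SnapshotProducer:
    """Illustrative sketch of lazily written, cached manifests across retries."""

    def __init__(self):
        self.new_files = []
        self.cached_manifests = None

    def append(self, f):
        self.new_files.append(f)
        # Invalidate the cache so a commit retry rewrites manifests
        # and includes files appended after a failed attempt.
        self.cached_manifests = None

    def manifests(self):
        # Lazily "write" one manifest per pending file and cache the result,
        # so repeated commit attempts do not rewrite unchanged manifests.
        if self.cached_manifests is None:
            self.cached_manifests = [f + ".manifest" for f in self.new_files]
        return self.cached_manifests
```

The real fix additionally has to track already-written manifests so a failed transaction can still clean them up, which is the subtlety the comment above points out.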




Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419065698


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:
##
@@ -141,16 +147,26 @@ private UpdateableReference loadReference(String requestedRef, String hash) {
 }
   }
 
+  /** @deprecated will be removed after 1.5.0; use listContents() instead */
+  @Deprecated

Review Comment:
   it seems a bit weird that there are `commitTable()` / `commitView()`  
methods, but all the other view/table-related methods are unified to 
`xyzContent()`. Wouldn't it be better to just leave `listTables()` (and others) 
and introduce `listViews()` (and similar)?




Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-07 Thread via GitHub


ConeyLiu commented on code in PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#discussion_r1419067449


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala:
##
@@ -40,14 +37,20 @@ import org.apache.spark.sql.types.StructType
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan =
-    plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) {
+    plan.transformWithPruning (_.containsPattern(FILTER)) {
       case filter @ Filter(condition, _) =>
-        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
+        val newCondition = condition.transformWithPruning(_.containsAnyPattern(BINARY_COMPARISON, IN, INSET)) {
           case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
             c.withNewChildren(Seq(replaceStaticInvoke(left), right))

           case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
             c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+
+          case in @ In(s: StaticInvoke, _) =>

Review Comment:
   Why not check whether `s` can be replaced?




Re: [PR] Spark: Use Awaitility instead of Thread.sleep [iceberg]

2023-12-07 Thread via GitHub


yyy1000 commented on PR #9224:
URL: https://github.com/apache/iceberg/pull/9224#issuecomment-1845461829

   > > Hi, @nk1506 It seems that my code can't pass the unit tests. Is it 
because of the default waiting time is not enough? Do you have any ideas?
   > 
   > Please refer 
[here](https://github.com/apache/iceberg/blob/6a9d3c77977baff4295ee2dde0150d73c8c46af1/aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogLock.java#L146)
 . Also it is advisable to set the `atMost` too.
   
   Oh, got it. I should combine the conditions with 'or', not 'and'. :) Now the 
unit test passes in my local env.
   Awaitility uses a default timeout of 10 seconds, which I think is enough in 
this case: https://github.com/awaitility/awaitility/wiki/Usage#defaults
   So adding an `atMost` seems unnecessary here. What do you think?
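Awaitility-style bounded polling boils down to a small loop; a sketch, not the Awaitility API (`await_until` and its parameters are made up here, with the 10-second default mirroring Awaitility's documented default timeout):

```python
import time


def await_until(condition, at_most=10.0, poll_interval=0.1):
    """Poll `condition` until it returns True or `at_most` seconds elapse."""
    deadline = time.monotonic() + at_most
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    raise TimeoutError(f"condition not met within {at_most}s")
```

Combining the two checks with 'or', as in the comment above, would then look like `await_until(lambda: cond_a() or cond_b())`.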



Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-07 Thread via GitHub


ConeyLiu commented on PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1845465132

   @tmnd1991 thanks for your contribution. I think this is a valid fix.



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419074476


##
nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:
##
@@ -180,6 +188,32 @@ protected Table createTable(TableIdentifier tableIdentifier, Schema schema) {
 return catalog.createTable(tableIdentifier, schema);
   }
 
+  protected View createView(NessieCatalog nessieCatalog, TableIdentifier tableIdentifier) {
+    Schema schema = new Schema(StructType.of(required(1, "id", LongType.get())).fields());
+    return createView(nessieCatalog, tableIdentifier, schema);
+  }
+
+  protected View createView(
+      NessieCatalog nessieCatalog, TableIdentifier tableIdentifier, Schema schema) {
+    createMissingNamespaces(tableIdentifier);
+    return nessieCatalog
+        .buildView(tableIdentifier)
+        .withSchema(schema)
+        .withDefaultNamespace(tableIdentifier.namespace())
+        .withQuery("spark", "select * from ns.tbl")
+        .create();
+  }
+
+  protected View replaceView(NessieCatalog nessieCatalog, TableIdentifier identifier) {
+    Schema schema = new Schema(StructType.of(required(2, "age", Types.IntegerType.get())).fields());

Review Comment:
   ```suggestion
   Schema schema = new Schema(required(2, "age", Types.IntegerType.get()));
   ```




Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419075510


##
nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:
##
@@ -180,6 +188,32 @@ protected Table createTable(TableIdentifier tableIdentifier, Schema schema) {
 return catalog.createTable(tableIdentifier, schema);
   }
 
+  protected View createView(NessieCatalog nessieCatalog, TableIdentifier tableIdentifier) {
+    Schema schema = new Schema(StructType.of(required(1, "id", LongType.get())).fields());

Review Comment:
   ```suggestion
   Schema schema = new Schema(required(1, "id", LongType.get()));
   ```




Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419073781


##
nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:
##
@@ -267,4 +305,23 @@ static DataFile makeDataFile(Table icebergTable, String fileLocation) {
 .withFileSizeInBytes(Files.localInput(fileLocation).getLength())
 .build();
   }
+
+  protected static List<String> metadataVersionFiles(String tablePath) {
+    return filterByExtension(tablePath, getFileExtension(TableMetadataParser.Codec.NONE));
+  }
+
+  protected static List<String> filterByExtension(String tablePath, String extension) {
+    return metadataFiles(tablePath).stream()
+        .filter(f -> f.endsWith(extension))
+        .collect(Collectors.toList());
+  }
+
+  @SuppressWarnings(
+      "RegexpSinglelineJava") // respecting this rule requires a lot more lines of code

Review Comment:
   nit: can this be moved to the line above?




Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419078008


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java:
##
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Path;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.CatalogProperties;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.view.BaseView;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.apache.iceberg.view.ViewMetadata;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.junit.jupiter.api.extension.RegisterExtension;
+import org.junit.jupiter.api.io.TempDir;
+import org.projectnessie.client.api.NessieApiV1;
+import org.projectnessie.client.ext.NessieApiVersion;
+import org.projectnessie.client.ext.NessieApiVersions;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.jaxrs.ext.NessieJaxRsExtension;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.Reference;
+import org.projectnessie.model.Tag;
+import org.projectnessie.versioned.storage.common.persist.Persist;
+import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory;
+import org.projectnessie.versioned.storage.testextension.NessieBackend;
+import org.projectnessie.versioned.storage.testextension.NessiePersist;
+import org.projectnessie.versioned.storage.testextension.PersistExtension;
+
+@ExtendWith(PersistExtension.class)
+@NessieBackend(InmemoryBackendTestFactory.class)
+@NessieApiVersions // test all versions
+public class TestNessieViewCatalog extends ViewCatalogTests<NessieCatalog> {
+
+  @NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist);
+
+  @TempDir public Path temp;

Review Comment:
   ```suggestion
 @TempDir private Path temp;
   ```




Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419081069


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java:
##
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Path;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.CatalogProperties;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.view.BaseView;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.apache.iceberg.view.ViewMetadata;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.junit.jupiter.api.extension.RegisterExtension;
+import org.junit.jupiter.api.io.TempDir;
+import org.projectnessie.client.api.NessieApiV1;
+import org.projectnessie.client.ext.NessieApiVersion;
+import org.projectnessie.client.ext.NessieApiVersions;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.jaxrs.ext.NessieJaxRsExtension;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.Reference;
+import org.projectnessie.model.Tag;
+import org.projectnessie.versioned.storage.common.persist.Persist;
+import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory;
+import org.projectnessie.versioned.storage.testextension.NessieBackend;
+import org.projectnessie.versioned.storage.testextension.NessiePersist;
+import org.projectnessie.versioned.storage.testextension.PersistExtension;
+
+@ExtendWith(PersistExtension.class)
+@NessieBackend(InmemoryBackendTestFactory.class)
+@NessieApiVersions // test all versions
+public class TestNessieViewCatalog extends ViewCatalogTests {
+
+  @NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() 
-> persist);
+
+  @TempDir public Path temp;
+
+  private NessieCatalog catalog;
+  private NessieApiV1 api;
+  private NessieApiVersion apiVersion;
+  private Configuration hadoopConfig;
+  private String initialHashOfDefaultBranch;
+  private String uri;
+
+  @BeforeEach
+  public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws NessieNotFoundException {
+api = clientFactory.make();
+apiVersion = clientFactory.apiVersion();
+initialHashOfDefaultBranch = api.getDefaultBranch().getHash();
+uri = nessieUri.toASCIIString();
+hadoopConfig = new Configuration();
+catalog = initNessieCatalog("main");
+  }
+
+  @AfterEach
+  public void afterEach() throws IOException {
+resetData();
+try {
+  if (catalog != null) {
+catalog.close();
+  }
+  api.close();
+} finally {
+  catalog = null;
+  api = null;
+  hadoopConfig = null;
+}
+  }
+
+  private void resetData() throws NessieConflictException, NessieNotFoundException {
+Branch defaultBranch = api.getDefaultBranch();
+for (Reference r : api.getAllReferences().get().getReferences()) {
+  if (r instanceof Branch && !r.getName().equals(defaultBranch.getName())) {
+api.deleteBranch().branch((Branch) r).delete();
+  }
+  if (r instanceof Tag) {
+api.deleteTag().tag((Tag) r).delete();
+  }
+}
+api.assignBranch()
+.assignTo(Branch.of(defaultBranch.getName(), initialHashOfDefaultBranch))
+.branch(defaultBranch)
+.assign();
+  }
+
+  private NessieCatalog initNessieCatalog(String ref) {
+NessieCatalog newCatalog = new NessieCatalog();
+newCatalog.setConf(hadoopConfig);
+ImmutableMap options =
+ImmutableMap.of(
+"ref",
+   

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419080356


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java:
##
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Path;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.CatalogProperties;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.view.BaseView;
+import org.apache.iceberg.view.View;
+import org.apache.iceberg.view.ViewCatalogTests;
+import org.apache.iceberg.view.ViewMetadata;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.junit.jupiter.api.extension.RegisterExtension;
+import org.junit.jupiter.api.io.TempDir;
+import org.projectnessie.client.api.NessieApiV1;
+import org.projectnessie.client.ext.NessieApiVersion;
+import org.projectnessie.client.ext.NessieApiVersions;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.jaxrs.ext.NessieJaxRsExtension;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.Reference;
+import org.projectnessie.model.Tag;
+import org.projectnessie.versioned.storage.common.persist.Persist;
+import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory;
+import org.projectnessie.versioned.storage.testextension.NessieBackend;
+import org.projectnessie.versioned.storage.testextension.NessiePersist;
+import org.projectnessie.versioned.storage.testextension.PersistExtension;
+
+@ExtendWith(PersistExtension.class)
+@NessieBackend(InmemoryBackendTestFactory.class)
+@NessieApiVersions // test all versions
+public class TestNessieViewCatalog extends ViewCatalogTests<NessieCatalog> {
+
+  @NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist);
+
+  @TempDir public Path temp;
+
+  private NessieCatalog catalog;
+  private NessieApiV1 api;
+  private NessieApiVersion apiVersion;
+  private Configuration hadoopConfig;
+  private String initialHashOfDefaultBranch;
+  private String uri;
+
+  @BeforeEach
+  public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+  throws NessieNotFoundException {
+api = clientFactory.make();
+apiVersion = clientFactory.apiVersion();
+initialHashOfDefaultBranch = api.getDefaultBranch().getHash();
+uri = nessieUri.toASCIIString();
+hadoopConfig = new Configuration();
+catalog = initNessieCatalog("main");
+  }
+
+  @AfterEach
+  public void afterEach() throws IOException {
+resetData();
+try {
+  if (catalog != null) {
+catalog.close();
+  }
+  api.close();
+} finally {
+  catalog = null;
+  api = null;
+  hadoopConfig = null;
+}
+  }
+
+  private void resetData() throws NessieConflictException, NessieNotFoundException {
+Branch defaultBranch = api.getDefaultBranch();
+for (Reference r : api.getAllReferences().get().getReferences()) {
+  if (r instanceof Branch && !r.getName().equals(defaultBranch.getName())) {
+api.deleteBranch().branch((Branch) r).delete();
+  }
+  if (r instanceof Tag) {
+api.deleteTag().tag((Tag) r).delete();
+  }
+}
+api.assignBranch()
+.assignTo(Branch.of(defaultBranch.getName(), initialHashOfDefaultBranch))
+.branch(defaultBranch)
+.assign();
+  }
+
+  private NessieCatalog initNessieCatalog(String ref) {
+NessieCatalog newCatalog = new NessieCatalog();
+newCatalog.setConf(hadoopConfig);
+ImmutableMap<String, String> options =
+ImmutableMap.of(
+"ref",
+   

Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #183:
URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1419083232


##
pyiceberg/io/pyarrow.py:
##
@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> 
Optional[T]:
 """Visit a primitive type."""
 
 
-def _get_field_id(field: pa.Field) -> Optional[int]:
-for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-if field_id_str := field.metadata.get(pyarrow_field_id_key):
-return int(field_id_str.decode())
-return None
+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+counter: count[int]
+missing_is_metadata: Optional[bool]
 
+def __init__(self) -> None:
+self.counter = count()

Review Comment:
   Sorry for the limited context here. It will skip fields that don't have an ID:
   
![image](https://github.com/apache/iceberg-python/assets/1134248/f9e3b551-d986-4c1a-a6d3-4665fff7fb5c)
   
   Which is kind of awkward. 
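   The behaviour under discussion (fields without an id being skipped) comes down to how the field-id is read from a `pa.Field`'s metadata. A minimal sketch, assuming the two metadata keys shown in the quoted `PYARROW_FIELD_ID_KEYS` and adding a guard for `field.metadata` being `None`:

   ```python
   import pyarrow as pa

   # Keys pyiceberg checks for a field-id (taken from the quoted snippet;
   # the exact constant in the library may differ).
   PYARROW_FIELD_ID_KEYS = [b"PARQUET:field_id", b"field_id"]

   def get_field_id(field: pa.Field):
       # field.metadata is None when the field carries no metadata at all,
       # so guard before calling .get() on it.
       if field.metadata:
           for key in PYARROW_FIELD_ID_KEYS:
               if field_id_str := field.metadata.get(key):
                   return int(field_id_str.decode())
       return None

   with_id = pa.field("id", pa.int64(), metadata={b"PARQUET:field_id": b"1"})
   without_id = pa.field("name", pa.string())
   print(get_field_id(with_id))     # 1
   print(get_field_id(without_id))  # None -> this field would be skipped
   ```

   A field like `without_id` is exactly the case shown in the screenshot: with no id to return, the visitor drops it instead of assigning a fresh one.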



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419087100


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.View;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.CommitMeta;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.ImmutableTableReference;
+import org.projectnessie.model.LogResponse.LogEntry;
+
+public class TestNessieView extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-view-test";
+
+  private static final String DB_NAME = "db";
+  private static final String VIEW_NAME = "view";
+  private static final TableIdentifier VIEW_IDENTIFIER = TableIdentifier.of(DB_NAME, VIEW_NAME);
+  private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME);
+  private static final Schema SCHEMA =
+  new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+  private static final Schema ALTERED =
+  new Schema(
+  Types.StructType.of(
+  required(1, "id", Types.LongType.get()),
+  optional(2, "data", Types.LongType.get()))
+  .fields());
+
+  private String viewLocation;
+
+  public TestNessieView() {
+super(BRANCH);
+  }
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+  throws IOException {
+super.beforeEach(clientFactory, nessieUri);
+this.viewLocation =
+createView(catalog, VIEW_IDENTIFIER, SCHEMA).location().replaceFirst("file:", "");
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+// drop the view data
+if (viewLocation != null) {
+  try (Stream<Path> walk = Files.walk(Paths.get(viewLocation))) {
+    walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+  }
+  catalog.dropView(VIEW_IDENTIFIER);
+}
+
+super.afterEach();
+  }
+
+  private IcebergView getView(ContentKey key) throws NessieNotFoundException {
+return getView(BRANCH, key);
+  }
+
+  private IcebergView getView(String ref, ContentKey key) throws NessieNotFoundException {
+    return api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get();
+  }
+
+  /** Verify that Nessie always returns the globally-current global-content w/ only DMLs. */
+  @Test
+  public void verifyStateMovesForDML() throws Exception {
+//  1. initialize view
+View icebergView = catalog.loadView(VIEW_IDENTIFIER);
+icebergView
+.replaceVersion()
+.withQuery("spark", "some query")
+.withSchema(SCHEMA)
+.withDefaultNamespace(VIEW_IDENTIFIER.namespace())
+.commit();
+
+//  2. create 2nd branch
+String testCaseBranch = "verify-global-moving";
+api.createReference()
+.sourceRefName(BRANCH)
+.reference(Branch.of(testCaseBranch, catalog.currentHash()))
+.create();
+IcebergView contentInitialMain = getView(BRANCH, KEY);
+IcebergView contentInitialBranc

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419083813


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419086031


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419087834


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@

Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on code in PR #183:
URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1419090834


##
pyiceberg/io/pyarrow.py:
##
@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> 
Optional[T]:
 """Visit a primitive type."""
 
 
-def _get_field_id(field: pa.Field) -> Optional[int]:
-for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-if field_id_str := field.metadata.get(pyarrow_field_id_key):
-return int(field_id_str.decode())
-return None
+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+counter: count[int]
+missing_is_metadata: Optional[bool]
 
+def __init__(self) -> None:
+self.counter = count()

Review Comment:
   I like the idea of using `assign_fresh_schema_ids`, since that one is a 
pre-order, and the default is post-order. I've updated the code, let me know 
what you think! Appreciate the review!
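
   The pre-order vs. post-order distinction being discussed can be illustrated on a toy nested schema (plain tuples here, not the real pyiceberg visitor): pre-order assignment numbers a struct before its children, which is what `assign_fresh_schema_ids` does, while a default post-order visit numbers the children first.

   ```python
   from itertools import count

   # Toy nested schema: (name, children) tuples; leaves have empty children.
   schema = ("root", [("id", []), ("address", [("street", []), ("zip", [])])])

   def assign_preorder(node, counter, out):
       name, children = node
       out[name] = next(counter)  # parent numbered before its children
       for child in children:
           assign_preorder(child, counter, out)

   def assign_postorder(node, counter, out):
       name, children = node
       for child in children:  # children numbered before the parent
           assign_postorder(child, counter, out)
       out[name] = next(counter)

   pre, post = {}, {}
   assign_preorder(schema, count(1), pre)
   assign_postorder(schema, count(1), post)
   print(pre)   # {'root': 1, 'id': 2, 'address': 3, 'street': 4, 'zip': 5}
   print(post)  # {'id': 1, 'street': 2, 'zip': 3, 'address': 4, 'root': 5}
   ```

   With pre-order, a nested field keeps an id larger than its parent struct's id, which matches how fresh ids look in Iceberg schemas.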






Re: [PR] Automatically create the tables for the `SqlCatalog` [iceberg-python]

2023-12-07 Thread via GitHub


Fokko merged PR #186:
URL: https://github.com/apache/iceberg-python/pull/186





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419093588


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419090042


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,337 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.View;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.CommitMeta;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.ImmutableTableReference;
+import org.projectnessie.model.LogResponse.LogEntry;
+
+public class TestNessieView extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-view-test";
+
+  private static final String DB_NAME = "db";
+  private static final String VIEW_NAME = "view";
+  private static final TableIdentifier VIEW_IDENTIFIER = TableIdentifier.of(DB_NAME, VIEW_NAME);
+  private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME);
+  private static final Schema SCHEMA =
+      new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields());
+  private static final Schema ALTERED =
+      new Schema(
+          Types.StructType.of(
+                  required(1, "id", Types.LongType.get()),
+                  optional(2, "data", Types.LongType.get()))
+              .fields());
+
+  private String viewLocation;
+
+  public TestNessieView() {
+    super(BRANCH);
+  }
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri)
+      throws IOException {
+    super.beforeEach(clientFactory, nessieUri);
+    this.viewLocation =
+        createView(catalog, VIEW_IDENTIFIER, SCHEMA).location().replaceFirst("file:", "");
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+    // drop the view data
+    if (viewLocation != null) {
+      try (Stream<Path> walk = Files.walk(Paths.get(viewLocation))) {
+        walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+      }
+      catalog.dropView(VIEW_IDENTIFIER);
+    }
+
+    super.afterEach();
+  }
+
+  private IcebergView getView(ContentKey key) throws NessieNotFoundException {
+    return getView(BRANCH, key);
+  }
+
+  private IcebergView getView(String ref, ContentKey key) throws NessieNotFoundException {
+    return api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get();
+  }
+
+  /** Verify that Nessie always returns the globally-current global-content w/ only DMLs. */
+  @Test
+  public void verifyStateMovesForDML() throws Exception {
+    //  1. initialize view
+    View icebergView = catalog.loadView(VIEW_IDENTIFIER);
+    icebergView
+        .replaceVersion()
+        .withQuery("spark", "some query")
+        .withSchema(SCHEMA)
+        .withDefaultNamespace(VIEW_IDENTIFIER.namespace())
+        .commit();
+
+    //  2. create 2nd branch
+    String testCaseBranch = "verify-global-moving";
+    api.createReference()
+        .sourceRefName(BRANCH)
+        .reference(Branch.of(testCaseBranch, catalog.currentHash()))
+        .create();
+    IcebergView contentInitialMain = getView(BRANCH, KEY);
+    IcebergView contentInitialBranc

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419091171


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419092132


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419095071


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##

Re: [I] Automatically create the tables for the `SqlCatalog` [iceberg-python]

2023-12-07 Thread via GitHub


Fokko closed issue #184: Automatically create the tables for the `SqlCatalog`
URL: https://github.com/apache/iceberg-python/issues/184


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419093010


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##

Re: [I] When will the 0.6.0 version be released? [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on issue #192:
URL: https://github.com/apache/iceberg-python/issues/192#issuecomment-1845492573

   @1taoze The goal is this year, and I'm quite confident that it is realistic.





Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-07 Thread via GitHub


ConeyLiu commented on code in PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#discussion_r1419107868


##
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala:
##
@@ -39,22 +41,31 @@ import org.apache.spark.sql.types.StructType
  */
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {
 
-  override def apply(plan: LogicalPlan): LogicalPlan =
-    plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) {
-      case filter @ Filter(condition, _) =>
-        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
-          case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
-            c.withNewChildren(Seq(replaceStaticInvoke(left), right))
+  private val rule: PartialFunction[Expression, Expression] = {
+    case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
+      c.withNewChildren(Seq(replaceStaticInvoke(left), right))

-          case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
-            c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
-        }
+    case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
+      c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+  }

+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformWithPruning(_.containsAnyPattern(FILTER, JOIN)) {
+      case filter @ Filter(condition, _) =>
+        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON))(rule)
         if (newCondition fastEquals condition) {
           filter
         } else {
           filter.copy(condition = newCondition)
         }
+      case j @ Join(_, _, _, Some(condition), _) =>

Review Comment:
   Here the join condition can be pushed to the leaf node by the Spark optimizer, right? I think this cannot cover the COW/MOR cases; COW/MOR needs some special handling here. I plan to work on it, but I've been quite busy lately.
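The refactor quoted above factors the comparison rewrite into a single shared `rule` that both the `Filter` branch and the new `Join` branch apply. A minimal, self-contained sketch of that shape follows — the classes below are illustrative toy stand-ins, not Spark's Catalyst API (`Invoke` ~ `StaticInvoke`, `ApplyFn` ~ the replacement expression, `Eq` ~ `BinaryComparison`):

```java
import java.util.function.UnaryOperator;

public class ReplaceInvokeSketch {
  // Toy expression tree; names are hypothetical stand-ins for Catalyst classes.
  static class Expr {}

  static class Lit extends Expr {
    final int value;
    Lit(int value) { this.value = value; }
  }

  static class Invoke extends Expr {  // stand-in for StaticInvoke
    final String fn;
    final Expr arg;
    Invoke(String fn, Expr arg) { this.fn = fn; this.arg = arg; }
  }

  static class ApplyFn extends Expr {  // stand-in for the pushable replacement
    final String fn;
    final Expr arg;
    ApplyFn(String fn, Expr arg) { this.fn = fn; this.arg = arg; }
  }

  static class Eq extends Expr {  // stand-in for BinaryComparison
    final Expr left;
    final Expr right;
    Eq(Expr left, Expr right) { this.left = left; this.right = right; }
  }

  // The shared rewrite, factored out once so a Filter branch and a Join branch
  // can both reuse it -- the same idea as the `rule` PartialFunction above:
  // replace an Invoke on either side of a comparison whose other side is a literal.
  static final UnaryOperator<Expr> RULE = e -> {
    if (e instanceof Eq) {
      Eq cmp = (Eq) e;
      if (cmp.left instanceof Invoke && cmp.right instanceof Lit) {
        Invoke inv = (Invoke) cmp.left;
        return new Eq(new ApplyFn(inv.fn, inv.arg), cmp.right);
      }
      if (cmp.left instanceof Lit && cmp.right instanceof Invoke) {
        Invoke inv = (Invoke) cmp.right;
        return new Eq(cmp.left, new ApplyFn(inv.fn, inv.arg));
      }
    }
    return e;
  };

  public static void main(String[] args) {
    // The same rewrite serves a Filter-style and a Join-style condition.
    Eq filterCond = new Eq(new Invoke("bucket", new Lit(16)), new Lit(3));
    Eq joinCond = new Eq(new Lit(3), new Invoke("bucket", new Lit(16)));
    System.out.println(((Eq) RULE.apply(filterCond)).left instanceof ApplyFn);   // true
    System.out.println(((Eq) RULE.apply(joinCond)).right instanceof ApplyFn);    // true
  }
}
```

Defining the rewrite once keeps the two call sites from drifting apart, which is the point of hoisting it out of the `Filter` case in the diff.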






Re: [PR] maint(transforms): replace `type()` calls with `isinstance()` [iceberg-python]

2023-12-07 Thread via GitHub


Fokko merged PR #188:
URL: https://github.com/apache/iceberg-python/pull/188





Re: [PR] maint(transforms): replace `type()` calls with `isinstance()` [iceberg-python]

2023-12-07 Thread via GitHub


Fokko commented on PR #188:
URL: https://github.com/apache/iceberg-python/pull/188#issuecomment-1845509371

   Thanks for fixing this @jayceslesar 🙌 





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419122432


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java:
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import org.apache.iceberg.exceptions.AlreadyExistsException;
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewMetadataParser;
+import org.projectnessie.client.http.HttpClientException;
+import org.projectnessie.error.NessieBadRequestException;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Content;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergTable;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.Reference;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NessieViewOperations extends BaseViewOperations {
+
+  private static final Logger LOG = LoggerFactory.getLogger(NessieViewOperations.class);
+
+  private final NessieIcebergClient client;
+  private final ContentKey key;
+  private final FileIO fileIO;
+  private IcebergView icebergView;
+
+  NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO fileIO) {
+    this.key = key;
+    this.client = client;
+    this.fileIO = fileIO;
+  }
+
+  @Override
+  public void doRefresh() {
+    try {
+      client.refresh();
+    } catch (NessieNotFoundException e) {
+      throw new RuntimeException(
+          String.format(
+              "Failed to refresh as ref '%s' is no longer valid.", client.getRef().getName()),
+          e);
+    }
+    String metadataLocation = null;
+    Reference reference = client.getRef().getReference();
+    try {
+      Content content = client.getApi().getContent().key(key).reference(reference).get().get(key);
+      LOG.debug("Content '{}' at '{}': {}", key, reference, content);
+      if (content == null) {
+        if (currentMetadataLocation() != null) {
+          throw new NoSuchViewException("View does not exist: %s in %s", key, reference);
+        }
+      } else {
+        this.icebergView =
+            content
+                .unwrap(IcebergView.class)
+                .orElseThrow(
+                    () -> {
+                      if (content instanceof IcebergTable) {
+                        return new AlreadyExistsException(
+                            "Table with same name already exists: %s in %s", key, reference);
+                      } else {
+                        return new AlreadyExistsException(
+                            "Cannot refresh Iceberg view: Nessie points to a non-Iceberg object for path: %s in %s",
+                            key, reference);
+                      }
+                    });
+        metadataLocation = icebergView.getMetadataLocation();
+      }
+    } catch (NessieNotFoundException ex) {
+      if (currentMetadataLocation() != null) {
+        throw new NoSuchViewException("View does not exist: %s in %s", key, reference);
+      }
+    }
+    refreshFromMetadataLocation(
+        metadataLocation,
+        null,
+        2,
+        location ->
+            NessieUtil.loadViewMetadata(
+                ViewMetadataParser.read(io().newInputFile(location)), location, reference));
+  }
+
+  @Override
+  public void doCommit(ViewMetadata base, ViewMetadata metadata) {
+    String newMetadataLocation = writeNewMetadataIfRequired(metadata);
+
+    boolean failure = false;
+    try {
+      String contentId = icebergView == null ? null : icebergView.getId();
+      client.commitView(base, metadata, newMetadataLocation, contentId, key);
+    } catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) {
+      if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) {
+        failure = true;
+      }
+      NessieUtil.handleExceptionsForCommits(ex, client.ref

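The excerpt above is cut off partway through `doCommit`. Note the distinction it draws: `failure` is set only for `NessieConflictException` / `NessieNotFoundException` (definite failures), not for `HttpClientException` (an ambiguous outcome). As a rough, standalone sketch only, with hypothetical names rather than the actual Iceberg/Nessie API, the general optimistic-commit pattern this reflects is: write the new metadata first, attempt the commit, and clean up the freshly written metadata only when the commit definitely failed.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitSketch {
  // Stand-in for an object store holding metadata files.
  static final List<String> storedFiles = new ArrayList<>();

  static String writeNewMetadata(String location) {
    storedFiles.add(location);
    return location;
  }

  static void deleteFile(String location) {
    storedFiles.remove(location);
  }

  // `commitFails` simulates the backend definitively rejecting the commit
  // (e.g. a conflict). An ambiguous network error would NOT set `failure`.
  static void commit(String newMetadataLocation, boolean commitFails) {
    boolean failure = false;
    try {
      if (commitFails) {
        failure = true;
        throw new RuntimeException("simulated conflict");
      }
    } finally {
      // Delete the freshly written metadata only on a definite failure;
      // on an ambiguous outcome the file must be kept.
      if (failure) {
        deleteFile(newMetadataLocation);
      }
    }
  }

  public static void main(String[] args) {
    String loc = writeNewMetadata("s3://bucket/metadata/v2.json");
    try {
      commit(loc, true);
    } catch (RuntimeException ignored) {
      // commit definitively failed
    }
    System.out.println(storedFiles.size());
  }
}
```

The key design point is that cleanup is gated on a *definite* failure: after a timeout, the commit may actually have succeeded server-side, so deleting the metadata file would corrupt the committed state.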
Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]

2023-12-07 Thread via GitHub


bitsondatadev commented on issue #9127:
URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1845537180

   @ronkorving Let me know if you'd like to raise the PR, or I can do it as 
well. For now, pointing to the roadmap page will be best. Eventually, I'd like 
there to be something less static that lives directly in GitHub for this: a 
main project board, or possibly a wiki with links to individual projects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Optimize manifest evaluation for super wide tables [iceberg]

2023-12-07 Thread via GitHub


irshadcc commented on PR #9147:
URL: https://github.com/apache/iceberg/pull/9147#issuecomment-1845560669

   > Thanks for raising this @irshadcc, this looks good to me. I've left two 
small comments, could you take a peek at those? Thanks for fixing this! 🙌
   
   I've added the Javadoc and removed the empty. 





Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]

2023-12-07 Thread via GitHub


nastra commented on code in PR #9161:
URL: https://github.com/apache/iceberg/pull/9161#discussion_r1419205681


##
api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java:
##
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+import java.text.MessageFormat;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.function.Function;
+import java.util.stream.Stream;
+import org.assertj.core.util.Preconditions;
+import org.junit.jupiter.api.TestTemplate;
+import org.junit.jupiter.api.extension.BeforeEachCallback;
+import org.junit.jupiter.api.extension.Extension;
+import org.junit.jupiter.api.extension.ExtensionContext;
+import org.junit.jupiter.api.extension.ParameterContext;
+import org.junit.jupiter.api.extension.ParameterResolutionException;
+import org.junit.jupiter.api.extension.ParameterResolver;
+import org.junit.jupiter.api.extension.TestTemplateInvocationContext;
+import org.junit.jupiter.api.extension.TestTemplateInvocationContextProvider;
+import org.junit.platform.commons.support.AnnotationSupport;
+import org.junit.platform.commons.support.HierarchyTraversalMode;
+
+/**
+ * This extension is used to implement parameterized tests for JUnit 5 to replace Parameterized in
+ * JUnit 4.
+ *
+ * When using this extension, all tests must be annotated with {@link TestTemplate}.

Review Comment:
   just an FYI that we need to make sure to mention where this was taken from 
and properly update the `LICENSE` (similarly to how it was done in #8366)






Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2023-12-07 Thread via GitHub


cccs-jc commented on PR #8980:
URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1845659813

   > @cccs-jc I mean let's have the changes for 3.5, with their tests, only in 
3.5, and we can backport the change with its tests to lower Spark versions 
like 3.4 and 3.3. The 3.4 test failures are expected, right, since we don't 
have the SparkMicrobatch stream changes for 3.4 in it.
   > 
   > Also, I would ask to revert the change in core to Microbatch.java if we 
don't have coverage for it, as I am unsure when that would fail (maybe some 
legacy handling).
   > 
   > Apologies for being late in getting back to this.
   
   Keeping the `+ existingFilesCount();` in SparkMicrobatch.java makes no 
sense to me.
   
   What is the purpose of adding that to `currentFileIndex`?
   
   The way I understand it, `currentFileIndex` is a position over the added 
files, so we want to count only the added files (`addedFilesCount()`). These 
are the files that a streaming job should consume.
   
   Can you explain the purpose of using `existingFilesCount` here?
   
   
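To illustrate the arithmetic in question (with made-up file counts; this does not reproduce the real `MicroBatches` internals): if `currentFileIndex` is a position over a snapshot's *added* files, then also adding `existingFilesCount()` pushes the index past the end of that list, so subsequent triggers believe the added files were already consumed.

```java
public class FileIndexSketch {
  public static void main(String[] args) {
    // Hypothetical counts for one overwrite snapshot.
    int addedFilesCount = 3;     // files newly added by the snapshot
    int existingFilesCount = 97; // files carried over unchanged

    // Two ways of advancing the index after processing the snapshot:
    int indexAddedOnly = addedFilesCount;
    int indexAddedPlusExisting = addedFilesCount + existingFilesCount;

    // The list of consumable files for this snapshot has only 3 entries,
    // so an index of 100 appears to be far past the end: later snapshots'
    // added files get skipped instead of consumed.
    System.out.println(indexAddedOnly);
    System.out.println(indexAddedPlusExisting);
  }
}
```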





Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-07 Thread via GitHub


tmnd1991 commented on code in PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#discussion_r1419309041


##
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala:
##
@@ -39,22 +41,31 @@ import org.apache.spark.sql.types.StructType
  */
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {
 
-  override def apply(plan: LogicalPlan): LogicalPlan =
-    plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) {
-      case filter @ Filter(condition, _) =>
-        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
-          case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
-            c.withNewChildren(Seq(replaceStaticInvoke(left), right))
+  private val rule:PartialFunction[Expression, Expression] = {
+    case c@BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
+      c.withNewChildren(Seq(replaceStaticInvoke(left), right))
 
-          case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
-            c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
-        }
+    case c@BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
+      c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+  }
 
+  override def apply(plan: LogicalPlan): LogicalPlan = {
+    plan.transformWithPruning(_.containsAnyPattern(FILTER, JOIN)) {
+      case filter @ Filter(condition, _) =>
+        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON))(rule)
         if (newCondition fastEquals condition) {
           filter
         } else {
           filter.copy(condition = newCondition)
         }
+      case j @ Join(_, _, _, Some(condition), _) =>

Review Comment:
   I discovered the bug working with a MERGE statement and actually this works 
both with CoW and MoR, I have it running on my cluster like that, and it's 
correctly pruning all the partitions
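For readers following along, the shape of the rule in the diff above (a recursive expression-tree transform that replaces matching leaves, and keeps the original node when nothing changed, as the `fastEquals` check does) can be sketched outside Catalyst. Everything below is a toy model, not Spark's API:

```java
import java.util.function.UnaryOperator;

public class RuleSketch {
  interface Expr {}
  record StaticInvoke(String fn) implements Expr {}
  record ApplyFunction(String fn) implements Expr {}
  record Eq(Expr left, Expr right) implements Expr {}

  // Recursively apply `rule` to leaves; only rebuild an Eq node when a child
  // actually changed, mirroring Catalyst's copy-avoidance on `fastEquals`.
  static Expr transform(Expr e, UnaryOperator<Expr> rule) {
    if (e instanceof Eq eq) {
      Expr nl = transform(eq.left(), rule);
      Expr nr = transform(eq.right(), rule);
      return (nl == eq.left() && nr == eq.right()) ? e : new Eq(nl, nr);
    }
    return rule.apply(e);
  }

  public static void main(String[] args) {
    // Replace StaticInvoke leaves with a pushdown-friendly function
    // expression, analogous to replaceStaticInvoke in the PR.
    UnaryOperator<Expr> rule =
        e -> (e instanceof StaticInvoke s) ? new ApplyFunction(s.fn()) : e;

    Expr cond = new Eq(new StaticInvoke("bucket"), new ApplyFunction("lit"));
    System.out.println(transform(cond, rule));
  }
}
```

The identity check matters in an optimizer: returning the original object when no rewrite happened lets the caller (the `Filter`/`Join` cases in the rule) skip copying the enclosing plan node entirely.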






Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-07 Thread via GitHub


tmnd1991 commented on code in PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#discussion_r1419314263


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala:
##
@@ -40,14 +37,20 @@ import org.apache.spark.sql.types.StructType
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan =
-    plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) {
+    plan.transformWithPruning (_.containsPattern(FILTER)) {
       case filter @ Filter(condition, _) =>
-        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
+        val newCondition = condition.transformWithPruning(_.containsAnyPattern(BINARY_COMPARISON, IN, INSET)) {
           case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
             c.withNewChildren(Seq(replaceStaticInvoke(left), right))
 
           case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
             c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+
+          case in @ In(s: StaticInvoke, _) =>

Review Comment:
   my bad, I fixed it!






[PR] Switch to junit5 for mr [iceberg]

2023-12-07 Thread via GitHub


lschetanrao opened a new pull request, #9241:
URL: https://github.com/apache/iceberg/pull/9241

   ### Description
   This PR fixes https://github.com/apache/iceberg/issues/9083
   
   The goal here is to switch all imports to Junit5 and to use AssertJ-style 
assertions.
   
   ### Implementation
   All test cases except the ones using parameterized tests have been migrated; 
I thought it better to wait for https://github.com/apache/iceberg/pull/9161 
to be merged first. 
   
   





Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]

2023-12-07 Thread via GitHub


tmnd1991 commented on PR #9192:
URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1845761731

   cc @nastra @dramaticlly @advancedxy for review
   thanks





Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]

2023-12-07 Thread via GitHub


tmnd1991 commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1845762590

   cc @nastra @dramaticlly @advancedxy for review
   thanks




