Re: [PR] Spark SystemFunctions are not pushed down during MERGE [iceberg]
tmnd1991 commented on PR #9233: URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1844912767 Hi @ConeyLiu, this still needs some refinement (mostly w.r.t. testing), but do you think the change makes sense? I'd rather not put more work into it if I'm way off ;) Thanks 🙏 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Add partition evolution [iceberg-python]
nicor88 opened a new issue, #193: URL: https://github.com/apache/iceberg-python/issues/193

### Feature Request / Improvement

## Context

One of the advantages of Iceberg is that the original partition design can be changed over time. This is often needed: a table is typically partitioned for a specific query access pattern, and that pattern can change over time.

## Requirements

I would like to use pyiceberg to "evolve" partitions, specifically to change the partition spec, e.g.:

```
new_partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="event_date"),
    PartitionField(source_id=2, field_id=1001, name="marketing_channel")
)

with table.update_partitions() as table_update:
    table_update.update_partition_specs(new_partition_spec)
```

or something similar.

## Reference

https://iceberg.apache.org/docs/latest/evolution/#partition-evolution
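[Editor's note] A minimal sketch of what the requested partition evolution means at the table-metadata level, which may help frame the feature request above. This is not pyiceberg's actual API; the `add_spec` helper and the dictionary shapes are illustrative only. The grounded facts are: Iceberg tables keep every historical partition spec (so data files written under old specs remain readable), each new spec gets a new `spec-id`, and only `default-spec-id` moves.

```python
# Illustrative table metadata with a single original partition spec.
metadata = {
    "partition-specs": [
        {"spec-id": 0, "fields": [
            {"source-id": 1, "field-id": 1000, "transform": "month", "name": "event_month"},
        ]},
    ],
    "default-spec-id": 0,
}

def add_spec(metadata, fields):
    """Hypothetical helper: append a new spec and make it the default."""
    # New specs get the next spec-id; existing specs are never rewritten.
    new_id = max(s["spec-id"] for s in metadata["partition-specs"]) + 1
    metadata["partition-specs"].append({"spec-id": new_id, "fields": fields})
    metadata["default-spec-id"] = new_id
    return new_id

add_spec(metadata, [
    {"source-id": 1, "field-id": 1000, "transform": "day", "name": "event_date"},
    {"source-id": 2, "field-id": 1001, "transform": "identity", "name": "marketing_channel"},
])
print(metadata["default-spec-id"])  # 1
```

Note that the old `spec-id: 0` entry survives the evolution; that is what makes the change metadata-only, with no rewrite of existing data files.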
Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]
HonahX commented on code in PR #183: URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1418516560

## pyiceberg/io/pyarrow.py: ##

@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> Optional[T]:
     """Visit a primitive type."""
-def _get_field_id(field: pa.Field) -> Optional[int]:
-    for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-        if field_id_str := field.metadata.get(pyarrow_field_id_key):
-            return int(field_id_str.decode())
-    return None
+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+    counter: count[int]
+    missing_is_metadata: Optional[bool]
+
+    def __init__(self) -> None:
+        self.counter = count()

Review Comment: Shall we use `count(1)` here, since we start from 1 in `assign_fresh_schema_ids`? https://github.com/apache/iceberg-python/blob/4616d036440bf4cb3733e8d091220587cf290b75/pyiceberg/schema.py#L1221-L1230
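[Editor's note] The point of the review comment above comes down to `itertools.count`'s start argument; a minimal sketch:

```python
from itertools import count

# count() starts at 0; count(1) starts at 1, which is what the reviewer
# suggests to match assign_fresh_schema_ids, which numbers fields from 1.
zero_based = count()   # yields 0, 1, 2, ...
one_based = count(1)   # yields 1, 2, 3, ...

first_ids = [next(one_based) for _ in range(3)]
print(first_ids)  # [1, 2, 3]
```

With the default `count()`, the first assigned id would be 0, off by one from the fresh-schema-id convention.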
Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]
ronkorving commented on issue #9127: URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1844981085 @Fokko Thanks for the engagement :) Would you agree that perhaps the readme is not the best place for the "what we're currently working on" information, and that it's better to just remove that part of it? I would be happy to make a PR 👍
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
tmnd1991 commented on PR #9192: URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1845066093 Could I get feedback on this one too, @ConeyLiu? Thanks a lot
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on PR #8909: URL: https://github.com/apache/iceberg/pull/8909#issuecomment-1845096677 @snazy, @nastra: if this PR is OK, can I get an approval? If not, what further changes are expected? I had thought of waiting for https://github.com/apache/iceberg/pull/8857/ and https://github.com/apache/iceberg/pull/9012 before rebasing, but they can rebase too once this PR is merged.
Re: [PR] Update Slack Links [iceberg-docs]
nastra commented on PR #295: URL: https://github.com/apache/iceberg-docs/pull/295#issuecomment-1845144252 @bitsondatadev I think you'd need to update this on the `latest` branch as well for it to properly show up on the website
Re: [PR] Fix branching and tagging images on 1.2.1 branch [iceberg-docs]
nastra commented on PR #231: URL: https://github.com/apache/iceberg-docs/pull/231#issuecomment-1845144982 @amogh-jahagirdar is this still relevant?
Re: [PR] Docs: Document reading in Spark using branch and tag identifiers [iceberg]
nastra merged PR #9238: URL: https://github.com/apache/iceberg/pull/9238
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko commented on code in PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418802658

## crates/iceberg/src/catalog/mod.rs: ##

@@ -758,23 +760,19 @@ mod tests {
         {
             "action": "add-spec",
             "spec": {
-                "spec-id": 1,
                 "fields": [
                     {
                         "source-id": 4,
-                        "field-id": 1000,

Review Comment: Why remove these? They are part of the `PartitionField`: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.py#L131-L135
Re: [PR] test: Add integration tests for rest catalog. [iceberg-rust]
Fokko commented on code in PR #109: URL: https://github.com/apache/iceberg-rust/pull/109#discussion_r1418811263

## crates/catalog/rest/tests/rest_catalog_test.rs: ##

@@ -0,0 +1,376 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Integration tests for rest catalog.
+
+use iceberg::spec::{FormatVersion, NestedField, PrimitiveType, Schema, Type};
+use iceberg::transaction::Transaction;
+use iceberg::{Catalog, Namespace, NamespaceIdent, TableCreation, TableIdent};
+use iceberg_catalog_rest::{RestCatalog, RestCatalogConfig};
+use iceberg_test_utils::docker::DockerCompose;
+use iceberg_test_utils::{normalize_test_name, set_up};
+use port_scanner::scan_port_addr;
+use std::collections::HashMap;
+use tokio::time::sleep;
+
+const REST_CATALOG_PORT: u16 = 8181;
+
+struct TestFixture {
+    _docker_compose: DockerCompose,
+    rest_catalog: RestCatalog,
+}
+
+async fn set_test_fixture(func: &str) -> TestFixture {
+    set_up();
+    let docker_compose = DockerCompose::new(
+        normalize_test_name(format!("{}_{func}", module_path!())),
+        format!("{}/testdata/rest_catalog", env!("CARGO_MANIFEST_DIR")),
+    );
+
+    // Start docker compose
+    docker_compose.run();
+
+    let rest_catalog_ip = docker_compose.get_container_ip("rest");

Review Comment: We can always fix it in a follow-up PR if we find out that it doesn't work well 👍
[PR] Docs: Update default format version to 2 [iceberg]
zhongyujiang opened a new pull request, #9239: URL: https://github.com/apache/iceberg/pull/9239 Corrects the default format version in the documentation; we already switched to V2 by default in #8381.
Re: [PR] chore: Add cargo build and build guide [iceberg-rust]
Fokko merged PR #111: URL: https://github.com/apache/iceberg-rust/pull/111
Re: [PR] feat: Add hms catalog layout [iceberg-rust]
Fokko merged PR #112: URL: https://github.com/apache/iceberg-rust/pull/112
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
liurenjie1024 commented on code in PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418814937

## crates/iceberg/src/catalog/mod.rs: ##

@@ -758,23 +760,19 @@ mod tests {
         {
             "action": "add-spec",
             "spec": {
-                "spec-id": 1,
                 "fields": [
                     {
                         "source-id": 4,
-                        "field-id": 1000,

Review Comment: I think this `field_id` in the request is optional?
Re: [PR] test: Add integration tests for rest catalog. [iceberg-rust]
Fokko commented on code in PR #109: URL: https://github.com/apache/iceberg-rust/pull/109#discussion_r1418813623

## crates/test_utils/src/docker.rs: ##

@@ -0,0 +1,102 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::cmd::{get_cmd_output, run_command};
+use std::process::Command;
+
+/// A utility to manage lifecycle of docker compose.
+///
+/// It's will start docker compose when calling `run` method, and will be stopped when dropped.
+#[derive(Debug)]
+pub struct DockerCompose {
+    project_name: String,
+    docker_compose_dir: String,
+}
+
+impl DockerCompose {

Review Comment: In PyIceberg we do this in a Makefile, which seems to make more sense to me, but I'm also fine with this 👍
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko commented on code in PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418820203

## crates/iceberg/src/catalog/mod.rs: ##

@@ -758,23 +760,19 @@ mod tests {
         {
             "action": "add-spec",
             "spec": {
-                "spec-id": 1,
                 "fields": [
                     {
                         "source-id": 4,
-                        "field-id": 1000,

Review Comment: The open-api spec is a bit unclear, but it isn't optional right now: https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/open-api/rest-catalog-open-api.py#L283-L284 References: https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/open-api/rest-catalog-open-api.py#L138-L140 References: https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/open-api/rest-catalog-open-api.py#L131-L135 And there it is non-optional.
Re: [PR] Nessie: reimplement namespace operations [iceberg]
nastra merged PR #8857: URL: https://github.com/apache/iceberg/pull/8857
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko commented on code in PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418824322

## crates/iceberg/src/catalog/mod.rs: ##

@@ -758,23 +760,19 @@ mod tests {
         {
             "action": "add-spec",
             "spec": {
-                "spec-id": 1,
                 "fields": [
                     {
                         "source-id": 4,
-                        "field-id": 1000,

Review Comment: It looks like it is for historical reasons. For old specs they can be missing, and they'll just auto-increment.
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko commented on code in PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r1418823234

## crates/iceberg/src/catalog/mod.rs: ##

@@ -758,23 +760,19 @@ mod tests {
         {
             "action": "add-spec",
             "spec": {
-                "spec-id": 1,
                 "fields": [
                     {
                         "source-id": 4,
-                        "field-id": 1000,

Review Comment: In Java: https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/core/src/main/java/org/apache/iceberg/MetadataUpdateParser.java#L447-L451 And there it is optional: https://github.com/apache/iceberg/blob/af9522ac7e8e25dc044622c566b66301b6df9581/core/src/main/java/org/apache/iceberg/PartitionSpecParser.java#L128-L145
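[Editor's note] A sketch, in Python, of the optional-field-id behavior discussed in this thread. The real logic lives in Java's `PartitionSpecParser`; this `assign_field_ids` helper is hypothetical and only illustrates the auto-increment idea: when a partition field arrives without a `"field-id"`, assign the next id after the highest one seen so far, starting from 1000, the first partition field id per the Iceberg spec.

```python
PARTITION_DATA_ID_START = 1000  # first partition field id per the Iceberg spec

def assign_field_ids(fields):
    """Fill in missing "field-id" entries by auto-incrementing."""
    assigned = []
    next_id = PARTITION_DATA_ID_START
    for field in fields:
        field_id = field.get("field-id")
        if field_id is None:
            field_id = next_id
        assigned.append({**field, "field-id": field_id})
        # Subsequent assignments always follow the highest id seen so far.
        next_id = max(next_id, field_id) + 1
    return assigned

result = assign_field_ids([
    {"source-id": 4, "transform": "bucket[16]"},                  # missing: gets 1000
    {"source-id": 2, "field-id": 1005, "transform": "identity"},  # explicit: kept
    {"source-id": 1, "transform": "day"},                         # missing: gets 1006
])
print([f["field-id"] for f in result])  # [1000, 1005, 1006]
```

This is why the test in the PR can drop `"field-id": 1000` from the JSON: a parser that auto-increments reconstructs the same value for old specs that omit it.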
Re: [PR] Parquet: Add a table property to control the Parquet row-group size of position delete files [iceberg]
zhongyujiang commented on PR #9177: URL: https://github.com/apache/iceberg/pull/9177#issuecomment-1845189563 @rdblue @RussellSpitzer would you mind taking a look at this? Thanks in advance!
Re: [PR] Typo? [iceberg-docs]
Fokko commented on PR #270: URL: https://github.com/apache/iceberg-docs/pull/270#issuecomment-1845241457 Thanks @lorenzwalthert for taking the time to fix this 👍
Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]
Fokko commented on issue #9127: URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1845240644 Yes, I think that makes a lot of sense. @bitsondatadev is also working on updating the [roadmap](https://iceberg.apache.org/roadmap/), it might be a good idea to link to that so we have one single place where people can see what's going on 👍
Re: [PR] Typo? [iceberg-docs]
Fokko merged PR #270: URL: https://github.com/apache/iceberg-docs/pull/270
Re: [PR] Fix spelling [iceberg-docs]
Fokko merged PR #271: URL: https://github.com/apache/iceberg-docs/pull/271
Re: [PR] Update vendors.md [iceberg-docs]
Fokko merged PR #275: URL: https://github.com/apache/iceberg-docs/pull/275
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
ajantha-bhat commented on PR #8909: URL: https://github.com/apache/iceberg/pull/8909#issuecomment-1845247046 Note: I am squashing the commits into one (since I am having a hard time rebasing with 8 commits).
[PR] Open-API: Refactor updates with discriminator [iceberg]
Fokko opened a new pull request, #9240: URL: https://github.com/apache/iceberg/pull/9240 This generates nicer code
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko commented on code in PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#discussion_r141983

## crates/iceberg/src/catalog/mod.rs: ##

@@ -758,23 +760,19 @@ mod tests {
         {
             "action": "add-spec",
             "spec": {
-                "spec-id": 1,
                 "fields": [
                     {
                         "source-id": 4,
-                        "field-id": 1000,

Review Comment: BTW, cleaned up the spec a bit: https://github.com/apache/iceberg/pull/9240
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko merged PR #106: URL: https://github.com/apache/iceberg-rust/pull/106
Re: [I] feat: Add support for `UnboundPartitionSpec`. [iceberg-rust]
Fokko closed issue #98: feat: Add support for `UnboundPartitionSpec`. URL: https://github.com/apache/iceberg-rust/issues/98
Re: [PR] feat: support UnboundPartitionSpec [iceberg-rust]
Fokko commented on PR #106: URL: https://github.com/apache/iceberg-rust/pull/106#issuecomment-1845257956 Thanks @my-vegetable-has-exploded for working on this, and @liurenjie1024 and @Xuanwo for reviewing this 👍
Re: [PR] Build: Bump actions/setup-python from 4 to 5 [iceberg-python]
Fokko merged PR #191: URL: https://github.com/apache/iceberg-python/pull/191
Re: [PR] Build: Bump fastavro from 1.9.0 to 1.9.1 [iceberg-python]
Fokko merged PR #190: URL: https://github.com/apache/iceberg-python/pull/190
Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]
nastra commented on PR #9230: URL: https://github.com/apache/iceberg/pull/9230#issuecomment-1845262257

@ConeyLiu could you also please take a look at this?
Re: [PR] Build: Bump griffe from 0.38.0 to 0.38.1 [iceberg-python]
Fokko merged PR #189: URL: https://github.com/apache/iceberg-python/pull/189
Re: [PR] maint(transforms): replace `type()` calls with `isinstance()` [iceberg-python]
Fokko commented on PR #188: URL: https://github.com/apache/iceberg-python/pull/188#issuecomment-1845273979

@jayceslesar looks like `black` still wants to reformat some code. Can you run `make lint`? Thanks!
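For readers following along, a minimal sketch of why `isinstance()` is usually preferred over comparing `type()` results. The classes below are hypothetical stand-ins, not pyiceberg's actual transform hierarchy:

```python
class Transform:
    """Hypothetical stand-in for a base transform class."""

class BucketTransform(Transform):
    """Hypothetical subclass."""

t = BucketTransform()

# type() compares the exact class, so a subclass instance does not match the base:
print(type(t) == Transform)      # False
# isinstance() respects inheritance, which is what such checks usually intend:
print(isinstance(t, Transform))  # True
```

This is why linters flag `type(x) == SomeClass` comparisons: they silently break when subclasses are involved.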
Re: [PR] Automatically create the tables for the `SqlCatalog` [iceberg-python]
Fokko commented on PR #186: URL: https://github.com/apache/iceberg-python/pull/186#issuecomment-1845276186

This is beautiful @cosmastech, in this same PR, can you remove these lines: https://github.com/apache/iceberg-python/blob/5337504963dec7a410ba762da08064578976bd7a/mkdocs/docs/api.md?plain=1#L57-L61
Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]
Fokko commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418929287

## pyiceberg/table/snapshots.py:

@@ -116,3 +144,199 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
     snapshot_id: int = Field(alias="snapshot-id")
     timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+    added_size: int
+    removed_size: int
+    added_files: int
+    removed_files: int
+    added_eq_delete_files: int
+    removed_eq_delete_files: int
+    added_pos_delete_files: int
+    removed_pos_delete_files: int
+    added_delete_files: int
+    removed_delete_files: int
+    added_records: int
+    deleted_records: int
+    added_pos_deletes: int
+    removed_pos_deletes: int
+    added_eq_deletes: int
+    removed_eq_deletes: int

Review Comment: Two things that I like to avoid; complexity and trust issues!
Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]
Fokko commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418931669

## pyiceberg/table/snapshots.py:

@@ -19,15 +19,40 @@
     Any,
     Dict,
     List,
+    Mapping,
     Optional,
 )

 from pydantic import Field, PrivateAttr, model_serializer

 from pyiceberg.io import FileIO
-from pyiceberg.manifest import ManifestFile, read_manifest_list
+from pyiceberg.manifest import DataFile, DataFileContent, ManifestFile, read_manifest_list
 from pyiceberg.typedef import IcebergBaseModel

+ADDED_DATA_FILES = 'added-data-files'
+ADDED_DELETE_FILES = 'added-delete-files'
+ADDED_EQUALITY_DELETES = 'added-equality-deletes'
+ADDED_FILE_SIZE = 'added-files-size'
+ADDED_POSITION_DELETES = 'added-position-deletes'
+ADDED_POSITION_DELETE_FILES = f'{ADDED_POSITION_DELETES}-files'

Review Comment: Ahh, yes!
Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]
Fokko commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418932283

## pyiceberg/table/snapshots.py:

@@ -19,15 +19,40 @@
     Any,
     Dict,
     List,
+    Mapping,
     Optional,
 )

 from pydantic import Field, PrivateAttr, model_serializer

 from pyiceberg.io import FileIO
-from pyiceberg.manifest import ManifestFile, read_manifest_list
+from pyiceberg.manifest import DataFile, DataFileContent, ManifestFile, read_manifest_list
 from pyiceberg.typedef import IcebergBaseModel

+ADDED_DATA_FILES = 'added-data-files'
+ADDED_DELETE_FILES = 'added-delete-files'
+ADDED_EQUALITY_DELETES = 'added-equality-deletes'
+ADDED_FILE_SIZE = 'added-files-size'
+ADDED_POSITION_DELETES = 'added-position-deletes'
+ADDED_POSITION_DELETE_FILES = f'{ADDED_POSITION_DELETES}-files'
+ADDED_RECORDS = 'added-records'
+DELETED_DATA_FILES = 'deleted-data-files'
+DELETED_RECORDS = 'deleted-records'
+EQUALITY_DELETE_FILES = 'added-equality-delete-files'

Review Comment: Yes!
Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]
Fokko commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418934519

## pyiceberg/table/snapshots.py:

@@ -65,6 +90,25 @@ def __init__(self, operation: Operation, **data: Any) -> None:
         super().__init__(operation=operation, **data)
         self._additional_properties = data

+    def __getitem__(self, __key: str) -> Optional[Any]:  # type: ignore
+        """Return a key as it is a map."""
+        if __key == 'operation':

Review Comment: It seems to be lower-case here: [screenshot omitted] I can make it case-insensitive
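A hedged sketch of what the case-insensitive lookup mentioned above could look like. The helper name and dict shape are illustrative, not the PR's actual code:

```python
from typing import Any, Dict, Optional

def get_summary_key(properties: Dict[str, Any], key: str) -> Optional[Any]:
    """Hypothetical helper: look up a summary property regardless of key casing."""
    # Build a lowercase view of the keys, then probe with the lowercased query.
    lowered = {k.lower(): v for k, v in properties.items()}
    return lowered.get(key.lower())

summary = {"operation": "append", "added-records": "100"}
print(get_summary_key(summary, "Operation"))      # append
print(get_summary_key(summary, "ADDED-RECORDS"))  # 100
```

Rebuilding the lowered dict on every call is O(n); a real implementation would likely normalize keys once at construction time.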
Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]
Fokko commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1418935983

## pyiceberg/table/snapshots.py:

@@ -116,3 +168,193 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
     snapshot_id: int = Field(alias="snapshot-id")
     timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+    added_size: int
+    removed_size: int
+    added_files: int
+    removed_files: int
+    added_eq_delete_files: int
+    removed_eq_delete_files: int
+    added_pos_delete_files: int
+    removed_pos_delete_files: int
+    added_delete_files: int
+    removed_delete_files: int
+    added_records: int
+    deleted_records: int
+    added_pos_deletes: int
+    removed_pos_deletes: int
+    added_eq_deletes: int
+    removed_eq_deletes: int
+
+    def __init__(self) -> None:
+        self.added_size = 0
+        self.removed_size = 0
+        self.added_files = 0
+        self.removed_files = 0
+        self.added_eq_delete_files = 0
+        self.removed_eq_delete_files = 0
+        self.added_pos_delete_files = 0
+        self.removed_pos_delete_files = 0
+        self.added_delete_files = 0
+        self.removed_delete_files = 0
+        self.added_records = 0
+        self.deleted_records = 0
+        self.added_pos_deletes = 0
+        self.removed_pos_deletes = 0
+        self.added_eq_deletes = 0
+        self.removed_eq_deletes = 0
+
+    def add_file(self, data_file: DataFile) -> None:
+        self.added_size += data_file.file_size_in_bytes
+
+        if data_file.content == DataFileContent.DATA:
+            self.added_files += 1
+            self.added_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.added_delete_files += 1
+            self.added_pos_delete_files += 1
+            self.added_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.added_delete_files += 1
+            self.added_eq_delete_files += 1
+            self.added_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def remove_file(self, data_file: DataFile) -> None:
+        self.removed_size += data_file.file_size_in_bytes
+
+        if data_file.content == DataFileContent.DATA:
+            self.removed_files += 1
+            self.deleted_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.removed_delete_files += 1
+            self.removed_pos_delete_files += 1
+            self.removed_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.removed_delete_files += 1
+            self.removed_eq_delete_files += 1
+            self.removed_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def build(self) -> Dict[str, str]:
+        def set_when_positive(properties: Dict[str, str], num: int, property_name: str) -> None:
+            if num > 0:
+                properties[property_name] = str(num)
+
+        properties: Dict[str, str] = {}
+        set_when_positive(properties, self.added_size, ADDED_FILE_SIZE)
+        set_when_positive(properties, self.removed_size, REMOVED_FILE_SIZE)
+        set_when_positive(properties, self.added_files, ADDED_DATA_FILES)

Review Comment: That's not a minor, thanks!
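The `set_when_positive` idiom in the diff above can be sketched in isolation. This is a simplified standalone version with hard-coded property names, not the actual pyiceberg code:

```python
from typing import Dict

def build_summary(added_size: int, removed_size: int, added_files: int) -> Dict[str, str]:
    """Collect only the non-zero counters as string-valued snapshot summary properties."""
    def set_when_positive(properties: Dict[str, str], num: int, property_name: str) -> None:
        if num > 0:
            properties[property_name] = str(num)

    properties: Dict[str, str] = {}
    set_when_positive(properties, added_size, "added-files-size")
    set_when_positive(properties, removed_size, "removed-files-size")
    set_when_positive(properties, added_files, "added-data-files")
    return properties

print(build_summary(1024, 0, 2))  # {'added-files-size': '1024', 'added-data-files': '2'}
```

The point of the pattern is that zero-valued counters are omitted entirely, keeping the snapshot summary compact.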
Re: [PR] Automatically create the tables for the `SqlCatalog` [iceberg-python]
cosmastech commented on PR #186: URL: https://github.com/apache/iceberg-python/pull/186#issuecomment-1845323941

> This is beautiful @cosmastech, in this same PR, can you remove these lines:
>
> https://github.com/apache/iceberg-python/blob/5337504963dec7a410ba762da08064578976bd7a/mkdocs/docs/api.md?plain=1#L57-L61
>
> It also looks like the CI is a bit sad, can you run `make lint` as well?

I believe we should be good to go. @Fokko let me know if I missed anything else
Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]
nastra commented on code in PR #9230: URL: https://github.com/apache/iceberg/pull/9230#discussion_r1418954500

## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:

@@ -895,7 +895,7 @@ private void cleanUncommittedAppends(Set<ManifestFile> committed) {
       }
     }

-    this.cachedNewDataManifests = committedNewDataManifests;
+    this.cachedNewDataManifests = null;

Review Comment: rather than setting this to `null`, I think a better approach would be to handle this exactly like it's done for `cachedNewDeleteManifests`, where `cachedNewDeleteManifests` is initialized to be a linked list and then is iterated over and files are removed. I did a quick check with

```
--- a/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
+++ b/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
@@ -94,7 +94,7 @@ abstract class MergingSnapshotProducer extends SnapshotProducer {
   private PartitionSpec dataSpec;

   // cache new data manifests after writing
-  private List<ManifestFile> cachedNewDataManifests = null;
+  private final List<ManifestFile> cachedNewDataManifests = Lists.newLinkedList();
   private boolean hasNewDataFiles = false;

   // cache new manifests for delete files
@@ -885,17 +885,15 @@ abstract class MergingSnapshotProducer extends SnapshotProducer {
   }

   private void cleanUncommittedAppends(Set<ManifestFile> committed) {
-    if (cachedNewDataManifests != null) {
-      List<ManifestFile> committedNewDataManifests = Lists.newArrayList();
-      for (ManifestFile manifest : cachedNewDataManifests) {
-        if (committed.contains(manifest)) {
-          committedNewDataManifests.add(manifest);
-        } else {
-          deleteFile(manifest.path());
+    if (!cachedNewDataManifests.isEmpty()) {
+      ListIterator<ManifestFile> dataManifestsIterator = cachedNewDataManifests.listIterator();
+      while (dataManifestsIterator.hasNext()) {
+        ManifestFile dataManifest = dataManifestsIterator.next();
+        if (!committed.contains(dataManifest)) {
+          deleteFile(dataManifest.path());
+          dataManifestsIterator.remove();
         }
       }
-
-      this.cachedNewDataManifests = null;
     }

     ListIterator<ManifestFile> deleteManifestsIterator = cachedNewDeleteManifests.listIterator();
@@ -950,12 +948,12 @@ abstract class MergingSnapshotProducer extends SnapshotProducer {
   }

   private List<ManifestFile> newDataFilesAsManifests() {
-    if (hasNewDataFiles && cachedNewDataManifests != null) {
+    if (hasNewDataFiles && !cachedNewDataManifests.isEmpty()) {
       cachedNewDataManifests.forEach(file -> deleteFile(file.path()));
-      cachedNewDataManifests = null;
+      cachedNewDataManifests.clear();
     }

-    if (cachedNewDataManifests == null) {
+    if (cachedNewDataManifests.isEmpty()) {
       try {
         RollingManifestWriter writer = newRollingManifestWriter(dataSpec());
         try {
@@ -968,7 +966,7 @@ abstract class MergingSnapshotProducer extends SnapshotProducer {
         writer.close();
       }

-      this.cachedNewDataManifests = writer.toManifestFiles();
+      this.cachedNewDataManifests.addAll(writer.toManifestFiles());
```

and that seemed to work. We would probably want to use the same approach in `FastAppend` (but I haven't checked that part for `FastAppend`)
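A simplified Python analogue of the approach suggested above may help clarify the idea: keep the cache as a mutable list and remove only the uncommitted entries in place, instead of rebuilding the list or resetting the whole cache to null. The helper below is illustrative only, not Iceberg code:

```python
def clean_uncommitted(cached_manifests: list, committed: set, delete_file) -> None:
    """Delete files for manifests that were not committed and drop them from the cache."""
    # Iterate over a snapshot copy so removing from the live list is safe,
    # mirroring ListIterator.remove() in the Java sketch.
    for manifest in list(cached_manifests):
        if manifest not in committed:
            delete_file(manifest)
            cached_manifests.remove(manifest)

deleted = []
cache = ["m1", "m2", "m3"]
clean_uncommitted(cache, {"m1", "m3"}, deleted.append)
print(cache)    # ['m1', 'm3']
print(deleted)  # ['m2']
```

The key property is that committed manifests survive in the cache across a retry, so the producer never loses track of data files it already wrote.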
Re: [PR] Docs: Update default format version to 2 [iceberg]
nastra commented on PR #9239: URL: https://github.com/apache/iceberg/pull/9239#issuecomment-1845334573

thanks @zhongyujiang for adding this
Re: [PR] Docs: Update default format version to 2 [iceberg]
nastra merged PR #9239: URL: https://github.com/apache/iceberg/pull/9239
Re: [PR] API: Replace special case of deprecated RuntimeIOException with FileHandlingException [iceberg]
nastra commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418963810

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");

Review Comment: I wonder if that should throw an IllegalStateException here
Re: [PR] API: Replace special case of deprecated RuntimeIOException with FileHandlingException [iceberg]
nastra commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418968003

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+    try {
+      return stream.getPos();
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment: is this change actually related to this particular PR?
Re: [PR] API: Replace special case of deprecated RuntimeIOException with FileHandlingException [iceberg]
nastra commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418968329

## core/src/main/java/org/apache/iceberg/avro/AvroIO.java:

@@ -178,7 +178,7 @@ static long findStartingRowPos(Supplier open, long start) {
       SYNC_READER.read(decoder, blockSync);

       if (!Arrays.equals(fileSync, blockSync)) {
-        throw new RuntimeIOException("Invalid sync at %s", nextSyncPos);
+        throw new IOException(String.format("Invalid sync at %s", nextSyncPos));

Review Comment: this change doesn't seem to be related to the scope of this PR?
Re: [PR] JDBC catalog fix namespaceExists check [iceberg]
nastra commented on code in PR #8340: URL: https://github.com/apache/iceberg/pull/8340#discussion_r1418975543

## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java:

@@ -601,15 +601,76 @@ public void testDropNamespace() {
   public void testCreateNamespace() {
     Namespace testNamespace = Namespace.of("testDb", "ns1", "ns2");
     assertThat(catalog.namespaceExists(testNamespace)).isFalse();
-    // Test with no metadata
+    assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1"))).isFalse();
     catalog.createNamespace(testNamespace);
     assertThat(catalog.namespaceExists(testNamespace)).isTrue();
+    assertThat(catalog.namespaceExists(Namespace.of("testDb"))).isTrue();
+    assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1"))).isTrue();
+    assertThat(catalog.namespaceExists(Namespace.of("ns1", "ns2"))).isFalse();
+    assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns%"))).isFalse();
+    assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns_"))).isFalse();
+    assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1", "ns2"))).isTrue();
+    assertThat(catalog.namespaceExists(Namespace.of("testDb", "ns1", "ns2", "ns3"))).isFalse();
+  }
+
+  @Test
+  public void testCreateNamespaceWithBackslashCharacter() {
+

Review Comment: nit: unnecessary whitespace
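Context for the `ns%` and `ns_` assertions above: `%` and `_` are SQL `LIKE` wildcards, so an existence check built on `LIKE` must escape them or it will report false positives. A small SQLite sketch of the failure mode and the fix, using an illustrative schema rather than the JDBC catalog's actual tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE namespaces (name TEXT)")
conn.executemany("INSERT INTO namespaces VALUES (?)", [("testDb.ns1",), ("testDb.nsX",)])

# Naive LIKE: '_' matches any single character, so 'testDb.ns_' matches both rows
# even though no namespace is literally named 'testDb.ns_'.
naive = conn.execute(
    "SELECT count(*) FROM namespaces WHERE name LIKE ?", ("testDb.ns_",)
).fetchone()[0]
print(naive)  # 2

# Escaping '%' and '_' (and the escape character itself) makes the match literal.
pattern = "testDb.ns_".replace("!", "!!").replace("%", "!%").replace("_", "!_")
strict = conn.execute(
    "SELECT count(*) FROM namespaces WHERE name LIKE ? ESCAPE '!'", (pattern,)
).fetchone()[0]
print(strict)  # 0
```

This is exactly the class of bug the new test assertions guard against: `namespaceExists("testDb.ns_")` must not return true just because `testDb.ns1` exists.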
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418996270

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");

Review Comment: Yes, I changed this to `IllegalStateException` based on Dan's comment last time. https://github.com/apache/iceberg/pull/6887#discussion_r1114846404
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1418998188

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+    try {
+      return stream.getPos();
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment: Yes. It was also a special case (a RuntimeIOException without an IOException cause). I updated the PR title and description, so it makes sense now.
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1419002428

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+    try {
+      return stream.getPos();
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment: True. This one has a cause, so it is not a special case. I will revert it.
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1419010809

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");
+
+    try {
+      return stream.getPos();
+    } catch (IOException e) {
+      throw new RuntimeIOException(e, "Failed to get stream length");

Review Comment: Sorry, I got confused. There were two `RuntimeIOException`s thrown from this `length()` method. Now one is replaced with `IllegalStateException` per Dan's comment, so there will still be one `RuntimeIOException` (which will be replaced with `UncheckedIOException` in the main PR, as it has an `IOException` cause).
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
ajantha-bhat commented on PR #6887: URL: https://github.com/apache/iceberg/pull/6887#issuecomment-1845411543

@nastra: Thanks for the review. I have replied to the comments; I think no new code modifications are needed. Please check again. Thanks.
Re: [PR] API: Replace special case of deprecated RuntimeIOException [iceberg]
nastra commented on code in PR #6887: URL: https://github.com/apache/iceberg/pull/6887#discussion_r1419046382

## core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:

@@ -78,14 +78,13 @@ public Metrics metrics() {
   @Override
   public long length() {
-    if (stream != null) {
-      try {
-        return stream.getPos();
-      } catch (IOException e) {
-        throw new RuntimeIOException(e, "Failed to get stream length");
-      }
+    Preconditions.checkNotNull(stream, "Failed to get stream length: no open stream");

Review Comment: did you potentially forget to push? Because I don't see `Preconditions.checkState(...)` here
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419059002

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:

```
@@ -448,33 +473,82 @@ public void renameTable(TableIdentifier from, TableIdentifier to) {
     // behavior. So better be safe than sorry.
   }

+  private static void validateToContentForRename(
+      TableIdentifier from, TableIdentifier to, IcebergContent existingToContent) {
+    if (existingToContent != null) {
+      if (existingToContent.getType() == Content.Type.ICEBERG_VIEW) {
+        throw new AlreadyExistsException("Cannot rename %s to %s. View already exists", from, to);
+      } else if (existingToContent.getType() == Content.Type.ICEBERG_TABLE) {
+        throw new AlreadyExistsException("Cannot rename %s to %s. Table already exists", from, to);
+      } else {
+        throw new AlreadyExistsException(
+            "Cannot rename %s to %s. Another content with same name already exists", from, to);
```

Review Comment: nit: should this maybe mention the content type here?
Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]
Fokko commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1419062626 ## pyiceberg/table/snapshots.py: ## @@ -116,3 +168,193 @@ class MetadataLogEntry(IcebergBaseModel): class SnapshotLogEntry(IcebergBaseModel): snapshot_id: int = Field(alias="snapshot-id") timestamp_ms: int = Field(alias="timestamp-ms") + + +class SnapshotSummaryCollector: +added_size: int +removed_size: int +added_files: int +removed_files: int +added_eq_delete_files: int +removed_eq_delete_files: int +added_pos_delete_files: int +removed_pos_delete_files: int +added_delete_files: int +removed_delete_files: int +added_records: int +deleted_records: int +added_pos_deletes: int +removed_pos_deletes: int +added_eq_deletes: int +removed_eq_deletes: int + +def __init__(self) -> None: +self.added_size = 0 +self.removed_size = 0 +self.added_files = 0 +self.removed_files = 0 +self.added_eq_delete_files = 0 +self.removed_eq_delete_files = 0 +self.added_pos_delete_files = 0 +self.removed_pos_delete_files = 0 +self.added_delete_files = 0 +self.removed_delete_files = 0 +self.added_records = 0 +self.deleted_records = 0 +self.added_pos_deletes = 0 +self.removed_pos_deletes = 0 +self.added_eq_deletes = 0 +self.removed_eq_deletes = 0 + +def add_file(self, data_file: DataFile) -> None: +self.added_size += data_file.file_size_in_bytes + +if data_file.content == DataFileContent.DATA: +self.added_files += 1 +self.added_records += data_file.record_count +elif data_file.content == DataFileContent.POSITION_DELETES: +self.added_delete_files += 1 +self.added_pos_delete_files += 1 +self.added_pos_deletes += data_file.record_count +elif data_file.content == DataFileContent.EQUALITY_DELETES: +self.added_delete_files += 1 +self.added_eq_delete_files += 1 +self.added_eq_deletes += data_file.record_count +else: +raise ValueError(f"Unknown data file content: {data_file.content}") + +def remove_file(self, data_file: DataFile) -> None: +self.removed_size += data_file.file_size_in_bytes + 
+if data_file.content == DataFileContent.DATA: +self.removed_files += 1 +self.deleted_records += data_file.record_count +elif data_file.content == DataFileContent.POSITION_DELETES: +self.removed_delete_files += 1 +self.removed_pos_delete_files += 1 +self.removed_pos_deletes += data_file.record_count +elif data_file.content == DataFileContent.EQUALITY_DELETES: +self.removed_delete_files += 1 +self.removed_eq_delete_files += 1 +self.removed_eq_deletes += data_file.record_count +else: +raise ValueError(f"Unknown data file content: {data_file.content}") + +def build(self) -> Dict[str, str]: +def set_when_positive(properties: Dict[str, str], num: int, property_name: str) -> None: +if num > 0: +properties[property_name] = str(num) + +properties: Dict[str, str] = {} +set_when_positive(properties, self.added_size, ADDED_FILE_SIZE) +set_when_positive(properties, self.removed_size, REMOVED_FILE_SIZE) +set_when_positive(properties, self.added_files, ADDED_DATA_FILES) +set_when_positive(properties, self.removed_files, DELETED_DATA_FILES) +set_when_positive(properties, self.added_eq_delete_files, EQUALITY_DELETE_FILES) +set_when_positive(properties, self.removed_eq_delete_files, REMOVED_EQUALITY_DELETE_FILES) +set_when_positive(properties, self.added_pos_delete_files, ADDED_POSITION_DELETE_FILES) +set_when_positive(properties, self.removed_pos_delete_files, REMOVED_POSITION_DELETE_FILES) +set_when_positive(properties, self.added_delete_files, ADDED_DELETE_FILES) +set_when_positive(properties, self.removed_delete_files, REMOVED_DELETE_FILES) +set_when_positive(properties, self.added_records, ADDED_RECORDS) +set_when_positive(properties, self.deleted_records, DELETED_RECORDS) +set_when_positive(properties, self.added_pos_deletes, ADDED_POSITION_DELETES) +set_when_positive(properties, self.removed_pos_deletes, REMOVED_POSITION_DELETES) +set_when_positive(properties, self.added_eq_deletes, ADDED_EQUALITY_DELETES) +set_when_positive(properties, self.removed_eq_deletes, 
REMOVED_EQUALITY_DELETES) + +return properties + + +def _truncate_table_summary(summary: Summary, previous_summary: Mapping[str, str]) -> Summary: +for prop in { +TOTAL_DATA_FILES, +TOTAL_DELETE_FILES, +TOTAL_RECORDS, +TOTAL_FILE_SIZE, +TOTAL_
Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]
ConeyLiu commented on code in PR #9230: URL: https://github.com/apache/iceberg/pull/9230#discussion_r1419062959

## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:

```
@@ -895,7 +895,7 @@ private void cleanUncommittedAppends(Set committed) {
     }
   }

-  this.cachedNewDataManifests = committedNewDataManifests;
+  this.cachedNewDataManifests = null;
```

Review Comment: I think @nastra's solution is much better and more solid. We should not set `cachedNewDataManifests` to null directly. Consider the following case:
```
1. Transaction transaction = table.newTransaction();
2. transaction.newAppend().appendFile(...).commit(); // generate new manifest file
3. // transaction operation failed
4. transaction.commitTransaction(); // failed to commit
```
With the changes in this PR, the new manifest files generated by step 2 will not be deleted in step 4. I think the failed UTs could verify this.
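The cleanup concern described above can be modeled in a few lines of Python. `ManifestTracker` and its methods are hypothetical stand-ins for `MergingSnapshotProducer`'s bookkeeping, not the real API; the point is that cleanup can only delete uncommitted files it still knows about, so dropping the whole cache loses that information:

```python
class ManifestTracker:
    """Hypothetical model of the caching issue: cleanup can only delete
    uncommitted manifest files that are still being tracked."""

    def __init__(self):
        self.new_manifests = []  # manifests written by this producer
        self.deleted = []        # files removed by cleanup (for illustration)

    def write_manifest(self, name):
        self.new_manifests.append(name)
        return name

    def clean_uncommitted(self, committed):
        # Delete everything we wrote that did not make it into the commit.
        for m in self.new_manifests:
            if m not in committed:
                self.deleted.append(m)
        # Keep the committed subset rather than nulling the whole list, so a
        # later failure can still find and clean up these files.
        self.new_manifests = [m for m in self.new_manifests if m in committed]


tracker = ManifestTracker()
tracker.write_manifest("m1.avro")
tracker.write_manifest("m2.avro")
tracker.clean_uncommitted(committed={"m1.avro"})  # only m1 was committed
```

If `new_manifests` were reset to `None` instead, `m2.avro` would be orphaned on a retry, which mirrors the transaction scenario in the comment.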
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419065698

## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java:

```
@@ -141,16 +147,26 @@ private UpdateableReference loadReference(String requestedRef, String hash) {
     }
   }

+  /** @deprecated will be removed after 1.5.0; use listContents() instead */
+  @Deprecated
```

Review Comment: it seems a bit weird that there are `commitTable()` / `commitView()` methods, but all the other view/table-related methods are unified to `xyzContent()`. Wouldn't it be better to just leave `listTables()` (and others) and introduce `listViews()` (and similar)?
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
ConeyLiu commented on code in PR #9192: URL: https://github.com/apache/iceberg/pull/9192#discussion_r1419067449

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala:

```
@@ -40,14 +37,20 @@ import org.apache.spark.sql.types.StructType
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan =
-    plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) {
+    plan.transformWithPruning (_.containsPattern(FILTER)) {
       case filter @ Filter(condition, _) =>
-        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
+        val newCondition = condition.transformWithPruning(_.containsAnyPattern(BINARY_COMPARISON, IN, INSET)) {
           case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
             c.withNewChildren(Seq(replaceStaticInvoke(left), right))
           case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
             c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+
+          case in @ In(s: StaticInvoke, _) =>
```

Review Comment: why not check whether `s` can be replaced?
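The shape of the rule, including the `canReplace` guard the reviewer is asking for, can be sketched in Python with hypothetical stand-in node classes (the real Catalyst expression nodes are Scala, and `ApplyFunction` here just represents the pushdown-friendly replacement):

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class StaticInvoke:          # stand-in for Catalyst's StaticInvoke node
    function: str
    replaceable: bool = True


@dataclass
class ApplyFunction:         # stand-in for the pushdown-capable replacement
    function: str


@dataclass
class In:                    # stand-in for the IN predicate node
    value: Any
    options: List[Any]


def can_replace(node) -> bool:
    # Guard the rewrite, as suggested in the review comment above.
    return isinstance(node, StaticInvoke) and node.replaceable


def rewrite(expr):
    """Rewrite In(StaticInvoke(...), ...) into In(ApplyFunction(...), ...),
    but only when the invoke is actually replaceable."""
    if isinstance(expr, In) and can_replace(expr.value):
        return In(ApplyFunction(expr.value.function), expr.options)
    return expr


pushed = rewrite(In(StaticInvoke("system.bucket", replaceable=True), [1, 2]))
kept = rewrite(In(StaticInvoke("some_udf", replaceable=False), [1, 2]))
```

Without the `can_replace` check, the unguarded `case in @ In(s: StaticInvoke, _)` branch would rewrite invocations that should be left alone, which is exactly the reviewer's concern.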
Re: [PR] Spark: Use Awaitility instead of Thread.sleep [iceberg]
yyy1000 commented on PR #9224: URL: https://github.com/apache/iceberg/pull/9224#issuecomment-1845461829

> > Hi, @nk1506 It seems that my code can't pass the unit tests. Is it because the default waiting time is not enough? Do you have any ideas?
>
> Please refer [here](https://github.com/apache/iceberg/blob/6a9d3c77977baff4295ee2dde0150d73c8c46af1/aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogLock.java#L146). Also, it is advisable to set the `atMost` too.

Oh, got it. I should use condition 'or', not 'and'. :) Now the unit test runs in my local env. Awaitility uses a default timeout of 10 seconds, and I think that is enough in this case: https://github.com/awaitility/awaitility/wiki/Usage#defaults So adding an `atMost` seems unnecessary here. What do you think?
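What Awaitility does under the hood is condition polling with a timeout, which is why it replaces `Thread.sleep` cleanly. A rough Python analogue (not Awaitility's actual Java API) makes the `atMost` and `or`-condition points above concrete:

```python
import time


def await_until(condition, at_most=10.0, poll_interval=0.1):
    """Poll `condition` until it returns True or `at_most` seconds elapse.
    Roughly what Awaitility's await().atMost(...).until(...) does; 10s
    mirrors Awaitility's documented default timeout."""
    deadline = time.monotonic() + at_most
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(poll_interval)
    raise TimeoutError(f"condition not met within {at_most}s")


# A condition made of two alternatives, combined with `or` (as in the
# comment above), so that either outcome ends the wait.
state = {"committed": False, "failed": True}
await_until(lambda: state["committed"] or state["failed"], at_most=1.0)
```

Combining the alternatives with `and` would wait for both to become true and usually time out, which matches the test failure described in the thread.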
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
ConeyLiu commented on PR #9192: URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1845465132 @tmnd1991 thanks for your contribution. I think this is a valid fix.
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419074476

## nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:

```
@@ -180,6 +188,32 @@ protected Table createTable(TableIdentifier tableIdentifier, Schema schema) {
     return catalog.createTable(tableIdentifier, schema);
   }

+  protected View createView(NessieCatalog nessieCatalog, TableIdentifier tableIdentifier) {
+    Schema schema = new Schema(StructType.of(required(1, "id", LongType.get())).fields());
+    return createView(nessieCatalog, tableIdentifier, schema);
+  }
+
+  protected View createView(
+      NessieCatalog nessieCatalog, TableIdentifier tableIdentifier, Schema schema) {
+    createMissingNamespaces(tableIdentifier);
+    return nessieCatalog
+        .buildView(tableIdentifier)
+        .withSchema(schema)
+        .withDefaultNamespace(tableIdentifier.namespace())
+        .withQuery("spark", "select * from ns.tbl")
+        .create();
+  }
+
+  protected View replaceView(NessieCatalog nessieCatalog, TableIdentifier identifier) {
+    Schema schema = new Schema(StructType.of(required(2, "age", Types.IntegerType.get())).fields());
```

Review Comment:
```suggestion
    Schema schema = new Schema(required(2, "age", Types.IntegerType.get()));
```
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419075510

## nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:

```
@@ -180,6 +188,32 @@ protected Table createTable(TableIdentifier tableIdentifier, Schema schema) {
     return catalog.createTable(tableIdentifier, schema);
   }

+  protected View createView(NessieCatalog nessieCatalog, TableIdentifier tableIdentifier) {
+    Schema schema = new Schema(StructType.of(required(1, "id", LongType.get())).fields());
```

Review Comment:
```suggestion
    Schema schema = new Schema(required(1, "id", LongType.get()));
```
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419073781

## nessie/src/test/java/org/apache/iceberg/nessie/BaseTestIceberg.java:

```
@@ -267,4 +305,23 @@ static DataFile makeDataFile(Table icebergTable, String fileLocation) {
         .withFileSizeInBytes(Files.localInput(fileLocation).getLength())
         .build();
   }
+
+  protected static List<String> metadataVersionFiles(String tablePath) {
+    return filterByExtension(tablePath, getFileExtension(TableMetadataParser.Codec.NONE));
+  }
+
+  protected static List<String> filterByExtension(String tablePath, String extension) {
+    return metadataFiles(tablePath).stream()
+        .filter(f -> f.endsWith(extension))
+        .collect(Collectors.toList());
+  }
+
+  @SuppressWarnings(
+      "RegexpSinglelineJava") // respecting this rule requires a lot more lines of code
```

Review Comment: nit: can this be moved to the line above?
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419078008 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java: ## @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.io.IOException; +import java.net.URI; +import java.nio.file.Path; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.CatalogProperties; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.view.BaseView; +import org.apache.iceberg.view.View; +import org.apache.iceberg.view.ViewCatalogTests; +import org.apache.iceberg.view.ViewMetadata; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.extension.RegisterExtension; +import org.junit.jupiter.api.io.TempDir; +import org.projectnessie.client.api.NessieApiV1; +import org.projectnessie.client.ext.NessieApiVersion; +import org.projectnessie.client.ext.NessieApiVersions; +import org.projectnessie.client.ext.NessieClientFactory; +import org.projectnessie.client.ext.NessieClientUri; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.jaxrs.ext.NessieJaxRsExtension; +import org.projectnessie.model.Branch; +import org.projectnessie.model.Reference; +import org.projectnessie.model.Tag; +import org.projectnessie.versioned.storage.common.persist.Persist; +import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory; +import org.projectnessie.versioned.storage.testextension.NessieBackend; +import org.projectnessie.versioned.storage.testextension.NessiePersist; +import org.projectnessie.versioned.storage.testextension.PersistExtension; + +@ExtendWith(PersistExtension.class) +@NessieBackend(InmemoryBackendTestFactory.class) +@NessieApiVersions // test all versions +public class TestNessieViewCatalog extends ViewCatalogTests { + + 
@NessiePersist static Persist persist;
+
+  @RegisterExtension
+  static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist);
+
+  @TempDir public Path temp;

Review Comment:
```suggestion
  @TempDir private Path temp;
```
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419081069 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java: ## @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.io.IOException; +import java.net.URI; +import java.nio.file.Path; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.CatalogProperties; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.view.BaseView; +import org.apache.iceberg.view.View; +import org.apache.iceberg.view.ViewCatalogTests; +import org.apache.iceberg.view.ViewMetadata; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.extension.RegisterExtension; +import org.junit.jupiter.api.io.TempDir; +import org.projectnessie.client.api.NessieApiV1; +import org.projectnessie.client.ext.NessieApiVersion; +import org.projectnessie.client.ext.NessieApiVersions; +import org.projectnessie.client.ext.NessieClientFactory; +import org.projectnessie.client.ext.NessieClientUri; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.jaxrs.ext.NessieJaxRsExtension; +import org.projectnessie.model.Branch; +import org.projectnessie.model.Reference; +import org.projectnessie.model.Tag; +import org.projectnessie.versioned.storage.common.persist.Persist; +import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory; +import org.projectnessie.versioned.storage.testextension.NessieBackend; +import org.projectnessie.versioned.storage.testextension.NessiePersist; +import org.projectnessie.versioned.storage.testextension.PersistExtension; + +@ExtendWith(PersistExtension.class) +@NessieBackend(InmemoryBackendTestFactory.class) +@NessieApiVersions // test all versions +public class TestNessieViewCatalog extends ViewCatalogTests { + + 
@NessiePersist static Persist persist; + + @RegisterExtension + static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist); + + @TempDir public Path temp; + + private NessieCatalog catalog; + private NessieApiV1 api; + private NessieApiVersion apiVersion; + private Configuration hadoopConfig; + private String initialHashOfDefaultBranch; + private String uri; + + @BeforeEach + public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri) + throws NessieNotFoundException { +api = clientFactory.make(); +apiVersion = clientFactory.apiVersion(); +initialHashOfDefaultBranch = api.getDefaultBranch().getHash(); +uri = nessieUri.toASCIIString(); +hadoopConfig = new Configuration(); +catalog = initNessieCatalog("main"); + } + + @AfterEach + public void afterEach() throws IOException { +resetData(); +try { + if (catalog != null) { +catalog.close(); + } + api.close(); +} finally { + catalog = null; + api = null; + hadoopConfig = null; +} + } + + private void resetData() throws NessieConflictException, NessieNotFoundException { +Branch defaultBranch = api.getDefaultBranch(); +for (Reference r : api.getAllReferences().get().getReferences()) { + if (r instanceof Branch && !r.getName().equals(defaultBranch.getName())) { +api.deleteBranch().branch((Branch) r).delete(); + } + if (r instanceof Tag) { +api.deleteTag().tag((Tag) r).delete(); + } +} +api.assignBranch() +.assignTo(Branch.of(defaultBranch.getName(), initialHashOfDefaultBranch)) +.branch(defaultBranch) +.assign(); + } + + private NessieCatalog initNessieCatalog(String ref) { +NessieCatalog newCatalog = new NessieCatalog(); +newCatalog.setConf(hadoopConfig); +ImmutableMap options = +ImmutableMap.of( +"ref", +
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419080356 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieViewCatalog.java: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import static org.assertj.core.api.Assertions.assertThat; + +import java.io.IOException; +import java.net.URI; +import java.nio.file.Path; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.CatalogProperties; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.view.BaseView; +import org.apache.iceberg.view.View; +import org.apache.iceberg.view.ViewCatalogTests; +import org.apache.iceberg.view.ViewMetadata; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.extension.RegisterExtension; +import org.junit.jupiter.api.io.TempDir; +import org.projectnessie.client.api.NessieApiV1; +import org.projectnessie.client.ext.NessieApiVersion; +import org.projectnessie.client.ext.NessieApiVersions; +import org.projectnessie.client.ext.NessieClientFactory; +import org.projectnessie.client.ext.NessieClientUri; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.jaxrs.ext.NessieJaxRsExtension; +import org.projectnessie.model.Branch; +import org.projectnessie.model.Reference; +import org.projectnessie.model.Tag; +import org.projectnessie.versioned.storage.common.persist.Persist; +import org.projectnessie.versioned.storage.inmemory.InmemoryBackendTestFactory; +import org.projectnessie.versioned.storage.testextension.NessieBackend; +import org.projectnessie.versioned.storage.testextension.NessiePersist; +import org.projectnessie.versioned.storage.testextension.PersistExtension; + +@ExtendWith(PersistExtension.class) +@NessieBackend(InmemoryBackendTestFactory.class) +@NessieApiVersions // test all versions +public class TestNessieViewCatalog extends ViewCatalogTests { + + 
@NessiePersist static Persist persist; + + @RegisterExtension + static NessieJaxRsExtension server = NessieJaxRsExtension.jaxRsExtension(() -> persist); + + @TempDir public Path temp; + + private NessieCatalog catalog; + private NessieApiV1 api; + private NessieApiVersion apiVersion; + private Configuration hadoopConfig; + private String initialHashOfDefaultBranch; + private String uri; + + @BeforeEach + public void setUp(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri) + throws NessieNotFoundException { +api = clientFactory.make(); +apiVersion = clientFactory.apiVersion(); +initialHashOfDefaultBranch = api.getDefaultBranch().getHash(); +uri = nessieUri.toASCIIString(); +hadoopConfig = new Configuration(); +catalog = initNessieCatalog("main"); + } + + @AfterEach + public void afterEach() throws IOException { +resetData(); +try { + if (catalog != null) { +catalog.close(); + } + api.close(); +} finally { + catalog = null; + api = null; + hadoopConfig = null; +} + } + + private void resetData() throws NessieConflictException, NessieNotFoundException { +Branch defaultBranch = api.getDefaultBranch(); +for (Reference r : api.getAllReferences().get().getReferences()) { + if (r instanceof Branch && !r.getName().equals(defaultBranch.getName())) { +api.deleteBranch().branch((Branch) r).delete(); + } + if (r instanceof Tag) { +api.deleteTag().tag((Tag) r).delete(); + } +} +api.assignBranch() +.assignTo(Branch.of(defaultBranch.getName(), initialHashOfDefaultBranch)) +.branch(defaultBranch) +.assign(); + } + + private NessieCatalog initNessieCatalog(String ref) { +NessieCatalog newCatalog = new NessieCatalog(); +newCatalog.setConf(hadoopConfig); +ImmutableMap options = +ImmutableMap.of( +"ref", +
Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]
Fokko commented on code in PR #183: URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1419083232

## pyiceberg/io/pyarrow.py:

```
@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> Optional[T]:
         """Visit a primitive type."""

-def _get_field_id(field: pa.Field) -> Optional[int]:
-    for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-        if field_id_str := field.metadata.get(pyarrow_field_id_key):
-            return int(field_id_str.decode())
-    return None

+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+    counter: count[int]
+    missing_is_metadata: Optional[bool]
+
+    def __init__(self) -> None:
+        self.counter = count()
```

Review Comment: Sorry for the limited context here, but it will skip a field if it doesn't have an ID, which is kind of awkward.
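The fallback under discussion, assigning fresh sequential IDs instead of skipping fields whose metadata carries no ID, can be sketched as follows. `field_id` is a simplified illustration (not the actual pyiceberg visitor), metadata is modeled as a plain dict rather than a `pa.Field`, and the concrete key values are assumptions mirroring the `PYARROW_FIELD_ID_KEYS` lookup in the snippet above:

```python
from itertools import count
from typing import Dict, Optional

# Assumed metadata keys, modeled after the PYARROW_FIELD_ID_KEYS lookup above.
PYARROW_FIELD_ID_KEYS = [b"PARQUET:field_id", b"field_id"]


def field_id(metadata: Optional[Dict[bytes, bytes]], counter) -> int:
    """Return the field ID from (pyarrow-style) metadata if present;
    otherwise fall back to a fresh sequential ID from `counter` instead
    of skipping the field, which is the behavior discussed above."""
    if metadata:
        for key in PYARROW_FIELD_ID_KEYS:
            if (raw := metadata.get(key)) is not None:
                return int(raw.decode())
    return next(counter)


counter = count(1)
ids = [
    field_id({b"PARQUET:field_id": b"7"}, counter),  # explicit ID wins
    field_id(None, counter),                         # missing metadata -> assigned
    field_id({}, counter),                           # no ID key -> assigned
]
```

Sharing one `count` instance across the whole schema traversal keeps the assigned IDs unique, which is presumably why `_ConvertToIceberg` holds the counter as visitor state.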
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419087100 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ## @@ -0,0 +1,337 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import static org.apache.iceberg.types.Types.NestedField.optional; +import static org.apache.iceberg.types.Types.NestedField.required; + +import java.io.File; +import java.io.IOException; +import java.net.URI; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.Comparator; +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.iceberg.Schema; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.view.SQLViewRepresentation; +import org.apache.iceberg.view.View; +import org.assertj.core.api.Assertions; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.projectnessie.client.ext.NessieClientFactory; +import org.projectnessie.client.ext.NessieClientUri; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.model.Branch; +import org.projectnessie.model.CommitMeta; +import org.projectnessie.model.ContentKey; +import org.projectnessie.model.IcebergView; +import org.projectnessie.model.ImmutableTableReference; +import org.projectnessie.model.LogResponse.LogEntry; + +public class TestNessieView extends BaseTestIceberg { + + private static final String BRANCH = "iceberg-view-test"; + + private static final String DB_NAME = "db"; + private static final String VIEW_NAME = "view"; + private static final TableIdentifier VIEW_IDENTIFIER = TableIdentifier.of(DB_NAME, VIEW_NAME); + private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME); + private static final Schema SCHEMA = + new Schema(Types.StructType.of(required(1, "id", Types.LongType.get())).fields()); + private static final Schema ALTERED = + new Schema( + Types.StructType.of( + required(1, "id", Types.LongType.get()), + optional(2, "data", Types.LongType.get())) + .fields()); + + 
private String viewLocation; + + public TestNessieView() { +super(BRANCH); + } + + @Override + @BeforeEach + public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri URI nessieUri) + throws IOException { +super.beforeEach(clientFactory, nessieUri); +this.viewLocation = +createView(catalog, VIEW_IDENTIFIER, SCHEMA).location().replaceFirst("file:", ""); + } + + @Override + @AfterEach + public void afterEach() throws Exception { +// drop the view data +if (viewLocation != null) { + try (Stream walk = Files.walk(Paths.get(viewLocation))) { + walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete); + } + catalog.dropView(VIEW_IDENTIFIER); +} + +super.afterEach(); + } + + private IcebergView getView(ContentKey key) throws NessieNotFoundException { +return getView(BRANCH, key); + } + + private IcebergView getView(String ref, ContentKey key) throws NessieNotFoundException { +return api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get(); + } + + /** Verify that Nessie always returns the globally-current global-content w/ only DMLs. */ + @Test + public void verifyStateMovesForDML() throws Exception { +// 1. initialize view +View icebergView = catalog.loadView(VIEW_IDENTIFIER); +icebergView +.replaceVersion() +.withQuery("spark", "some query") +.withSchema(SCHEMA) +.withDefaultNamespace(VIEW_IDENTIFIER.namespace()) +.commit(); + +// 2. create 2nd branch +String testCaseBranch = "verify-global-moving"; +api.createReference() +.sourceRefName(BRANCH) +.reference(Branch.of(testCaseBranch, catalog.currentHash())) +.create(); +IcebergView contentInitialMain = getView(BRANCH, KEY); +IcebergView contentInitialBranc
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419083813 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ##
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419086031 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ##
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419087834 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ##
Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]
Fokko commented on code in PR #183: URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1419090834 ## pyiceberg/io/pyarrow.py: ##

Review Comment: I like the idea of using `assign_fresh_schema_ids`, since that one is a pre-order visitor and the default is post-order. I've updated the code, let me know what you think! Appreciate the review!
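The pre-order point in the comment above is the key detail: with a shared counter, a struct field is numbered before its children, whereas a post-order visitor would number the children first. A toy sketch of pre-order ID assignment over a nested `(name, children)` structure (the function and data shape are illustrative, not pyiceberg's actual `assign_fresh_schema_ids` API):

```python
from itertools import count

def assign_ids_preorder(fields, counter=None):
    """Assign fresh field IDs in pre-order over (name, children) pairs.

    Each parent consumes an ID from the shared counter before recursing
    into its children, so parents always get lower IDs than their
    descendants, which is the pre-order numbering the comment refers to.
    """
    counter = counter if counter is not None else count(1)
    result = []
    for name, children in fields:
        field_id = next(counter)  # parent numbered before its children
        result.append((field_id, name, assign_ids_preorder(children, counter)))
    return result

schema = [("id", []), ("address", [("street", []), ("city", [])])]
print(assign_ids_preorder(schema))
# [(1, 'id', []), (2, 'address', [(3, 'street', []), (4, 'city', [])])]
```

With post-order numbering the same schema would instead give `street` and `city` lower IDs than `address`, which is why the choice of traversal order matters when regenerating IDs.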
Re: [PR] Automatically create the tables for the `SqlCatalog` [iceberg-python]
Fokko merged PR #186: URL: https://github.com/apache/iceberg-python/pull/186
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419093588 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ##
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419090042 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ##
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419091171 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: ##
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419092132 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: (quotes the same `TestNessieView.java` excerpt as the first review comment above)
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419095071 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: (quotes the same `TestNessieView.java` excerpt as the first review comment above)
Re: [I] Automatically create the tables for the `SqlCatalog` [iceberg-python]
Fokko closed issue #184: Automatically create the tables for the `SqlCatalog` URL: https://github.com/apache/iceberg-python/issues/184 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419093010 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java: (quotes the same `TestNessieView.java` excerpt as the first review comment above)
Re: [I] When will the 0.6.0 version be released? [iceberg-python]
Fokko commented on issue #192: URL: https://github.com/apache/iceberg-python/issues/192#issuecomment-1845492573 @1taoze The goal is this year, and I'm quite confident that it is realistic.
Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]
ConeyLiu commented on code in PR #9233: URL: https://github.com/apache/iceberg/pull/9233#discussion_r1419107868 ## spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala: ## @@ -39,22 +41,31 @@ import org.apache.spark.sql.types.StructType */ object ReplaceStaticInvoke extends Rule[LogicalPlan] { - override def apply(plan: LogicalPlan): LogicalPlan = -plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) { - case filter @ Filter(condition, _) => -val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) { - case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable => -c.withNewChildren(Seq(replaceStaticInvoke(left), right)) + private val rule:PartialFunction[Expression, Expression] = { +case c@BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable => + c.withNewChildren(Seq(replaceStaticInvoke(left), right)) - case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable => -c.withNewChildren(Seq(left, replaceStaticInvoke(right))) -} +case c@BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable => + c.withNewChildren(Seq(left, replaceStaticInvoke(right))) + } + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformWithPruning(_.containsAnyPattern(FILTER, JOIN)) { + case filter @ Filter(condition, _) => +val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON))(rule) if (newCondition fastEquals condition) { filter } else { filter.copy(condition = newCondition) } + case j @ Join(_, _, _, Some(condition), _) => Review Comment: Here the join condition can be pushed to the leaf node by the Spark optimizer, right? I think this cannot cover the COW/MOR cases; COW/MOR needs some special handling here. I plan to do it, but I've been quite busy lately.
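The refactor quoted in the review above extracts the two comparison-rewriting cases into one shared partial function, so the same logic can run over both `Filter` and `Join` conditions. As a rough illustration of that pattern, here is a toy Python model; the `StaticInvoke`, `ApplyFunction`, and `Compare` classes are simplified stand-ins invented for this sketch, not Catalyst APIs:

```python
from dataclasses import dataclass

# Toy expression nodes (hypothetical stand-ins for Catalyst's
# StaticInvoke / BinaryComparison nodes).
@dataclass(frozen=True)
class StaticInvoke:          # e.g. a system.bucket(16, id) call
    function: str
    column: str

@dataclass(frozen=True)
class ApplyFunction:         # the pushable replacement expression
    function: str
    column: str

@dataclass(frozen=True)
class Compare:
    left: object
    right: object

def rewrite_comparison(expr):
    """Shared rule: replace a StaticInvoke on either side of a
    comparison with its pushable ApplyFunction form."""
    if isinstance(expr, Compare):
        left, right = expr.left, expr.right
        if isinstance(left, StaticInvoke):
            left = ApplyFunction(left.function, left.column)
        if isinstance(right, StaticInvoke):
            right = ApplyFunction(right.function, right.column)
        return Compare(left, right)
    return expr

# The same rule now serves both plan nodes, mirroring the PR's refactor:
filter_cond = Compare(StaticInvoke("bucket", "id"), 4)
join_cond = Compare("t1.part", StaticInvoke("days", "ts"))
assert rewrite_comparison(filter_cond) == Compare(ApplyFunction("bucket", "id"), 4)
assert rewrite_comparison(join_cond) == Compare("t1.part", ApplyFunction("days", "ts"))
```

In the real rule, `canReplace` and foldability checks guard the rewrite; the sketch omits them to keep the shared-rule idea in focus.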
Re: [PR] maint(transforms): replace `type()` calls with `isinstance()` [iceberg-python]
Fokko merged PR #188: URL: https://github.com/apache/iceberg-python/pull/188
Re: [PR] maint(transforms): replace `type()` calls with `isinstance()` [iceberg-python]
Fokko commented on PR #188: URL: https://github.com/apache/iceberg-python/pull/188#issuecomment-1845509371 Thanks for fixing this @jayceslesar 🙌
Re: [PR] Nessie: Support views for NessieCatalog [iceberg]
nastra commented on code in PR #8909: URL: https://github.com/apache/iceberg/pull/8909#discussion_r1419122432 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java: ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.nessie; + +import org.apache.iceberg.exceptions.AlreadyExistsException; +import org.apache.iceberg.exceptions.NoSuchViewException; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.view.BaseViewOperations; +import org.apache.iceberg.view.ViewMetadata; +import org.apache.iceberg.view.ViewMetadataParser; +import org.projectnessie.client.http.HttpClientException; +import org.projectnessie.error.NessieBadRequestException; +import org.projectnessie.error.NessieConflictException; +import org.projectnessie.error.NessieNotFoundException; +import org.projectnessie.model.Content; +import org.projectnessie.model.ContentKey; +import org.projectnessie.model.IcebergTable; +import org.projectnessie.model.IcebergView; +import org.projectnessie.model.Reference; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class NessieViewOperations extends BaseViewOperations { + + private static final Logger LOG = LoggerFactory.getLogger(NessieViewOperations.class); + + private final NessieIcebergClient client; + private final ContentKey key; + private final FileIO fileIO; + private IcebergView icebergView; + + NessieViewOperations(ContentKey key, NessieIcebergClient client, FileIO fileIO) { +this.key = key; +this.client = client; +this.fileIO = fileIO; + } + + @Override + public void doRefresh() { +try { + client.refresh(); +} catch (NessieNotFoundException e) { + throw new RuntimeException( + String.format( + "Failed to refresh as ref '%s' is no longer valid.", client.getRef().getName()), + e); +} +String metadataLocation = null; +Reference reference = client.getRef().getReference(); +try { + Content content = client.getApi().getContent().key(key).reference(reference).get().get(key); + LOG.debug("Content '{}' at '{}': {}", key, reference, content); + if (content == null) { +if (currentMetadataLocation() != null) { + throw new NoSuchViewException("View does not exist: %s in %s", key, reference); +} + } else { +this.icebergView 
= +content +.unwrap(IcebergView.class) +.orElseThrow( +() -> { + if (content instanceof IcebergTable) { +return new AlreadyExistsException( +"Table with same name already exists: %s in %s", key, reference); + } else { +return new AlreadyExistsException( +"Cannot refresh Iceberg view: Nessie points to a non-Iceberg object for path: %s in %s", +key, reference); + } +}); +metadataLocation = icebergView.getMetadataLocation(); + } +} catch (NessieNotFoundException ex) { + if (currentMetadataLocation() != null) { +throw new NoSuchViewException("View does not exist: %s in %s", key, reference); + } +} +refreshFromMetadataLocation( +metadataLocation, +null, +2, +location -> +NessieUtil.loadViewMetadata( +ViewMetadataParser.read(io().newInputFile(location)), location, reference)); + } + + @Override + public void doCommit(ViewMetadata base, ViewMetadata metadata) { +String newMetadataLocation = writeNewMetadataIfRequired(metadata); + +boolean failure = false; +try { + String contentId = icebergView == null ? null : icebergView.getId(); + client.commitView(base, metadata, newMetadataLocation, contentId, key); +} catch (NessieConflictException | NessieNotFoundException | HttpClientException ex) { + if (ex instanceof NessieConflictException || ex instanceof NessieNotFoundException) { +failure = true; + } + NessieUtil.handleExceptionsForCommits(ex, client.ref
Re: [I] The "Status" paragraph in the readme seems very outdated [iceberg]
bitsondatadev commented on issue #9127: URL: https://github.com/apache/iceberg/issues/9127#issuecomment-1845537180 @ronkorving Let me know if you'd like to raise the PR, or I can do it as well. For now, pointing to the roadmap page will be best. Eventually, I'd like something less static that lives directly in GitHub for this: a main project board, or possibly a wiki with links to individual projects.
Re: [PR] Core: Optimize manifest evaluation for super wide tables [iceberg]
irshadcc commented on PR #9147: URL: https://github.com/apache/iceberg/pull/9147#issuecomment-1845560669 > Thanks for raising this @irshadcc, this looks good to me. I've left two small comments, could you take a peek at those? Thanks for fixing this! 🙌 I've added the Javadoc and removed the empty.
Re: [PR] Support parameterized tests at class-level with JUnit5 [iceberg]
nastra commented on code in PR #9161: URL: https://github.com/apache/iceberg/pull/9161#discussion_r1419205681 ## api/src/test/java/org/apache/iceberg/ParameterizedTestExtension.java: ## @@ -0,0 +1,245 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg; + +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.text.MessageFormat; +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; +import java.util.function.Function; +import java.util.stream.Stream; +import org.assertj.core.util.Preconditions; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.BeforeEachCallback; +import org.junit.jupiter.api.extension.Extension; +import org.junit.jupiter.api.extension.ExtensionContext; +import org.junit.jupiter.api.extension.ParameterContext; +import org.junit.jupiter.api.extension.ParameterResolutionException; +import org.junit.jupiter.api.extension.ParameterResolver; +import org.junit.jupiter.api.extension.TestTemplateInvocationContext; +import org.junit.jupiter.api.extension.TestTemplateInvocationContextProvider; +import org.junit.platform.commons.support.AnnotationSupport; +import org.junit.platform.commons.support.HierarchyTraversalMode; + +/** + * This extension is used to implement parameterized tests for Junit 5 to replace Parameterized in + * Junit4. + * + * When use this extension, all tests must be annotated by {@link TestTemplate}. Review Comment: just an FYI that we need to make sure to mention where this was taken from and properly update the `LICENSE` (similarly to how it was done in #8366)
Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]
cccs-jc commented on PR #8980: URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1845659813 > @cccs-jc i mean let's have changes for 3.5 with it's test only in 3.5 and we can backport the change with it's test in lower spark version like 3.4 and 3.3, 3.4 test failures are expected right as we don't have changes for SparkMicrobatch stream for 3.4 in it. > > Also i would request to revert the change in core for Microbatch.java if we don't have coverage for it as i am unsure when would that fail (may be some legacy handling) > > Apologies for getting being late in getting back at this. Keeping the `+ existingFilesCount();` in SparkMicrobatch.java makes no sense to me. What is the purpose of adding that to the currentFileIndex? The way I understand it, currentFileIndex is the position among the added files, so we want to count only the added files (addedFilesCount()); these are the files that a streaming job should consume. Can you explain the purpose of using `existingFilesCount` here?
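cccs-jc's question turns on how the streaming cursor advances per snapshot: if a batch should consume only the files a snapshot added, the index moves by `addedFilesCount()` alone, and also adding `existingFilesCount()` would overshoot past files that merely carry over between snapshots. A minimal sketch of that bookkeeping, with hypothetical names rather than the actual `SparkMicroBatchStream` code:

```python
def next_file_index(current_index, added_files_count):
    """Advance the streaming cursor by the files a snapshot added;
    files that merely carry over (existing files) are not re-consumed."""
    return current_index + added_files_count

# Snapshots as (added, existing) file counts a manifest might report;
# the last one models e.g. a rewrite that only carries files forward.
snapshots = [(3, 0), (2, 3), (0, 5)]
index = 0
for added, existing in snapshots:
    index = next_file_index(index, added)

# Only the 5 added files are consumed; counting `existing` as well
# would have pushed the cursor to 13 and skipped real work.
assert index == 5
```

This is only an illustration of the counting argument in the comment above, not a claim about what the Iceberg code does today.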
Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]
tmnd1991 commented on code in PR #9233: URL: https://github.com/apache/iceberg/pull/9233#discussion_r1419309041 ## spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala: ## @@ -39,22 +41,31 @@ import org.apache.spark.sql.types.StructType */ object ReplaceStaticInvoke extends Rule[LogicalPlan] { - override def apply(plan: LogicalPlan): LogicalPlan = -plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) { - case filter @ Filter(condition, _) => -val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) { - case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable => -c.withNewChildren(Seq(replaceStaticInvoke(left), right)) + private val rule:PartialFunction[Expression, Expression] = { +case c@BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable => + c.withNewChildren(Seq(replaceStaticInvoke(left), right)) - case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable => -c.withNewChildren(Seq(left, replaceStaticInvoke(right))) -} +case c@BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable => + c.withNewChildren(Seq(left, replaceStaticInvoke(right))) + } + override def apply(plan: LogicalPlan): LogicalPlan = { +plan.transformWithPruning(_.containsAnyPattern(FILTER, JOIN)) { + case filter @ Filter(condition, _) => +val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON))(rule) if (newCondition fastEquals condition) { filter } else { filter.copy(condition = newCondition) } + case j @ Join(_, _, _, Some(condition), _) => Review Comment: I discovered the bug while working with a MERGE statement, and this actually works with both CoW and MoR. I have it running on my cluster like that, and it is correctly pruning all the partitions.
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
tmnd1991 commented on code in PR #9192: URL: https://github.com/apache/iceberg/pull/9192#discussion_r1419314263

File: spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceStaticInvoke.scala

```diff
@@ -40,14 +37,20 @@ import org.apache.spark.sql.types.StructType
 object ReplaceStaticInvoke extends Rule[LogicalPlan] {

   override def apply(plan: LogicalPlan): LogicalPlan =
-    plan.transformWithPruning (_.containsAllPatterns(BINARY_COMPARISON, FILTER)) {
+    plan.transformWithPruning (_.containsPattern(FILTER)) {
       case filter @ Filter(condition, _) =>
-        val newCondition = condition.transformWithPruning(_.containsPattern(BINARY_COMPARISON)) {
+        val newCondition = condition.transformWithPruning(_.containsAnyPattern(BINARY_COMPARISON, IN, INSET)) {
           case c @ BinaryComparison(left: StaticInvoke, right) if canReplace(left) && right.foldable =>
             c.withNewChildren(Seq(replaceStaticInvoke(left), right))

           case c @ BinaryComparison(left, right: StaticInvoke) if canReplace(right) && left.foldable =>
             c.withNewChildren(Seq(left, replaceStaticInvoke(right)))
+
+          case in @ In(s: StaticInvoke, _) =>
```

Review Comment: my bad, I fixed it!
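A sketch of the predicate shape this diff adds support for (hypothetical table and column names): before matching In/InSet, only binary comparisons on system functions were rewritten, so a query like this kept its unrewritten StaticInvoke and could not be pushed down for partition pruning.

```sql
-- Hypothetical table; with the In/InSet cases added, the bucket call is
-- replaced so the IN predicate can be converted to an Iceberg expression.
SELECT * FROM db.tbl WHERE system.bucket(8, id) IN (1, 3, 5);
```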
[PR] Switch to junit5 for mr [iceberg]
lschetanrao opened a new pull request, #9241: URL: https://github.com/apache/iceberg/pull/9241 ### Description This PR fixes https://github.com/apache/iceberg/issues/9083 The goal is to switch all imports to JUnit5 and to use AssertJ-style assertions. ### Implementation All test cases except the ones using parameterized tests have been migrated; I thought it would be better to wait for https://github.com/apache/iceberg/pull/9161 to be merged first.
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
tmnd1991 commented on PR #9192: URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1845761731 cc @nastra @dramaticlly @advancedxy for review thanks
Re: [PR] Spark SystemFunctions are not pushed down during JOIN [iceberg]
tmnd1991 commented on PR #9233: URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1845762590 cc @nastra @dramaticlly @advancedxy for review thanks