Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1462861172

## website/src/release.md:
@@ -0,0 +1,383 @@

This document mainly introduces how the release manager releases a new version in accordance with the Apache requirements.

## Introduction

The `Source Release` is the key artifact that Apache values, and it is required for an ASF release.

Please remember that publishing software has legal consequences.

This guide complements the foundation-wide policies and guides:

- [Release Policy](https://www.apache.org/legal/release-policy.html)
- [Release Distribution Policy](https://infra.apache.org/release-distribution)
- [Release Creation Process](https://infra.apache.org/release-publishing.html)

## Release Terminology

In the context of our release, we use several terms to describe different stages of the release process. Here's an explanation of these terms:

- `iceberg_version`: the version of Iceberg to be released, like `0.2.0`.
- `release_version`: the version of the release candidate, like `0.2.0-rc.1`.
- `rc_version`: the release candidate suffix for the voting round, like `rc.1`.

## Preparation

This section covers the requirements for individuals who are new to the role of release manager.

Refer to [Setup GPG Key](reference/setup_gpg.md) to make sure the GPG key has been set up.

## Start a tracking issue for the next release

Start a tracking issue on GitHub for the upcoming release to track all tasks that need to be completed.

Title:

```
Tracking issues of Iceberg Rust ${iceberg_version} Release
```

Content:

```markdown
This issue is used to track tasks of the Iceberg Rust ${iceberg_version} release.

## Tasks

### Blockers

> Blockers are the tasks that must be completed before the release.

### Build Release

#### GitHub Side

- [ ] Bump version in project
- [ ] Update docs
- [ ] Generate dependencies list
- [ ] Push release candidate tag to GitHub

#### ASF Side

- [ ] Create an ASF Release
- [ ] Upload artifacts to the SVN dist repo

### Voting

- [ ] Start VOTE at iceberg community

### Official Release

- [ ] Push the release git tag
- [ ] Publish artifacts to SVN RELEASE branch
- [ ] Change Iceberg Rust Website download link
- [ ] Send the announcement

For details of each step, please refer to: https://rust.iceberg.apache.org/release
```

## GitHub Side

### Bump version in project

Bump all components' versions in the project to the new Iceberg version. Please note that this version is the exact version of the release, not the release candidate version.

- Rust core: bump the version in `Cargo.toml`

### Update docs

- Update `CHANGELOG.md`; refer to [Generate Release Note](reference/generate_release_note.md) for more information.

### Generate dependencies list

Download and set up `cargo-deny`; you can refer to [cargo-deny](https://embarkstudios.github.io/cargo-deny/cli/index.html).

Run `python3 ./scripts/dependencies.py generate` to update the dependencies list of every package.

### Push release candidate tag

After the version bump PR is merged, we can create a GitHub release for the release candidate:

- Create a tag on the `Bump Version` / `Patch up version` commit on the `main` branch: `git tag -s "v0.2.0-rc.1"`. Make sure to check out that exact commit instead of tagging whatever `main` currently points at (see the sketch below).
- Push the tag to GitHub: `git push --tags`.
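For instance, the tagging flow could look like this (a sketch; `<commit-sha>` is a placeholder for the actual version bump commit):

```shell
# Tag the exact "Bump Version" commit rather than the current tip of main.
git checkout <commit-sha>
git tag -s "v${release_version}"   # e.g. v0.2.0-rc.1
git push --tags
```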
## ASF Side

If any step in the ASF Release process fails and requires code changes, we will abandon that version and prepare for the next one. Our release page will only display ASF releases, not GitHub releases.

### Create an ASF Release

After the GitHub release has been created, we can start to create the ASF release.

- Check out the released tag (e.g. `git checkout v0.2.0-rc.1`; the tag was created in the previous step).
- Use the release script to create a new release: `ICEBERG_VERSION=${iceberg_version} ICEBERG_VERSION_RC=${rc_version} ./scripts/release.sh` (e.g. `ICEBERG_VERSION=0.2.0 ICEBERG_VERSION_RC=rc.1 ./scripts/release.sh`).
- This script will do the following things:
  - Create a new branch named `release-${release_version}` from the tag.
  - Generate the release candidate artifacts under `dist`, including:
    - `apache-iceberg-rust-${release_version}-src.tar.gz`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.asc`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.sha512`
  - Check the license headers of the source code. This step requires Docker to run.
  - Push the newly created branch to GitHub.

This script will create a new release under `dist`. For example:

```shell
> tree dist
dist
├── apache-iceberg-rust-0.2.0-src.tar.gz
├── apache-iceberg-rust-0.2.0-src.tar.gz.asc
└── apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```

### Upload artifacts to the SVN dist repo
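A hedged sketch of the usual ASF flow for this step — the exact `dist/dev` location for iceberg-rust is an assumption here, not something confirmed by this guide:

```shell
# Sketch only: check out the dev area of the ASF dist repo, add the
# candidate artifacts, and commit them (the iceberg path is assumed).
svn checkout https://dist.apache.org/repos/dist/dev/iceberg /tmp/iceberg-dist-dev
mkdir -p /tmp/iceberg-dist-dev/iceberg-rust-${release_version}
cp dist/* /tmp/iceberg-dist-dev/iceberg-rust-${release_version}/
cd /tmp/iceberg-dist-dev
svn add iceberg-rust-${release_version}
svn commit -m "Add Apache Iceberg Rust ${release_version} release candidate"
```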
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462866365

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
@@ -0,0 +1,201 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.iceberg.jdbc;

import java.sql.DataTruncation;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLNonTransientConnectionException;
import java.sql.SQLTimeoutException;
import java.sql.SQLTransientConnectionException;
import java.sql.SQLWarning;
import java.util.Map;
import java.util.Objects;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.AlreadyExistsException;
import org.apache.iceberg.exceptions.CommitFailedException;
import org.apache.iceberg.exceptions.NoSuchNamespaceException;
import org.apache.iceberg.exceptions.NoSuchViewException;
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.util.PropertyUtil;
import org.apache.iceberg.view.BaseViewOperations;
import org.apache.iceberg.view.ViewMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** JDBC implementation of Iceberg ViewOperations. */
public class JdbcViewOperations extends BaseViewOperations {

  private static final Logger LOG = LoggerFactory.getLogger(JdbcViewOperations.class);
  private final String catalogName;
  private final TableIdentifier viewIdentifier;
  private final FileIO fileIO;
  private final JdbcClientPool connections;
  private final Map<String, String> catalogProperties;

  protected JdbcViewOperations(
      JdbcClientPool dbConnPool,
      FileIO fileIO,
      String catalogName,
      TableIdentifier viewIdentifier,
      Map<String, String> catalogProperties) {
    this.catalogName = catalogName;
    this.viewIdentifier = viewIdentifier;
    this.fileIO = fileIO;
    this.connections = dbConnPool;
    this.catalogProperties = catalogProperties;
  }

  @Override
  protected void doRefresh() {
    Map<String, String> view;

    try {
      view = JdbcUtil.view(connections, catalogName, viewIdentifier);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new UncheckedInterruptedException(e, "Interrupted during refresh");
    } catch (SQLException e) {
      // SQL exception happened when getting the view from the catalog
      throw new UncheckedSQLException(
          e, "Failed to get view %s from catalog %s", viewIdentifier, catalogName);
    }

    if (view.isEmpty()) {
      if (currentMetadataLocation() != null) {
        throw new NoSuchViewException("View does not exist: %s", viewIdentifier);
      } else {
        this.disableRefresh();
        return;
      }
    }

    String newMetadataLocation = view.get(JdbcTableOperations.METADATA_LOCATION_PROP);
    Preconditions.checkState(
        newMetadataLocation != null, "Invalid view %s: metadata location is null", viewIdentifier);
    refreshFromMetadataLocation(newMetadataLocation);
  }

  @Override
  protected void doCommit(ViewMetadata base, ViewMetadata metadata) {
    String newMetadataLocation = writeNewMetadataIfRequired(metadata);
    try {
      Map<String, String> view = JdbcUtil.view(connections, catalogName, viewIdentifier);
      if (base != null) {
        validateMetadataLocation(view, base);
        String oldMetadataLocation = base.metadataFileLocation();
        // Start atomic update
        LOG.debug("Committing existing view: {}", viewName());
        updateView(newMetadataLocation, oldMetadataLocation);
      } else {
        // view does not exist, create it
        LOG.debug("Committing new view: {}", viewName());
        createView(newMetadataLocation);
      }

    } catch (SQLIntegrityConstraintViolationException e) {
      if (currentMetadataLocation() == null) {
        throw new AlreadyExistsException(e, "View already exists: %s", viewIdentifier);
      } else {
        throw new UncheckedSQLException(e, "View already exists: %s", viewIdentifier);
      }

    } catch (SQLTimeou
```
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462869217

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:

```diff
@@ -303,7 +287,294 @@ public static Properties filterAndRemovePrefix(Map properties, S
     return result;
   }

-  public static String updatePropertiesStatement(int size) {
+  static String getSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(GET, tableName, namespace, name);
+  }
+
+  static String getTableSql() {
```

Review Comment:
this one still has a `get` prefix. Maybe rename this to `tableSql()` or `sqlForTable()`, but the former probably better aligns with the other method naming.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
nastra commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462880626

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:

```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment:
I haven't seen a PR that would include migrating this class here to JUnit5
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
tmnd1991 commented on PR #9192: URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1905614023

any chance to have this reviewed?
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1462719672

## website/src/release.md:
Re: [I] Adopt `Catalog` API to include references to the `TableMetadata` and the `metadata_location` in the `TableCommit` payload for the `update_table` method [iceberg-rust]
JanKaul closed issue #75: Adopt `Catalog` API to include references to the `TableMetadata` and the `metadata_location` in the `TableCommit` payload for the `update_table` method
URL: https://github.com/apache/iceberg-rust/issues/75
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
nicor88 commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1905753459

@HonahX @mgmarino all good on my side, all worked with a new table and with the last commit 💯 great work @mgmarino !
[PR] Add UnionByName functionality [iceberg-python]
Fokko opened a new pull request, #296: URL: https://github.com/apache/iceberg-python/pull/296

(no comment)
Re: [PR] Build: Bump pyarrow from 14.0.2 to 15.0.0 [iceberg-python]
Fokko merged PR #295: URL: https://github.com/apache/iceberg-python/pull/295
Re: [PR] Build: Bump pyspark from 3.4.2 to 3.5.0 [iceberg-python]
Fokko commented on PR #283: URL: https://github.com/apache/iceberg-python/pull/283#issuecomment-1906044198

@HonahX I was thinking of adding this to the `Dockerfile` through an `ENV` step
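A minimal sketch of that approach, assuming the value being set is the PySpark version pin (names here are illustrative, not the actual Dockerfile):

```dockerfile
# Hypothetical: define the version once via ENV so later steps reuse it.
ENV PYSPARK_VERSION=3.5.0
RUN pip install "pyspark==${PYSPARK_VERSION}"
```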
Re: [I] Support partitioned writes [iceberg-python]
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906088623

@jqin61 just wondering if we can use this directly: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html
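For reference, the linked API can write Hive-style partitioned datasets like this (a minimal sketch; the path and columns are illustrative):

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Toy table with a column to partition on.
table = pa.table({"id": [1, 2, 3], "day": ["2024-01-01", "2024-01-01", "2024-01-02"]})

# Hive-flavored partitioning: each distinct `day` value becomes a
# day=<value> directory under the base path.
ds.write_dataset(
    table,
    base_dir="/tmp/partitioned_demo",
    format="parquet",
    partitioning=ds.partitioning(pa.schema([("day", pa.string())]), flavor="hive"),
)
```

Note that this only covers the directory layout; Iceberg's partition transforms and partition metadata would still need to be handled on top of it.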
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463483616

## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcViewCatalog.java:
@@ -0,0 +1,66 @@

```java
package org.apache.iceberg.jdbc;

import java.util.Map;
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.view.ViewCatalogTests;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.io.TempDir;

public class TestJdbcViewCatalog extends ViewCatalogTests<JdbcCatalog> {

  private JdbcCatalog catalog;

  @TempDir private java.nio.file.Path tableDir;

  @BeforeEach
  public void before() {
    Map<String, String> properties = Maps.newHashMap();
    properties.put(
        CatalogProperties.URI,
        "jdbc:sqlite:file::memory:?ic" + UUID.randomUUID().toString().replace("-", ""));
    properties.put(JdbcCatalog.PROPERTY_PREFIX + "username", "user");
    properties.put(JdbcCatalog.PROPERTY_PREFIX + "password", "password");
    properties.put(CatalogProperties.WAREHOUSE_LOCATION, tableDir.toAbsolutePath().toString());

    catalog = new JdbcCatalog();
    catalog.setConf(new Configuration());
```

Review Comment:
After double checking, the Hadoop conf is used by `CatalogUtil.loadFileIO()` via `configureHadoopConf()`. I think `CatalogUtil` can check if the Hadoop conf is not null before calling `configureHadoopConf()`. I propose to evaluate this in a separate PR (not in this one).
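The proposed guard would be tiny; a hypothetical sketch of its shape (the method and parameter names here are illustrative, not the actual `CatalogUtil` signatures):

```java
// Hypothetical sketch of the null check proposed above, not actual
// CatalogUtil code: only push a Hadoop configuration into the FileIO
// when the caller actually provided one.
if (hadoopConf != null) {
  configureHadoopConf(fileIO, hadoopConf);
}
```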
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463495264

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463497746

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463527520

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463554981

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:

```diff
@@ -303,7 +287,294 @@ public static Properties filterAndRemovePrefix(Map properties, S
     return result;
   }

-  public static String updatePropertiesStatement(int size) {
+  private static String getSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(GET, tableName, namespace, name);
+  }
+
+  static String tableSql() {
+    return getSql(true);
+  }
+
+  static String viewSql() {
+    return getSql(false);
+  }
+
+  private static String namespaceSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(GET_NAMESPACE, namespace, tableName, namespace, namespace);
+  }
+
+  static String tableNamespaceSql() {
+    return namespaceSql(true);
+  }
+
+  static String viewNamespaceSql() {
+    return namespaceSql(false);
+  }
+
+  private static String listNamespacesSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(LIST_NAMESPACES, namespace, tableName, namespace);
+  }
+
+  static String listTableNamespacesSql() {
+    return listNamespacesSql(true);
+  }
+
+  static String listViewNamespacesSql() {
+    return listNamespacesSql(false);
+  }
+
+  private static String listAllNamespacesSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(LIST_ALL_NAMESPACES, namespace, tableName);
+  }
+
+  static String listAllTableNamespacesSql() {
+    return listAllNamespacesSql(true);
+  }
+
+  static String listAllViewNamespacesSql() {
+    return listAllNamespacesSql(false);
+  }
+
+  private static String doCommitSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(DO_COMMIT, tableName, namespace, name);
+  }
+
+  private static String createCatalogSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(CREATE_CATALOG, tableName, namespace, name, namespace, name);
+  }
+
+  static String createCatalogTableSql() {
+    return createCatalogSql(true);
+  }
+
+  static String createCatalogViewSql() {
+    return createCatalogSql(false);
+  }
+
+  private static String listSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(LIST, tableName, namespace);
+  }
+
+  static String listTablesSql() {
+    return listSql(true);
+  }
+
+  static String listViewsSql() {
+    return listSql(false);
+  }
+
+  private static String renameSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(RENAME, tableName, namespace, name, namespace, name);
+  }
+
+  static String renameTableSql() {
+    return renameSql(true);
+  }
+
+  static String renameViewSql() {
+    return renameSql(false);
+  }
+
+  private static String dropSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(DROP, tableName, namespace, name);
+  }
+
+  static String dropTableSql() {
+    return dropSql(true);
+  }
+
+  static String dropViewSql() {
+    return dropSql(false);
+  }
+
+  private static String doCommitCreateSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(DO_COMMIT_CREATE, tableName, namespace, name);
+  }
+
+  private static int update(
+      boolean isTable,
+      JdbcClientPool connections,
+      String catalogName,
+      TableIdentifier identifier,
+      String newMetadataLocation,
+      String oldMetadataLocation)
+      throws SQLException, InterruptedException {
+    return connection
```
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463558073

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463559622

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:
Re: [PR] API: New API For sequential / streaming updates [iceberg]
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1906444781

@jasonf20, explicitly setting the sequence number isn't safe. Sequence numbers are assigned when the client attempts to commit and must be updated if the client has to retry. You could make this work by writing separate manifest files for each interim commit and reassigning the sequence numbers for those in order, but that seems time consuming and would require large changes to the snapshot producer base class. Maybe we should catch up sometime to talk through what you're trying to accomplish.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906520489

In order to avoid any potential concurrency issue, I'm working on a new approach where both tables and views are stored in the same JDBC table, with a column indicating whether an entry is a table or a view. I should have this ready tomorrow. The rest will be untouched; only the SQL statements change.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
ajantha-bhat commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906542388

> In order to avoid any potential concurrency issue, I'm working on a new approach where both tables and views are stored in the same JDBC table, with a column indicating whether an entry is a table or a view. I should have this ready tomorrow. The rest will be untouched; only the SQL statements change.

Don't jump into changing the code again. Let us discuss and come to a conclusion first. Can you please elaborate on the potential problem with the current PR?

As I see it, the current implementation using a different namespace and different tables for Iceberg tables and views is fine. For example, `iceberg_namespace_properties` is currently maintained as a separate JDBC table instead of being stored in the same JDBC table.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906552439

I'm reworking the SQL statements a bit to have views and tables in the same JDBC table, especially to handle naming between tables and views atomically. When we create a view, we check that a table with the same name doesn't already exist. With the current PR, we can get a mismatch if a table and a view are created with the same name at the same time (the lock is at the JDBC table level). By having tables and views in the same JDBC table (and using a column to indicate whether an entry is a view or a table), we ensure atomicity here.
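A minimal sketch of what such a unified layout could look like — the table, column names, and types here are illustrative, not the actual schema from the PR:

```sql
-- Hypothetical unified catalog table: tables and views share one primary
-- key space, so a view can never be committed under the name of an
-- existing table (or vice versa) without violating the constraint.
CREATE TABLE iceberg_tables (
  catalog_name               VARCHAR(255) NOT NULL,
  table_namespace            VARCHAR(255) NOT NULL,
  table_name                 VARCHAR(255) NOT NULL,
  iceberg_type               VARCHAR(5),    -- 'TABLE' or 'VIEW'
  metadata_location          VARCHAR(1000),
  previous_metadata_location VARCHAR(1000),
  PRIMARY KEY (catalog_name, table_namespace, table_name)
);
```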
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906555047

@ajantha-bhat the potential issue is the name check between tables and views, which is separate today. My concern is not about namespaces but about concurrency on names between tables and views.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
ajantha-bhat commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906559757

> I'm reworking the SQL statements a bit to have views and tables in the same JDBC table, especially to address name atomicity between tables and views. When we create a view, we check that a table with the same name doesn't already exist. With the current PR, we can have a mismatch if a table and a view are created at the same time (the lock is at the JDBC table level). By having tables and views in the same JDBC table (and using a column to indicate whether a row is a view or a table), we ensure atomicity here.

I see. You are talking about an edge case where a user creates a view and a table concurrently with the same name.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906568528

@ajantha-bhat correct, corner case but possible 😸
Re: [PR] Adding Snowflake's public documentation [iceberg-docs]
Fokko merged PR #297: URL: https://github.com/apache/iceberg-docs/pull/297
Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]
kevinjqliu commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1906606223

Thanks for the explanation @HonahX

The equality check (`==`) for `Schema` here is overloaded. Sometimes it's used to check whether two tables have the same structure, e.g. `tbl1.schema() == tbl2.schema()`. Other times it's used to check whether two schemas are objects with the same fields, including `schema_id`, e.g. in [`test_base.py#L592-L618`](https://github.com/apache/iceberg-python/blob/main/tests/catalog/test_base.py#L592-L618).

I think it could be a common footgun to use `==` on a `Schema` constructed with an explicit `schema_id`. For example:

```
assert given_table.schema() == Schema(
    NestedField(field_id=1, name="x", field_type=LongType(), required=True),
    NestedField(field_id=2, name="y", field_type=LongType(), required=True, doc="comment"),
    NestedField(field_id=3, name="z", field_type=LongType(), required=True),
    NestedField(field_id=4, name="new_column1", field_type=IntegerType(), required=False),
    schema_id=0,
    identifier_field_ids=[],
)
```

Looking at this code, I'd assume it's asserting that the `schema_id` must be `0`.

A couple of options:
1. Change the definition of `__eq__` for `Schema`.
2. Create a helper function which compares `Schema` objects on all fields.
3. Rely on the author to not use `schema_id` in the constructor and check `schema_id` separately.

Option (1) is a big refactor and changes assumptions in a lot of places, e.g. in `update_schema()`. I'm leaning toward options (2) and/or (3), but have no strong preference.
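A possible shape for option (2), as a sketch; `schemas_fully_equal` is a hypothetical name, and it assumes `Schema.__eq__` keeps its current structural meaning:

```python
from pyiceberg.schema import Schema


def schemas_fully_equal(left: Schema, right: Schema) -> bool:
    # Structural comparison via the existing __eq__, plus schema_id,
    # which the structural check is assumed not to cover.
    return left == right and left.schema_id == right.schema_id


# In a test, the intent then becomes explicit:
# assert schemas_fully_equal(given_table.schema(), expected_schema)
```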
Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]
Fokko commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1906620668

Alright, I went to the source and talked with @danielcweeks and @rdblue. It looks like we made things more complicated than actually needed. So, when reading and writing Parquet, we need to make sure that the IDs are aligned properly. When we are working with runtime data (`pa.Table`s), we match everything up based on names.

I also discussed adding Arrow types to the `create_table` statement with Dan, and he liked the idea, while I was a bit reluctant. But thinking about it, I think it makes sense since it will allow us to create Iceberg tables from a dataframe:

```python
catalog = load_catalog()
catalog.create_table('some.table', df=df)
```

And then:

```python
# It will wire up the schema by name
tbl.overwrite(df)
```

```python
# Should be quite easy with union by name:
tbl.append(df, merge_schema=True)
```
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906625437

@kevinjqliu Thanks for working on this!
Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]
syun64 commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1906647358

That makes sense @Fokko. Just to make sure we are on the same page, does the following approach align with your thoughts? We are proposing to update the `create_table` API to:

```
def create_table(
    self,
    identifier: Union[str, Identifier],
    schema: Union[Schema, pa.Schema],
    location: Optional[str] = None,
    partition_spec: PartitionSpec = UNPARTITIONED_PARTITION_SPEC,
    sort_order: SortOrder = UNSORTED_SORT_ORDER,
    properties: Properties = EMPTY_DICT,
) -> Table:
    ...
    if isinstance(schema, pa.Schema):
        schema = pre_order_visit_pyarrow(schema, _ConvertToIcebergWithFreshIds())
    ...  # existing code
```

We will call the function like:

```
table: pa.Table
catalog = load_catalog()
catalog.create_table('some.table', schema=table.schema)
```

And use the previously proposed visitor (https://github.com/syun64/iceberg-python/blob/preorder-fresh-schema/pyiceberg/io/pyarrow.py#L994), since [new_table_metadata](https://github.com/apache/iceberg-python/blob/46b25be424d1ef0f28778b0873c4d91bf117a2a7/pyiceberg/table/metadata.py#L399) has to take an Iceberg `Schema` with assigned `field_id`s as input?
Re: [PR] Core: Fix setting updated parquet compression property [iceberg]
rdblue commented on PR #9503: URL: https://github.com/apache/iceberg/pull/9503#issuecomment-1906654805

@amogh-jahagirdar, I'm not sure I agree with this change. It seems to me that this is the Parquet default regardless of whether the default write format is Avro or ORC. It's a format-specific property so it doesn't apply in those cases. And if the table switches to Parquet at write time or is reconfigured, we want to have the new default configuration set.
Re: [I] Purge support for Iceberg view [iceberg]
rdblue commented on issue #9433: URL: https://github.com/apache/iceberg/issues/9433#issuecomment-1906663747

I think cleaning up metadata files is a catalog concern, not something that should be exposed to users through a PURGE flag. A user is not prepared to make a good choice about whether they should use PURGE. The user would assume that the metadata is cleaned up automatically. And if the user doesn't know to use PURGE, someone needs to clean it up anyway. This is distinct from the table case, where a PURGE signals that the user needs to get rid of the data _now_ rather than potentially keeping it around (most likely because of PII or other compliance concerns). I don't think PURGE makes sense for views.
Re: [PR] Add install extra for sqlite [iceberg-python]
kevinjqliu commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906669155

@Fokko np. I had to rebase and regenerate the lock file
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1463815754

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala:

@@ -59,6 +62,10 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look
     loadView(catalog, ident)
       .map(_ => ResolvedV2View(catalog.asViewCatalog, ident))
       .getOrElse(u)
+
+    case c@CreateIcebergView(ResolvedIdentifier(_: ViewCatalog, ident), _, _, _, _, query, _, _) =>
+      ViewHelper.verifyTemporaryObjectsNotExists(false, Spark3Util.toV1TableIdentifier(ident), query, Seq.empty)

Review Comment: This is a validation that needs to be done, but I don't think that calling the Spark version is a good choice. This coerces a multi-part identifier to a v1 identifier, which is a bad idea because multi-part identifiers can't necessarily be converted. The method used here will throw an exception if the namespace has too many parts, which would break view creation in catalogs with nested namespaces. This also passes `Seq.empty` instead of the list of temporary functions that are referenced, which isn't guaranteed to be correct. Last, checks like this one should be done in a validation rule, not in `ResolveViews`. I think this was originally in a validation rule. Why did that change?
Re: [I] InMemory Catalog [iceberg-python]
Fokko commented on issue #293: URL: https://github.com/apache/iceberg-python/issues/293#issuecomment-1906706092

@kevinjqliu Alright, that's fair, I just wanted to make sure that we considered the option before making the `InMemory` one public 👍
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1463848914

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(
+    catalog: ViewCatalog,
+    ident: Identifier,
+    originalText: String,
+    query: LogicalPlan,
+    userSpecifiedColumns: Seq[(String, Option[String])],
+    comment: Option[String],
+    properties: Map[String, String],
+    allowExisting: Boolean,
+    replace: Boolean) extends LeafV2CommandExec {
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    val analyzedPlan = session.sessionState.executePlan(query, CommandExecutionMode.SKIP).analyzed
+    val identifier = Spark3Util.toV1TableIdentifier(ident)
+
+    if (userSpecifiedColumns.nonEmpty) {

Review Comment: I think this should be moved to a validation rule. That way we know the analyzer has finished and don't need to call it again.
Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1463940580

## catalog/glue.go:

@@ -0,0 +1,162 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package catalog
+
+import (
+	"context"
+	"errors"
+	"fmt"
+
+	"github.com/apache/iceberg-go/io"
+	"github.com/apache/iceberg-go/table"
+	"github.com/aws/aws-sdk-go-v2/aws"
+	"github.com/aws/aws-sdk-go-v2/service/glue"
+	"github.com/aws/aws-sdk-go-v2/service/glue/types"
+)
+
+var (
+	_ Catalog = (*GlueCatalog)(nil)
+)
+
+type GlueAPI interface {
+	GetTable(ctx context.Context, params *glue.GetTableInput, optFns ...func(*glue.Options)) (*glue.GetTableOutput, error)
+	GetTables(ctx context.Context, params *glue.GetTablesInput, optFns ...func(*glue.Options)) (*glue.GetTablesOutput, error)
+}
+
+type GlueCatalog struct {
+	glueSvc GlueAPI
+}
+
+func NewGlueCatalog(awscfg aws.Config) *GlueCatalog {
+	return &GlueCatalog{
+		glueSvc: glue.NewFromConfig(awscfg),
+	}
+}

Review Comment:

> As an integration point, I think it would be better to provide a set of functional options and strip back the abstraction to specific create methods for each catalogue as it is more typical of a Go API. Go tends to encourage a flatter API with fewer abstractions like these.

That's fair and I'm fine with that. Perhaps in the future we'd add the more generic abstraction to help people convert from existing iceberg libraries, but per-catalog-type `New` functions will work for now.
Re: [I] Support partitioned writes [iceberg-python]
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906856252

> @jqin61 just wondering if we can use this directly https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html

Thank you Ashish! I had overlooked it. As you mention, we could just use `write_dataset()` with the partitioning and `basename_template` arguments specified to write out the partitioned data files as Iceberg needs.
Re: [I] Support partitioned writes [iceberg-python]
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906879564

> @Fokko @jqin61 I am also interested in moving this forward, as we also deal with a lot of writes involving partitions. Happy to collaborate on this. For `write_dataset()` we might need to look at whether we need to add field metadata (field-id etc.) to the Parquet schema while committing data files.

Yes - I think we learned this from our earlier attempts: https://github.com/apache/iceberg-python/pull/41/files/1398a2fb01341087a1334482db84a193843a2362#r1427302782 As @jqin61 pointed out in a previous PR, adding these to the schema should output Parquet files with the correct `field_id`.
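A minimal sketch of the combination discussed above, assuming pyarrow honors the `PARQUET:field_id` field-metadata key when writing Parquet; the ids, path, and column names are illustrative:

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Attach Iceberg field ids as field metadata so the written Parquet
# files carry field_id, then let write_dataset() do the partitioning.
schema = pa.schema([
    pa.field("id", pa.int64(), metadata={b"PARQUET:field_id": b"1"}),
    pa.field("category", pa.string(), metadata={b"PARQUET:field_id": b"2"}),
])
table = pa.table({"id": [1, 2, 3], "category": ["a", "a", "b"]}, schema=schema)

ds.write_dataset(
    table,
    base_dir="/tmp/partitioned",
    format="parquet",
    # Hive-style layout: /tmp/partitioned/category=a/part-0.parquet, ...
    partitioning=ds.partitioning(
        pa.schema([pa.field("category", pa.string())]), flavor="hive"
    ),
    basename_template="part-{i}.parquet",
)
```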
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906902491

@kevinjqliu There is no upper-bound constraint on Ray, and therefore it is bumped to the latest version, which is not compatible for some reason. I've created https://github.com/apache/iceberg-python/pull/298 to fix this for now. If you revert the changes in `poetry.lock` and run `poetry lock --no-update`, it will not update Ray to the latest version.
Re: [PR] Add install extra for sqlite [iceberg-python]
kevinjqliu commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906912995

@Fokko thanks, I just ran `poetry lock --no-update`
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko merged PR #297: URL: https://github.com/apache/iceberg-python/pull/297
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906969525

Thanks @kevinjqliu for picking this up 🙌
Re: [PR] API, Spark: Fix aggregation pushdown on struct fields [iceberg]
amogh-jahagirdar commented on code in PR #9176: URL: https://github.com/apache/iceberg/pull/9176#discussion_r1464032789

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:

@@ -249,6 +250,78 @@ public void testAggregateNotPushDownIfOneCantPushDown() {
     assertEquals("expected and actual should equal", expected, actual);
   }

+  @Test
+  public void testAggregationPushdownStructInteger() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT)",
+        "struct_with_int.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        "(2, named_struct(\"c1\", 2))",
+        "(3, named_struct(\"c1\", 3))");
+  }
+
+  @Test
+  public void testAggregationPushdownNestedStruct() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT>>>)",
+        "struct_with_int.c1.c2.c3.c4",
+        "(1, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", NULL)",
+        "(2, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 2)",
+        "(3, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 3)");
+  }
+
+  @Test
+  public void testAggregationPushdownStructTimestamp() {
+    long timestamp = System.currentTimeMillis();
+    long futureTimestamp = timestamp + 5000;
+    Timestamp expectedMax = new Timestamp(futureTimestamp / 1000 * 1000);
+    Timestamp expectedMin = new Timestamp(1000 * (timestamp / 1000));
+    testAggregationPushdownStruct(
+        2L,
+        expectedMax,
+        expectedMin,
+        "(id BIGINT, struct_with_ts STRUCT)",
+        "struct_with_ts.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        String.format(
+            "(2, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))", timestamp),
+        String.format(
+            "(3, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))",
+            timestamp + 5000));
+  }
+
+  private void testAggregationPushdownStruct(
+      Object expectedCount,
+      Object expectedMax,
+      Object expectedMin,
+      String schema,
+      String aggField,
+      String... rows) {
+    sql("CREATE TABLE %s %s USING iceberg", tableName, schema);
+    sql("INSERT INTO TABLE %s VALUES %s", tableName, String.join(",", rows));
+    List actual =

Review Comment: I should write an assertion that the plan actually contains the pushed-down aggregates.
Re: [PR] Flink: Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]
stevenzwu commented on code in PR #9547: URL: https://github.com/apache/iceberg/pull/9547#discussion_r1464034995

## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/StreamingMonitorFunction.java:

@@ -195,7 +192,10 @@ void monitorAndForwardSplits() {
     // Refresh the table to get the latest committed snapshot.
     table.refresh();
-    Snapshot snapshot = table.currentSnapshot();
+    Snapshot snapshot =

Review Comment: I am wondering if we should change the default to `SnapshotRef.MAIN_BRANCH`? This way we can simplify the code here a bit.

```
public static final ConfigOption BRANCH =
    ConfigOptions.key("branch").stringType().defaultValue(null);
```
Re: [PR] Flink: Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]
stevenzwu commented on code in PR #9547: URL: https://github.com/apache/iceberg/pull/9547#discussion_r1464036002

## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/StreamingMonitorFunction.java:

@@ -195,7 +192,10 @@ void monitorAndForwardSplits() {
     // Refresh the table to get the latest committed snapshot.
     table.refresh();
-    Snapshot snapshot = table.currentSnapshot();
+    Snapshot snapshot =

Review Comment: I also just found out that `FlinkWriteOptions` already defaults to the main branch, so we should probably make it consistent in `FlinkReadOptions`.
Re: [I] Query optimization fails after upgrading to 1.4.0+ with nullif in predicate [iceberg]
singhpk234 commented on issue #9518: URL: https://github.com/apache/iceberg/issues/9518#issuecomment-1907096291

> Apologies, I edited out a quick test I had done that I actually ran on a cluster that is on EMR 6.14 so it's expected that my test would succeed instead of hit the reported error

Sorry, I didn't fully get this. Do you mean it can pass as well? Or is it consistently reproducible?
Re: [PR] AWS: Support setting description for Glue table [iceberg]
singhpk234 commented on code in PR #9530: URL: https://github.com/apache/iceberg/pull/9530#discussion_r1464132457

## aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java:

@@ -59,7 +60,7 @@ private IcebergToGlueConverter() {}
   private static final Pattern GLUE_DB_PATTERN = Pattern.compile("^[a-z0-9_]{1,252}$");
   private static final Pattern GLUE_TABLE_PATTERN = Pattern.compile("^[a-z0-9_]{1,255}$");
   public static final String GLUE_DB_LOCATION_KEY = "location";
-  public static final String GLUE_DB_DESCRIPTION_KEY = "comment";
+  public static final String GLUE_DESCRIPTION_KEY = "comment";

Review Comment: Can you please add a comment noting that this is used for setting the description of both databases and tables?
Re: [I] Fix Parquet's fallback field id assignment [iceberg]
github-actions[bot] commented on issue #586: URL: https://github.com/apache/iceberg/issues/586#issuecomment-1907128245

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Don't use internal Spark classes to migrate existing tables to Iceberg [iceberg]
github-actions[bot] commented on issue #595: URL: https://github.com/apache/iceberg/issues/595#issuecomment-1907128274

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Broadcast Join Failure [iceberg]
github-actions[bot] commented on issue #621: URL: https://github.com/apache/iceberg/issues/621#issuecomment-1907128297

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Views on top of Iceberg tables [iceberg]
github-actions[bot] commented on issue #644: URL: https://github.com/apache/iceberg/issues/644#issuecomment-1907128329

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Fix Filter and Projection unit tests on vectorized-read branch [iceberg]
github-actions[bot] commented on issue #660: URL: https://github.com/apache/iceberg/issues/660#issuecomment-1907128385

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Use uncompressed page size to set initial size for Arrow data buffers [iceberg]
github-actions[bot] commented on issue #661: URL: https://github.com/apache/iceberg/issues/661#issuecomment-1907128408

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Prefetch parquet data pages [iceberg]
github-actions[bot] commented on issue #647: URL: https://github.com/apache/iceberg/issues/647#issuecomment-1907128362

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Add namespace support in Iceberg Catalog [iceberg]
github-actions[bot] commented on issue #672: URL: https://github.com/apache/iceberg/issues/672#issuecomment-1907128437

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464142918

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(
+    catalog: ViewCatalog,
+    ident: Identifier,
+    originalText: String,
+    query: LogicalPlan,
+    userSpecifiedColumns: Seq[(String, Option[String])],
+    comment: Option[String],
+    properties: Map[String, String],
+    allowExisting: Boolean,
+    replace: Boolean) extends LeafV2CommandExec {
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    val qe = session.sessionState.executePlan(query, CommandExecutionMode.SKIP)
+    qe.assertAnalyzed()
+    val analyzedPlan = qe.analyzed
+
+    val identifier = Spark3Util.toV1TableIdentifier(ident)
+
+    if (userSpecifiedColumns.nonEmpty) {
+      if (userSpecifiedColumns.length > analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewNotEnoughColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      } else if (userSpecifiedColumns.length < analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewTooManyColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      }
+    }
+
+    val queryColumnNames = analyzedPlan.schema.fieldNames
+    SchemaUtils.checkColumnNameDuplication(queryColumnNames, SQLConf.get.resolver)

Review Comment: After looking into this a bit more, `query` does correspond to `originalText`. That's good because we don't have to call the parser separately, and it ensures that the query can be analyzed correctly. We should keep this design for the logical plan. I think there are a few things that we can improve, though.

Since the query plan is linked in like this, it will be automatically analyzed, so this should already catch things like trying to project a missing column (by the way, we should have a test for that). It will also catch missing relations (tables and other views), but we probably need to handle resolution a bit differently: we want view resolution to happen just like it would when loading a view. I think that means that we should call our own code from `createViewRelation` to rewrite identifiers. (Ideally, we could alias here as well, but it may need some special handling if `GetColumnByOrdinal` doesn't work with an undetermined type.) Rewriting identifiers needs to happen immediately so that Spark doesn't substitute any temporary views. I think this needs to be done _before_ `ResolveViews`, so we should do it in `RewriteViewCommands`.

To summarize:
* `RewriteViewCommands` should call our code to rewrite identifiers in the parsed `query` (and alias columns, if possible)
* If `RewriteViewCommands` can't apply the aliases, then we will need a rule that applies them once the `query` is `resolved`
* `RewriteViewCommands` should also check for temporary view references and fail if any are found (and maybe temporary functions as well?)
* Checking for the right number of columns should be done in the checker (validation phase) if it isn't already
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464144880

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.scala:

@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical.views
+
+import org.apache.spark.sql.catalyst.plans.logical.BinaryCommand
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+case class CreateIcebergView(

Review Comment: I think this should have:
* `child: LogicalPlan`
* `queryText: String` (should not be optional)
* `query: LogicalPlan`
* `columnAliases: Seq[String]` (translate from Spark's weirdness in conversion)
* `columnComments: Seq[Option[String]]`
* `comment: String`
* `properties: Map[String, String]`
* `allowExisting: Boolean`
* `replace: Boolean`
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464145346

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(

Review Comment: I think this should have:
* `catalog: ViewCatalog`
* `ident: Identifier`
* `queryText: String` (no need for query at this point, not optional)
* `schema: StructType` (from the logical plan's `query.schema`)
* `columnAliases: Seq[String]`
* `columnComments: Seq[Option[String]]`
* `comment: String`
* `properties: Map[String, String]`
* `allowExisting: Boolean`
* `replace: Boolean`
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464145827

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(
+    catalog: ViewCatalog,
+    ident: Identifier,
+    originalText: String,
+    query: LogicalPlan,
+    userSpecifiedColumns: Seq[(String, Option[String])],
+    comment: Option[String],
+    properties: Map[String, String],
+    allowExisting: Boolean,
+    replace: Boolean) extends LeafV2CommandExec {
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    val analyzedPlan = session.sessionState.executePlan(query, CommandExecutionMode.SKIP).analyzed
+    val identifier = Spark3Util.toV1TableIdentifier(ident)
+
+    if (userSpecifiedColumns.nonEmpty) {
+      if (userSpecifiedColumns.length > analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewNotEnoughColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      } else if (userSpecifiedColumns.length < analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewTooManyColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      }
+    }
+
+    val queryColumnNames = analyzedPlan.schema.fieldNames
+    SchemaUtils.checkColumnNameDuplication(queryColumnNames, SQLConf.get.resolver)
+
+    val viewSchema = aliasPlan(analyzedPlan, userSpecifiedColumns).schema
+    val columnAliases = userSpecifiedColumns.map(_._1).toArray
+    val columnComments = userSpecifiedColumns.map(_._2.orNull).toArray
+
+    val currentCatalogName = session.sessionState.catalogManager.currentCatalog.name
+    val currentCatalog = if (!catalog.name().equals(currentCatalogName)) currentCatalogName else null
+    val currentNamespace = session.sessionState.catalogManager.currentNamespace
+
+    val engineVersion = "Spark " + org.apache.spark.SPARK_VERSION
+    val newProperties = properties ++
+      comment.map(ViewCatalog.PROP_COMMENT -> _) +
+      (ViewCatalog.PROP_CREATE_ENGINE_VERSION -> engineVersion,
+        ViewCatalog.PROP_ENGINE_VERSION -> engineVersion)
+
+    if (replace) {
+      // CREATE OR REPLACE VIEW
+      if (catalog.viewExists(ident)) {
+        catalog.dropView(ident)
+      }
+      // FIXME: replaceView API doesn't exist in Spark 3.5
+      catalog.createView(
+        ident,
+        originalText,
+        currentCatalog,
+        currentNamespace,
+        viewSchema,
+        queryColumnNames,
+        columnAliases,
+        columnComments,
+        newProperties.asJava)
+    } else {
+      try {
+        // CREATE VIEW [IF NOT EXISTS]
+        catalog.createView(
+          ident,
+          originalText,
+          currentCatalog,
+          currentNamespace,
+          viewSchema,
+          queryColumnNames,
+          columnAliases,
+          columnComments,
+          newProperties.asJava)
+      } catch {
+        case _: ViewAlreadyExistsException if allowExisting => // Ignore
      }
+    }
+
+    Nil
+  }
+
+  override def simpleString(maxFields: Int): String = {
+    s"CreateV2ViewExec: ${ident}"
+  }
+
+  /**
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464146096

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

@@ -139,24 +139,9 @@ public void readFromMultipleViews() throws NoSuchTableException {
     insertRows(6);
     String viewName = "firstView";
     String secondView = "secondView";
-    String viewSQL = String.format("SELECT id FROM %s WHERE id <= 3", tableName);
-    String secondViewSQL = String.format("SELECT id FROM %s WHERE id > 3", tableName);
-    ViewCatalog viewCatalog = viewCatalog();
-
-    viewCatalog
-        .buildView(TableIdentifier.of(NAMESPACE, viewName))
-        .withQuery("spark", viewSQL)
-        .withDefaultNamespace(NAMESPACE)
-        .withSchema(schema(viewSQL))
-        .create();
-
-    viewCatalog
-        .buildView(TableIdentifier.of(NAMESPACE, secondView))
-        .withQuery("spark", secondViewSQL)
-        .withDefaultNamespace(NAMESPACE)
-        .withSchema(schema(secondViewSQL))
-        .create();
+    sql("CREATE VIEW %s AS SELECT id FROM %s WHERE id <= 3", viewName, tableName);
+    sql("CREATE VIEW %s AS SELECT id FROM %s WHERE id > 3", secondView, tableName);

Review Comment: I don't think this should be changed. I like that it was previously isolated so that we weren't relying on Spark for this.
Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1464148047 ## catalog/glue.go: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package catalog + +import ( + "context" + "errors" + "fmt" + + "github.com/apache/iceberg-go/io" + "github.com/apache/iceberg-go/table" + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/service/glue" + "github.com/aws/aws-sdk-go-v2/service/glue/types" +) + +var ( + _ Catalog = (*GlueCatalog)(nil) +) + +type GlueAPI interface { + GetTable(ctx context.Context, params *glue.GetTableInput, optFns ...func(*glue.Options)) (*glue.GetTableOutput, error) + GetTables(ctx context.Context, params *glue.GetTablesInput, optFns ...func(*glue.Options)) (*glue.GetTablesOutput, error) +} + +type GlueCatalog struct { + glueSvc GlueAPI +} + +func NewGlueCatalog(awscfg aws.Config) *GlueCatalog { + return &GlueCatalog{ + glueSvc: glue.NewFromConfig(awscfg), + } +} Review Comment: @zeroshade The current model feels like it is following the factory method pattern https://en.wikipedia.org/wiki/Factory_method_pattern, which is great pattern for certain use cases, but in this case there are lots of other things to consider when creating these catalogs, like identity, custom platform specific configuration ect. I prefer the Go model which although using common interfaces, encourages a more explicit creation for these things as this avoids "magic", which is important for reliable, deterministic operation of systems. You still have common configuration/metrics/logging components, however these are instantiated based on the New routine in that catalog implementation. Really it comes down to migrations being more of a considered action in most Go libraries, this ensures the configuration and security of these implementations is respected. For example accessing an S3 data store could require specific role configuration for authentication, endpoints and event hooks, which may differ from the information used between different tables. A developer can do all this with the AWS SDK config, then pass it in when creating an IO for that particular store, and this is clearly specified in their code. That said the operation of the catalog using a common interface is great, this i where other common components interact and operate so it is nice to have. Again as I said before, this is just something I have observed is a idiomatic in Go libraries, like the AWS SDK. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [I] Query optimization fails after upgrading to 1.4.0+ with nullif in predicate [iceberg]
jakelong95 commented on issue #9518: URL: https://github.com/apache/iceberg/issues/9518#issuecomment-1907148175

The issue is reproducible on Iceberg 1.4.2 and Spark 3.5.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
mas-chen commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1464155134

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment: @pvary @stevenzwu @nastra I'm fine with that, and I can volunteer to migrate this class to JUnit5 after the PR is merged. I am limited in my bandwidth this week, so I can address it next week; meanwhile, I would like these metrics to land in the upcoming Iceberg release.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
stevenzwu commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1464178171

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment: I agree with @mas-chen that the JUnit5 migration can be tackled in a separate PR. This PR doesn't add any new test classes.
Re: [PR] Spec: add multi-arg transform support [iceberg]
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1907243141

@szehon-ho @aokolnychyi the `bucketV2` part is removed from this PR. Let me know if you have any more comments.
Re: [PR] Lock Ray on <2.8.0 [iceberg-python]
HonahX commented on PR #298: URL: https://github.com/apache/iceberg-python/pull/298#issuecomment-1907409790

OOPS. Seems there is a conflict.
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
HonahX commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1464342275

## tests/catalog/integration_test_glue.py:
```diff
@@ -279,6 +379,20 @@ def test_commit_table_update_schema(
     assert test_catalog._parse_metadata_version(table.metadata_location) == 0
     assert original_table_metadata.current_schema_id == 0

+    assert athena.get_query_results(f'SELECT * FROM "{database_name}"."{table_name}"') == [
```

Review Comment: > In other words, if the table was created and not queryable, then a table was not actually successfully created.

I hadn't considered this perspective initially. But indeed, this is very sound reasoning! Let's maintain the current state of these tests.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
pvary commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1464351088

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment: @mas-chen: It seems more complicated this way (modify, migrate, remove), but I am mostly interested in the final result 😀
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
HonahX commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1464346417

## pyiceberg/catalog/glue.py:
```diff
@@ -84,17 +110,97 @@ def _construct_parameters(
     return new_parameters


+GLUE_PRIMITIVE_TYPES = {
+    BooleanType: "boolean",
+    IntegerType: "int",
+    LongType: "bigint",
+    FloatType: "float",
+    DoubleType: "double",
+    DateType: "date",
+    TimeType: "string",
+    StringType: "string",
+    UUIDType: "string",
+    TimestampType: "timestamp",
+    FixedType: "binary",
+    BinaryType: "binary",
+}
+
+
+class IcebergSchemaToGlueType(SchemaVisitor[str]):
+    def schema(self, schema: Schema, struct_result: str) -> str:
+        return struct_result
+
+    def struct(self, struct: StructType, field_results: List[str]) -> str:
+        return f"struct<{','.join(field_results)}>"
+
+    def field(self, field: NestedField, field_result: str) -> str:
+        return f"{field.name}:{field_result}"
+
+    def list(self, list_type: ListType, element_result: str) -> str:
+        return f"array<{element_result}>"
+
+    def map(self, map_type: MapType, key_result: str, value_result: str) -> str:
+        return f"map<{key_result},{value_result}>"
+
+    def primitive(self, primitive: PrimitiveType) -> str:
+        if isinstance(primitive, DecimalType):
+            return f"decimal({primitive.precision},{primitive.scale})"
+        if (primitive_type := type(primitive)) not in GLUE_PRIMITIVE_TYPES:
+            raise ValueError(f"Unknown primitive type: {primitive}")
+        return GLUE_PRIMITIVE_TYPES[primitive_type]
+
+
+def _to_columns(metadata: TableMetadata) -> List[ColumnTypeDef]:
+    results: Dict[str, ColumnTypeDef] = {}
+
+    def _append_to_results(field: NestedField, is_current: bool) -> None:
+        if field.name in results:
+            return
+
+        results[field.name] = cast(
+            ColumnTypeDef,
+            {
+                "Name": field.name,
+                "Type": visit(field.field_type, IcebergSchemaToGlueType()),
```

Review Comment (on the `"Type": visit(...)` line):
```suggestion
                "Type": visit(field.field_type, _IcebergSchemaToGlueType()),
```

Review Comment (on the `class IcebergSchemaToGlueType` line of the same hunk):
```suggestion
class _IcebergSchemaToGlueType(SchemaVisitor[str]):
```
Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]
HonahX commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1907475405

@kevinjqliu Thanks for sharing these options. I think (3) is enough here, since this only affects tests. There are only a few places that require checking equality of both the fields and the schema_id. If you think (2) is more helpful, you can add a helper in `conftest.py`. Do you want to include these in https://github.com/apache/iceberg-python/pull/289/ ?
Re: [I] Add install extra for sqlite [iceberg-python]
HonahX closed issue #285: Add install extra for sqlite URL: https://github.com/apache/iceberg-python/issues/285
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
mgmarino commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1907487756

> Overall LGTM! Thank you very much @mgmarino! Just adding one final request to make `IcebergSchemaToGlueType` internal by adding `_`. Sorry I missed that earlier.

Sure, np!

> I've added this to the 0.6.0 milestone to ensure that Athena users can query tables written by pyiceberg after write support is released.

🎉 Thanks, @HonahX and @nicor88, for the input and reviews!
Re: [PR] Spark 3.4, 3.5: Enable drop table with purge in tests [iceberg]
nastra merged PR #9548: URL: https://github.com/apache/iceberg/pull/9548
Re: [PR] Build: Don't run CI's on unrelated changes [iceberg]
nastra merged PR #9526: URL: https://github.com/apache/iceberg/pull/9526
Re: [PR] API, Spark: Fix aggregation pushdown on struct fields [iceberg]
nastra commented on code in PR #9176: URL: https://github.com/apache/iceberg/pull/9176#discussion_r1464437182

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:
```diff
@@ -249,6 +250,78 @@ public void testAggregateNotPushDownIfOneCantPushDown() {
     assertEquals("expected and actual should equal", expected, actual);
   }

+  @Test
+  public void testAggregationPushdownStructInteger() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT<c1: BIGINT>)",
+        "struct_with_int.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        "(2, named_struct(\"c1\", 2))",
+        "(3, named_struct(\"c1\", 3))");
+  }
+
+  @Test
+  public void testAggregationPushdownNestedStruct() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT<c1: STRUCT<c2: STRUCT<c3: STRUCT<c4: BIGINT>>>>)",
+        "struct_with_int.c1.c2.c3.c4",
+        "(1, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", NULL)))))",
+        "(2, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 2)))))",
+        "(3, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 3)))))");
+  }
+
+  @Test
+  public void testAggregationPushdownStructTimestamp() {
+    long timestamp = System.currentTimeMillis();
+    long futureTimestamp = timestamp + 5000;
+    Timestamp expectedMax = new Timestamp(futureTimestamp / 1000 * 1000);
+    Timestamp expectedMin = new Timestamp(1000 * (timestamp / 1000));
+    testAggregationPushdownStruct(
+        2L,
+        expectedMax,
+        expectedMin,
+        "(id BIGINT, struct_with_ts STRUCT<c1: TIMESTAMP>)",
+        "struct_with_ts.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        String.format(
+            "(2, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))", timestamp),
+        String.format(
+            "(3, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))",
+            timestamp + 5000));
+  }
+
+  private void testAggregationPushdownStruct(
+      Object expectedCount,
+      Object expectedMax,
+      Object expectedMin,
+      String schema,
+      String aggField,
+      String... rows) {
+    sql("CREATE TABLE %s %s USING iceberg", tableName, schema);
+    sql("INSERT INTO TABLE %s VALUES %s", tableName, String.join(",", rows));
+    List<Object[]> actual =
+        sql("SELECT COUNT(%s), MAX(%s), MIN(%s) FROM %s", aggField, aggField, aggField, tableName);
+    Object actualCount = actual.get(0)[0];
+    Object actualMax = actual.get(0)[1];
+    Object actualMin = actual.get(0)[2];
+    Assertions.assertThat(actualCount)
+        .withFailMessage("Expected and actual count should equal")
+        .isEqualTo(expectedCount);
+    Assertions.assertThat(actualMax)
+        .withFailMessage("Expected and actual max should equal")
+        .isEqualTo(expectedMax);
+    Assertions.assertThat(actualMin)
+        .withFailMessage("Expected and actual min should equal")
```

Review Comment: it's better to use `.as()` instead of `withFailMessage()`, as otherwise we'd lose the entire context of why the assertion failed and what values actual/expected had.
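For illustration, here is a small self-contained AssertJ sketch of the difference (the class name and values are hypothetical stand-ins for those computed in the test above, and the failure output in the comments is approximate):

```java
import static org.assertj.core.api.Assertions.assertThat;

class FailMessageVsAsDemo {
  public static void main(String[] args) {
    long expectedMax = 3L;
    long actualMax = 2L; // deliberately wrong so both assertions fail

    try {
      // withFailMessage() replaces the whole error: the output is only the
      // fixed string, dropping the actual and expected values.
      assertThat(actualMax)
          .withFailMessage("Expected and actual max should equal")
          .isEqualTo(expectedMax);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
      // -> Expected and actual max should equal
    }

    try {
      // as() only prepends a description; AssertJ still reports both values,
      // so the failure keeps its context.
      assertThat(actualMax).as("max aggregate pushed down").isEqualTo(expectedMax);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
      // -> [max aggregate pushed down] expected: 3L but was: 2L
    }
  }
}
```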
Re: [PR] API, Spark: Fix aggregation pushdown on struct fields [iceberg]
nastra commented on code in PR #9176: URL: https://github.com/apache/iceberg/pull/9176#discussion_r1464446258

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:
```diff
@@ -249,6 +250,78 @@ public void testAggregateNotPushDownIfOneCantPushDown() {
     assertEquals("expected and actual should equal", expected, actual);
   }

+  @Test
+  public void testAggregationPushdownStructInteger() {
```

Review Comment: An alternative to having multiple methods that effectively call the same test code is to have a single parameterized test method.
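As a sketch of that suggestion, using JUnit 5's `@ParameterizedTest`/`@MethodSource` (the class name is hypothetical, only the first case from the PR is shown, and the field type inside `STRUCT<...>` is assumed since it isn't legible in the quoted diff):

```java
import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

class TestAggregatePushDownParameterized {

  // One Arguments row per former test method: expected count/max/min,
  // the table schema, the aggregated field, and the rows to insert.
  static Stream<Arguments> structAggregationCases() {
    return Stream.of(
        Arguments.of(
            2L,
            3L,
            2L,
            "(id BIGINT, struct_with_int STRUCT<c1: BIGINT>)",
            "struct_with_int.c1",
            new String[] {
              "(1, named_struct(\"c1\", NULL))",
              "(2, named_struct(\"c1\", 2))",
              "(3, named_struct(\"c1\", 3))"
            }));
  }

  @ParameterizedTest
  @MethodSource("structAggregationCases")
  void testAggregationPushdownStruct(
      Object expectedCount,
      Object expectedMax,
      Object expectedMin,
      String schema,
      String aggField,
      String[] rows) {
    // The body would be the existing private testAggregationPushdownStruct(...)
    // helper from the PR: create the table, insert the rows, run
    // COUNT/MAX/MIN on aggField, and assert the three expected values.
  }
}
```

This keeps one test body while each case still reports as its own test in the JUnit output.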