Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1462861172

## website/src/release.md:
@@ -0,0 +1,383 @@

This document mainly introduces how the release manager releases a new version in accordance with the Apache requirements.

## Introduction

The `Source Release` is the key artifact that Apache values, and it is required for an ASF release.

Please remember that publishing software has legal consequences.

This guide complements the foundation-wide policies and guides:

- [Release Policy](https://www.apache.org/legal/release-policy.html)
- [Release Distribution Policy](https://infra.apache.org/release-distribution)
- [Release Creation Process](https://infra.apache.org/release-publishing.html)

## Release Terminology

In the context of our release, we use several terms to describe different stages of the release process. Here's an explanation of these terms:

- `iceberg_version`: the version of Iceberg to be released, like `0.2.0`.
- `release_version`: the version of the release candidate, like `0.2.0-rc.1`.
- `rc_version`: the release candidate suffix for the voting round, like `rc.1`.

## Preparation

This section covers the requirements for individuals who are new to the role of release manager.

Refer to [Setup GPG Key](reference/setup_gpg.md) to make sure the GPG key has been set up.

## Start a tracking issue for the next release

Start a tracking issue on GitHub for the upcoming release to track all tasks that need to be completed.

Title:

```
Tracking issues of Iceberg Rust ${iceberg_version} Release
```

Content:

```markdown
This issue is used to track tasks of the Iceberg Rust ${iceberg_version} release.

## Tasks

### Blockers

> Blockers are the tasks that must be completed before the release.

### Build Release

#### GitHub Side

- [ ] Bump version in project
- [ ] Update docs
- [ ] Generate dependencies list
- [ ] Push release candidate tag to GitHub

#### ASF Side

- [ ] Create an ASF Release
- [ ] Upload artifacts to the SVN dist repo

### Voting

- [ ] Start VOTE at iceberg community

### Official Release

- [ ] Push the release git tag
- [ ] Publish artifacts to SVN RELEASE branch
- [ ] Change Iceberg Rust Website download link
- [ ] Send the announcement

For details of each step, please refer to: https://rust.iceberg.apache.org/release
```

## GitHub Side

### Bump version in project

Bump all components' versions in the project to the new Iceberg version. Please note that this version is the exact version of the release, not the release candidate version.

- Rust core: bump the version in `Cargo.toml`

### Update docs

- Update `CHANGELOG.md`; refer to [Generate Release Note](reference/generate_release_note.md) for more information.

### Generate dependencies list

Download and set up `cargo-deny`; you can refer to [cargo-deny](https://embarkstudios.github.io/cargo-deny/cli/index.html).

Run `python3 ./scripts/dependencies.py generate` to update the dependencies list of every package.

### Push release candidate tag

After the version bump PR is merged, we can create a GitHub release for the release candidate:

- Create a tag on the `Bump Version` / `Patch up version` commit on the `main` branch: `git tag -s "v0.2.0-rc.1"`. Make sure to check out that exact commit instead of tagging whatever `main` currently points at (see the sketch below).
- Push the tag to GitHub: `git push --tags`.
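For instance, the tagging flow could look like this (a sketch; `<commit-sha>` is a placeholder for the actual version bump commit):

```shell
# Tag the exact "Bump Version" commit rather than the current tip of main.
git checkout <commit-sha>
git tag -s "v${release_version}"   # e.g. v0.2.0-rc.1
git push --tags
```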
## ASF Side

If any step in the ASF Release process fails and requires code changes, we will abandon that version and prepare for the next one. Our release page will only display ASF releases, not GitHub releases.

### Create an ASF Release

After the GitHub release has been created, we can start to create the ASF release.

- Check out the released tag (e.g. `git checkout v0.2.0-rc.1`; the tag was created in the previous step).
- Use the release script to create a new release: `ICEBERG_VERSION=${iceberg_version} ICEBERG_VERSION_RC=${rc_version} ./scripts/release.sh` (e.g. `ICEBERG_VERSION=0.2.0 ICEBERG_VERSION_RC=rc.1 ./scripts/release.sh`).
- This script will do the following things:
  - Create a new branch named `release-${release_version}` from the tag.
  - Generate the release candidate artifacts under `dist`, including:
    - `apache-iceberg-rust-${release_version}-src.tar.gz`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.asc`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.sha512`
  - Check the license headers of the source code. This step requires Docker to run.
  - Push the newly created branch to GitHub.

This script will create a new release under `dist`. For example:

```shell
> tree dist
dist
├── apache-iceberg-rust-0.2.0-src.tar.gz
├── apache-iceberg-rust-0.2.0-src.tar.gz.asc
└── apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```

### Upload artifacts to the SVN dist repo
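A hedged sketch of the usual ASF flow for this step — the exact `dist/dev` location for iceberg-rust is an assumption here, not something confirmed by this guide:

```shell
# Sketch only: check out the dev area of the ASF dist repo, add the
# candidate artifacts, and commit them (the iceberg path is assumed).
svn checkout https://dist.apache.org/repos/dist/dev/iceberg /tmp/iceberg-dist-dev
mkdir -p /tmp/iceberg-dist-dev/iceberg-rust-${release_version}
cp dist/* /tmp/iceberg-dist-dev/iceberg-rust-${release_version}/
cd /tmp/iceberg-dist-dev
svn add iceberg-rust-${release_version}
svn commit -m "Add Apache Iceberg Rust ${release_version} release candidate"
```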
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462866365

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
@@ -0,0 +1,201 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.iceberg.jdbc;

import java.sql.DataTruncation;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLNonTransientConnectionException;
import java.sql.SQLTimeoutException;
import java.sql.SQLTransientConnectionException;
import java.sql.SQLWarning;
import java.util.Map;
import java.util.Objects;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.AlreadyExistsException;
import org.apache.iceberg.exceptions.CommitFailedException;
import org.apache.iceberg.exceptions.NoSuchNamespaceException;
import org.apache.iceberg.exceptions.NoSuchViewException;
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.util.PropertyUtil;
import org.apache.iceberg.view.BaseViewOperations;
import org.apache.iceberg.view.ViewMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** JDBC implementation of Iceberg ViewOperations. */
public class JdbcViewOperations extends BaseViewOperations {

  private static final Logger LOG = LoggerFactory.getLogger(JdbcViewOperations.class);
  private final String catalogName;
  private final TableIdentifier viewIdentifier;
  private final FileIO fileIO;
  private final JdbcClientPool connections;
  private final Map<String, String> catalogProperties;

  protected JdbcViewOperations(
      JdbcClientPool dbConnPool,
      FileIO fileIO,
      String catalogName,
      TableIdentifier viewIdentifier,
      Map<String, String> catalogProperties) {
    this.catalogName = catalogName;
    this.viewIdentifier = viewIdentifier;
    this.fileIO = fileIO;
    this.connections = dbConnPool;
    this.catalogProperties = catalogProperties;
  }

  @Override
  protected void doRefresh() {
    Map<String, String> view;

    try {
      view = JdbcUtil.view(connections, catalogName, viewIdentifier);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new UncheckedInterruptedException(e, "Interrupted during refresh");
    } catch (SQLException e) {
      // SQL exception happened when getting the view from the catalog
      throw new UncheckedSQLException(
          e, "Failed to get view %s from catalog %s", viewIdentifier, catalogName);
    }

    if (view.isEmpty()) {
      if (currentMetadataLocation() != null) {
        throw new NoSuchViewException("View does not exist: %s", viewIdentifier);
      } else {
        this.disableRefresh();
        return;
      }
    }

    String newMetadataLocation = view.get(JdbcTableOperations.METADATA_LOCATION_PROP);
    Preconditions.checkState(
        newMetadataLocation != null, "Invalid view %s: metadata location is null", viewIdentifier);
    refreshFromMetadataLocation(newMetadataLocation);
  }

  @Override
  protected void doCommit(ViewMetadata base, ViewMetadata metadata) {
    String newMetadataLocation = writeNewMetadataIfRequired(metadata);
    try {
      Map<String, String> view = JdbcUtil.view(connections, catalogName, viewIdentifier);
      if (base != null) {
        validateMetadataLocation(view, base);
        String oldMetadataLocation = base.metadataFileLocation();
        // Start atomic update
        LOG.debug("Committing existing view: {}", viewName());
        updateView(newMetadataLocation, oldMetadataLocation);
      } else {
        // view does not exist, create it
        LOG.debug("Committing new view: {}", viewName());
        createView(newMetadataLocation);
      }

    } catch (SQLIntegrityConstraintViolationException e) {
      if (currentMetadataLocation() == null) {
        throw new AlreadyExistsException(e, "View already exists: %s", viewIdentifier);
      } else {
        throw new UncheckedSQLException(e, "View already exists: %s", viewIdentifier);
      }

    } catch (SQLTimeou
```
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1462869217

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:

```diff
@@ -303,7 +287,294 @@ public static Properties filterAndRemovePrefix(Map properties, S
     return result;
   }

-  public static String updatePropertiesStatement(int size) {
+  static String getSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(GET, tableName, namespace, name);
+  }
+
+  static String getTableSql() {
```

Review Comment:
this one still has a `get` prefix. Maybe rename this to `tableSql()` or `sqlForTable()`, but the former probably better aligns with the other method naming.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
nastra commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1462880626

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:

```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment:
I haven't seen a PR that would include migrating this class here to JUnit5
Re: [PR] Spark: IN clause on system function is not pushed down [iceberg]
tmnd1991 commented on PR #9192: URL: https://github.com/apache/iceberg/pull/9192#issuecomment-1905614023

any chance to have this reviewed?
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Xuanwo commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1462719672

## website/src/release.md:
Re: [I] Adopt `Catalog` API to include references to the `TableMetadata` and the `metadata_location` in the `TableCommit` payload for the `update_table` method [iceberg-rust]
JanKaul closed issue #75: Adopt `Catalog` API to include references to the `TableMetadata` and the `metadata_location` in the `TableCommit` payload for the `update_table` method
URL: https://github.com/apache/iceberg-rust/issues/75
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
nicor88 commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1905753459

@HonahX @mgmarino all good on my side, all worked with a new table and with the last commit 💯 great work @mgmarino !
[PR] Add UnionByName functionality [iceberg-python]
Fokko opened a new pull request, #296: URL: https://github.com/apache/iceberg-python/pull/296

(no comment)
Re: [PR] Build: Bump pyarrow from 14.0.2 to 15.0.0 [iceberg-python]
Fokko merged PR #295: URL: https://github.com/apache/iceberg-python/pull/295
Re: [PR] Build: Bump pyspark from 3.4.2 to 3.5.0 [iceberg-python]
Fokko commented on PR #283: URL: https://github.com/apache/iceberg-python/pull/283#issuecomment-1906044198

@HonahX I was thinking of adding this to the `Dockerfile` through an `ENV` step
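A minimal sketch of that approach, assuming the value being set is the PySpark version pin (names here are illustrative, not the actual Dockerfile):

```dockerfile
# Hypothetical: define the version once via ENV so later steps reuse it.
ENV PYSPARK_VERSION=3.5.0
RUN pip install "pyspark==${PYSPARK_VERSION}"
```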
Re: [I] Support partitioned writes [iceberg-python]
asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906088623

@jqin61 just wondering if we can use this directly: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html
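For reference, the linked API can write Hive-style partitioned datasets like this (a minimal sketch; the path and columns are illustrative):

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Toy table with a column to partition on.
table = pa.table({"id": [1, 2, 3], "day": ["2024-01-01", "2024-01-01", "2024-01-02"]})

# Hive-flavored partitioning: each distinct `day` value becomes a
# day=<value> directory under the base path.
ds.write_dataset(
    table,
    base_dir="/tmp/partitioned_demo",
    format="parquet",
    partitioning=ds.partitioning(pa.schema([("day", pa.string())]), flavor="hive"),
)
```

Note that this only covers the directory layout; Iceberg's partition transforms and partition metadata would still need to be handled on top of it.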
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463483616

## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcViewCatalog.java:
@@ -0,0 +1,66 @@

```java
package org.apache.iceberg.jdbc;

import java.util.Map;
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.relocated.com.google.common.collect.Maps;
import org.apache.iceberg.view.ViewCatalogTests;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.io.TempDir;

public class TestJdbcViewCatalog extends ViewCatalogTests<JdbcCatalog> {

  private JdbcCatalog catalog;

  @TempDir private java.nio.file.Path tableDir;

  @BeforeEach
  public void before() {
    Map<String, String> properties = Maps.newHashMap();
    properties.put(
        CatalogProperties.URI,
        "jdbc:sqlite:file::memory:?ic" + UUID.randomUUID().toString().replace("-", ""));
    properties.put(JdbcCatalog.PROPERTY_PREFIX + "username", "user");
    properties.put(JdbcCatalog.PROPERTY_PREFIX + "password", "password");
    properties.put(CatalogProperties.WAREHOUSE_LOCATION, tableDir.toAbsolutePath().toString());

    catalog = new JdbcCatalog();
    catalog.setConf(new Configuration());
```

Review Comment:
After double checking, the Hadoop conf is used by `CatalogUtil.loadFileIO()` via `configureHadoopConf()`. I think `CatalogUtil` can check if the Hadoop conf is not null before calling `configureHadoopConf()`. I propose to evaluate this in a separate PR (not in this one).
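The proposed guard would be tiny; a hypothetical sketch of its shape (the method and parameter names here are illustrative, not the actual `CatalogUtil` signatures):

```java
// Hypothetical sketch of the null check proposed above, not actual
// CatalogUtil code: only push a Hadoop configuration into the FileIO
// when the caller actually provided one.
if (hadoopConf != null) {
  configureHadoopConf(fileIO, hadoopConf);
}
```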
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463495264

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463497746

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463527520

## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463554981

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:

```diff
@@ -303,7 +287,294 @@ public static Properties filterAndRemovePrefix(Map properties, S
     return result;
   }

-  public static String updatePropertiesStatement(int size) {
+  private static String getSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(GET, tableName, namespace, name);
+  }
+
+  static String tableSql() {
+    return getSql(true);
+  }
+
+  static String viewSql() {
+    return getSql(false);
+  }
+
+  private static String namespaceSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(GET_NAMESPACE, namespace, tableName, namespace, namespace);
+  }
+
+  static String tableNamespaceSql() {
+    return namespaceSql(true);
+  }
+
+  static String viewNamespaceSql() {
+    return namespaceSql(false);
+  }
+
+  private static String listNamespacesSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(LIST_NAMESPACES, namespace, tableName, namespace);
+  }
+
+  static String listTableNamespacesSql() {
+    return listNamespacesSql(true);
+  }
+
+  static String listViewNamespacesSql() {
+    return listNamespacesSql(false);
+  }
+
+  private static String listAllNamespacesSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(LIST_ALL_NAMESPACES, namespace, tableName);
+  }
+
+  static String listAllTableNamespacesSql() {
+    return listAllNamespacesSql(true);
+  }
+
+  static String listAllViewNamespacesSql() {
+    return listAllNamespacesSql(false);
+  }
+
+  private static String doCommitSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(DO_COMMIT, tableName, namespace, name);
+  }
+
+  private static String createCatalogSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(CREATE_CATALOG, tableName, namespace, name, namespace, name);
+  }
+
+  static String createCatalogTableSql() {
+    return createCatalogSql(true);
+  }
+
+  static String createCatalogViewSql() {
+    return createCatalogSql(false);
+  }
+
+  private static String listSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    return String.format(LIST, tableName, namespace);
+  }
+
+  static String listTablesSql() {
+    return listSql(true);
+  }
+
+  static String listViewsSql() {
+    return listSql(false);
+  }
+
+  private static String renameSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(RENAME, tableName, namespace, name, namespace, name);
+  }
+
+  static String renameTableSql() {
+    return renameSql(true);
+  }
+
+  static String renameViewSql() {
+    return renameSql(false);
+  }
+
+  private static String dropSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(DROP, tableName, namespace, name);
+  }
+
+  static String dropTableSql() {
+    return dropSql(true);
+  }
+
+  static String dropViewSql() {
+    return dropSql(false);
+  }
+
+  private static String doCommitCreateSql(boolean isTable) {
+    String tableName = isTable ? CATALOG_TABLE_NAME : CATALOG_VIEW_NAME;
+    String namespace = isTable ? TABLE_NAMESPACE : VIEW_NAMESPACE;
+    String name = isTable ? TABLE_NAME : VIEW_NAME;
+    return String.format(DO_COMMIT_CREATE, tableName, namespace, name);
+  }
+
+  private static int update(
+      boolean isTable,
+      JdbcClientPool connections,
+      String catalogName,
+      TableIdentifier identifier,
+      String newMetadataLocation,
+      String oldMetadataLocation)
+      throws SQLException, InterruptedException {
+    return connection
```
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463558073

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1463559622

## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java:
Re: [PR] API: New API For sequential / streaming updates [iceberg]
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1906444781

@jasonf20, explicitly setting the sequence number isn't safe. Sequence numbers are assigned when the client attempts to commit and must be updated if the client has to retry. You could make this work by writing separate manifest files for each interim commit and reassigning the sequence numbers for those in order, but that seems time consuming and would require large changes to the snapshot producer base class. Maybe we should catch up sometime to talk through what you're trying to accomplish.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906520489

In order to avoid any potential concurrency issue, I'm working on a new approach where both tables and views are stored in the same JDBC table, with a column indicating whether an entry is a table or a view. I should have this ready tomorrow. The rest will be untouched; only the SQL statements change.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
ajantha-bhat commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906542388

> In order to avoid any potential concurrency issue, I'm working on a new approach where both tables and views are stored in the same JDBC table, with a column indicating whether an entry is a table or a view. I should have this ready tomorrow. The rest will be untouched; only the SQL statements change.

Don't jump into changing the code again. Let us discuss and come to a conclusion first. Can you please elaborate on the potential problem with the current PR?

As I see it, the current implementation using a different namespace and different tables for Iceberg tables and views is fine. For example, `iceberg_namespace_properties` is currently maintained as a separate JDBC table instead of being stored in the same JDBC table.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906552439

I'm reworking the SQL statements a bit to have views and tables in the same JDBC table, especially to handle naming between tables and views atomically. When we create a view, we check that a table with the same name doesn't already exist. With the current PR, we can get a mismatch if a table and a view are created with the same name at the same time (the lock is at the JDBC table level). By having tables and views in the same JDBC table (and using a column to indicate whether an entry is a view or a table), we ensure atomicity here.
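A minimal sketch of what such a unified layout could look like — the table, column names, and types here are illustrative, not the actual schema from the PR:

```sql
-- Hypothetical unified catalog table: tables and views share one primary
-- key space, so a view can never be committed under the name of an
-- existing table (or vice versa) without violating the constraint.
CREATE TABLE iceberg_tables (
  catalog_name               VARCHAR(255) NOT NULL,
  table_namespace            VARCHAR(255) NOT NULL,
  table_name                 VARCHAR(255) NOT NULL,
  iceberg_type               VARCHAR(5),    -- 'TABLE' or 'VIEW'
  metadata_location          VARCHAR(1000),
  previous_metadata_location VARCHAR(1000),
  PRIMARY KEY (catalog_name, table_namespace, table_name)
);
```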
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906555047

@ajantha-bhat the potential issue is the name check between tables and views, which is separate today. My concern is not about namespaces but about concurrency on names between tables and views.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
ajantha-bhat commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906559757

> I'm reworking the SQL statements a bit to have views and tables in the same JDBC table, especially to address name atomicity between tables and views. When we create a view, we check that a table with the same name doesn't already exist. With the current PR, we can have a mismatch if a table and a view are created at the same time (the lock is at the JDBC table level). By having tables and views in the same JDBC table (and using a column to indicate whether a row is a view or a table), we ensure atomicity here.

I see. You are talking about an edge case where a user creates a view and a table concurrently with the same name.
Re: [PR] Core: Add view support for JDBC catalog [iceberg]
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1906568528

@ajantha-bhat correct, corner case but possible 😸
Re: [PR] Adding Snowflake's public documentation [iceberg-docs]
Fokko merged PR #297: URL: https://github.com/apache/iceberg-docs/pull/297
Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]
kevinjqliu commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1906606223

Thanks for the explanation @HonahX

The equality check (`==`) for `Schema` here is overloaded. Sometimes it's used to check whether two tables have the same structure, e.g. `tbl1.schema() == tbl2.schema()`. Other times it's used to check whether two schemas are objects with the same fields, including `schema_id`, e.g. in [`test_base.py#L592-L618`](https://github.com/apache/iceberg-python/blob/main/tests/catalog/test_base.py#L592-L618).

I think it could be a common footgun to use `==` on a `Schema` constructed with an explicit `schema_id`. For example:

```
assert given_table.schema() == Schema(
    NestedField(field_id=1, name="x", field_type=LongType(), required=True),
    NestedField(field_id=2, name="y", field_type=LongType(), required=True, doc="comment"),
    NestedField(field_id=3, name="z", field_type=LongType(), required=True),
    NestedField(field_id=4, name="new_column1", field_type=IntegerType(), required=False),
    schema_id=0,
    identifier_field_ids=[],
)
```

Looking at this code, I'd assume it's asserting that the `schema_id` must be `0`.

A couple of options:
1. Change the definition of `__eq__` for `Schema`.
2. Create a helper function which compares `Schema` objects on all fields.
3. Rely on the author to not use `schema_id` in the constructor and check `schema_id` separately.

Option (1) is a big refactor and changes assumptions in a lot of places, e.g. in `update_schema()`. I'm leaning toward options (2) and/or (3), but have no strong preference.
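A possible shape for option (2), as a sketch; `schemas_fully_equal` is a hypothetical name, and it assumes `Schema.__eq__` keeps its current structural meaning:

```python
from pyiceberg.schema import Schema


def schemas_fully_equal(left: Schema, right: Schema) -> bool:
    # Structural comparison via the existing __eq__, plus schema_id,
    # which the structural check is assumed not to cover.
    return left == right and left.schema_id == right.schema_id


# In a test, the intent then becomes explicit:
# assert schemas_fully_equal(given_table.schema(), expected_schema)
```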
Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]
Fokko commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1906620668

Alright, I went to the source and talked with @danielcweeks and @rdblue. It looks like we made things more complicated than actually needed. So, when reading and writing Parquet, we need to make sure that the IDs are aligned properly. When we are working with runtime data (`pa.Table`s), we match everything up based on names.

I also discussed adding Arrow types to the `create_table` statement with Dan, and he liked the idea, while I was a bit reluctant. But thinking about it, I think it makes sense since it will allow us to create Iceberg tables from a dataframe:

```python
catalog = load_catalog()
catalog.create_table('some.table', df=df)
```

And then:

```python
# It will wire up the schema by name
tbl.overwrite(df)
```

```python
# Should be quite easy with union by name:
tbl.append(df, merge_schema=True)
```
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906625437

@kevinjqliu Thanks for working on this!
Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]
syun64 commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1906647358

That makes sense @Fokko. Just to make sure we are on the same page, does the following approach align with your thoughts? We are proposing to update the `create_table` API to:

```
def create_table(
    self,
    identifier: Union[str, Identifier],
    schema: Union[Schema, pa.Schema],
    location: Optional[str] = None,
    partition_spec: PartitionSpec = UNPARTITIONED_PARTITION_SPEC,
    sort_order: SortOrder = UNSORTED_SORT_ORDER,
    properties: Properties = EMPTY_DICT,
) -> Table:
    ...
    if isinstance(schema, pa.Schema):
        schema = pre_order_visit_pyarrow(schema, _ConvertToIcebergWithFreshIds())
    ...  # existing code
```

We will call the function like:

```
table: pa.Table
catalog = load_catalog()
catalog.create_table('some.table', schema=table.schema)
```

And use the previously proposed visitor (https://github.com/syun64/iceberg-python/blob/preorder-fresh-schema/pyiceberg/io/pyarrow.py#L994), since [new_table_metadata](https://github.com/apache/iceberg-python/blob/46b25be424d1ef0f28778b0873c4d91bf117a2a7/pyiceberg/table/metadata.py#L399) has to take an Iceberg `Schema` with assigned `field_id`s as input?
Re: [PR] Core: Fix setting updated parquet compression property [iceberg]
rdblue commented on PR #9503: URL: https://github.com/apache/iceberg/pull/9503#issuecomment-1906654805

@amogh-jahagirdar, I'm not sure I agree with this change. It seems to me that this is the Parquet default regardless of whether the default write format is Avro or ORC. It's a format-specific property so it doesn't apply in those cases. And if the table switches to Parquet at write time or is reconfigured, we want to have the new default configuration set.
Re: [I] Purge support for Iceberg view [iceberg]
rdblue commented on issue #9433: URL: https://github.com/apache/iceberg/issues/9433#issuecomment-1906663747

I think cleaning up metadata files is a catalog concern, not something that should be exposed to users through a PURGE flag. A user is not prepared to make a good choice about whether they should use PURGE. The user would assume that the metadata is cleaned up automatically. And if the user doesn't know to use PURGE, someone needs to clean it up anyway. This is distinct from the table case, where a PURGE signals that the user needs to get rid of the data _now_ rather than potentially keeping it around (most likely because of PII or other compliance concerns). I don't think PURGE makes sense for views.
Re: [PR] Add install extra for sqlite [iceberg-python]
kevinjqliu commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906669155

@Fokko np. I had to rebase and regenerate the lock file
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1463815754

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala:

@@ -59,6 +62,10 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look
     loadView(catalog, ident)
       .map(_ => ResolvedV2View(catalog.asViewCatalog, ident))
       .getOrElse(u)
+
+    case c@CreateIcebergView(ResolvedIdentifier(_: ViewCatalog, ident), _, _, _, _, query, _, _) =>
+      ViewHelper.verifyTemporaryObjectsNotExists(false, Spark3Util.toV1TableIdentifier(ident), query, Seq.empty)

Review Comment: This is a validation that needs to be done, but I don't think that calling the Spark version is a good choice. This coerces a multi-part identifier to a v1 identifier, which is a bad idea because multi-part identifiers can't necessarily be converted. The method used here will throw an exception if the namespace has too many parts, which would break view creation in catalogs with nested namespaces. This also passes `Seq.empty` instead of the list of temporary functions that are referenced, which isn't guaranteed to be correct. Last, checks like this one should be done in a validation rule, not in `ResolveViews`. I think this was originally in a validation rule. Why did that change?
Re: [I] InMemory Catalog [iceberg-python]
Fokko commented on issue #293: URL: https://github.com/apache/iceberg-python/issues/293#issuecomment-1906706092

@kevinjqliu Alright, that's fair, I just wanted to make sure that we considered the option before making the `InMemory` one public 👍
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1463848914

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(
+    catalog: ViewCatalog,
+    ident: Identifier,
+    originalText: String,
+    query: LogicalPlan,
+    userSpecifiedColumns: Seq[(String, Option[String])],
+    comment: Option[String],
+    properties: Map[String, String],
+    allowExisting: Boolean,
+    replace: Boolean) extends LeafV2CommandExec {
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    val analyzedPlan = session.sessionState.executePlan(query, CommandExecutionMode.SKIP).analyzed
+    val identifier = Spark3Util.toV1TableIdentifier(ident)
+
+    if (userSpecifiedColumns.nonEmpty) {

Review Comment: I think this should be moved to a validation rule. That way we know the analyzer has finished and don't need to call it again.
Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]
zeroshade commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1463940580

## catalog/glue.go:

@@ -0,0 +1,162 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package catalog
+
+import (
+	"context"
+	"errors"
+	"fmt"
+
+	"github.com/apache/iceberg-go/io"
+	"github.com/apache/iceberg-go/table"
+	"github.com/aws/aws-sdk-go-v2/aws"
+	"github.com/aws/aws-sdk-go-v2/service/glue"
+	"github.com/aws/aws-sdk-go-v2/service/glue/types"
+)
+
+var (
+	_ Catalog = (*GlueCatalog)(nil)
+)
+
+type GlueAPI interface {
+	GetTable(ctx context.Context, params *glue.GetTableInput, optFns ...func(*glue.Options)) (*glue.GetTableOutput, error)
+	GetTables(ctx context.Context, params *glue.GetTablesInput, optFns ...func(*glue.Options)) (*glue.GetTablesOutput, error)
+}
+
+type GlueCatalog struct {
+	glueSvc GlueAPI
+}
+
+func NewGlueCatalog(awscfg aws.Config) *GlueCatalog {
+	return &GlueCatalog{
+		glueSvc: glue.NewFromConfig(awscfg),
+	}
+}

Review Comment:

> As an integration point, I think it would be better to provide a set of functional options and strip back the abstraction to specific create methods for each catalogue as it is more typical of a Go API. Go tends to encourage a flatter API with fewer abstractions like these.

That's fair and I'm fine with that. Perhaps in the future we'd add the more generic abstraction to help people convert from existing iceberg libraries, but per-catalog-type `New` functions will work for now.
Re: [I] Support partitioned writes [iceberg-python]
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906856252

> @jqin61 just wondering if we can use this directly https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html

Thank you Ashish! I had overlooked it. As you mention, we could just use `write_dataset()` with the partitioning and `basename_template` arguments specified to write out the partitioned data files as Iceberg needs.
Re: [I] Support partitioned writes [iceberg-python]
syun64 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1906879564

> @Fokko @jqin61 I am also interested in moving this forward, as we also deal with a lot of writes involving partitions. Happy to collaborate on this. For `write_dataset()` we might need to look at whether we need to add field metadata (field-id etc.) to the Parquet schema while committing data files.

Yes - I think we learned this from our earlier attempts: https://github.com/apache/iceberg-python/pull/41/files/1398a2fb01341087a1334482db84a193843a2362#r1427302782 As @jqin61 pointed out in a previous PR, adding these to the schema should output Parquet files with the correct `field_id`.
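A minimal sketch of the combination discussed above, assuming pyarrow honors the `PARQUET:field_id` field-metadata key when writing Parquet; the ids, path, and column names are illustrative:

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Attach Iceberg field ids as field metadata so the written Parquet
# files carry field_id, then let write_dataset() do the partitioning.
schema = pa.schema([
    pa.field("id", pa.int64(), metadata={b"PARQUET:field_id": b"1"}),
    pa.field("category", pa.string(), metadata={b"PARQUET:field_id": b"2"}),
])
table = pa.table({"id": [1, 2, 3], "category": ["a", "a", "b"]}, schema=schema)

ds.write_dataset(
    table,
    base_dir="/tmp/partitioned",
    format="parquet",
    # Hive-style layout: /tmp/partitioned/category=a/part-0.parquet, ...
    partitioning=ds.partitioning(
        pa.schema([pa.field("category", pa.string())]), flavor="hive"
    ),
    basename_template="part-{i}.parquet",
)
```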
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906902491

@kevinjqliu There is no upper-bound constraint on Ray, and therefore it is bumped to the latest version, which is not compatible for some reason. I've created https://github.com/apache/iceberg-python/pull/298 to fix this for now. If you revert the changes in `poetry.lock` and run `poetry lock --no-update`, it will not update Ray to the latest version.
Re: [PR] Add install extra for sqlite [iceberg-python]
kevinjqliu commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906912995

@Fokko thanks, I just ran `poetry lock --no-update`
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko merged PR #297: URL: https://github.com/apache/iceberg-python/pull/297
Re: [PR] Add install extra for sqlite [iceberg-python]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-python/pull/297#issuecomment-1906969525

Thanks @kevinjqliu for picking this up 🙌
Re: [PR] API, Spark: Fix aggregation pushdown on struct fields [iceberg]
amogh-jahagirdar commented on code in PR #9176: URL: https://github.com/apache/iceberg/pull/9176#discussion_r1464032789

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:

@@ -249,6 +250,78 @@ public void testAggregateNotPushDownIfOneCantPushDown() {
     assertEquals("expected and actual should equal", expected, actual);
   }

+  @Test
+  public void testAggregationPushdownStructInteger() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT)",
+        "struct_with_int.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        "(2, named_struct(\"c1\", 2))",
+        "(3, named_struct(\"c1\", 3))");
+  }
+
+  @Test
+  public void testAggregationPushdownNestedStruct() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT>>>)",
+        "struct_with_int.c1.c2.c3.c4",
+        "(1, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", NULL)",
+        "(2, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 2)",
+        "(3, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 3)");
+  }
+
+  @Test
+  public void testAggregationPushdownStructTimestamp() {
+    long timestamp = System.currentTimeMillis();
+    long futureTimestamp = timestamp + 5000;
+    Timestamp expectedMax = new Timestamp(futureTimestamp / 1000 * 1000);
+    Timestamp expectedMin = new Timestamp(1000 * (timestamp / 1000));
+    testAggregationPushdownStruct(
+        2L,
+        expectedMax,
+        expectedMin,
+        "(id BIGINT, struct_with_ts STRUCT)",
+        "struct_with_ts.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        String.format(
+            "(2, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))", timestamp),
+        String.format(
+            "(3, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))",
+            timestamp + 5000));
+  }
+
+  private void testAggregationPushdownStruct(
+      Object expectedCount,
+      Object expectedMax,
+      Object expectedMin,
+      String schema,
+      String aggField,
+      String... rows) {
+    sql("CREATE TABLE %s %s USING iceberg", tableName, schema);
+    sql("INSERT INTO TABLE %s VALUES %s", tableName, String.join(",", rows));
+    List actual =

Review Comment: I should write an assertion that the plan actually contains the pushed-down aggregates.
Re: [PR] Flink: Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]
stevenzwu commented on code in PR #9547: URL: https://github.com/apache/iceberg/pull/9547#discussion_r1464034995

## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/StreamingMonitorFunction.java:

@@ -195,7 +192,10 @@ void monitorAndForwardSplits() {
     // Refresh the table to get the latest committed snapshot.
     table.refresh();
-    Snapshot snapshot = table.currentSnapshot();
+    Snapshot snapshot =

Review Comment: I am wondering if we should change the default to `SnapshotRef.MAIN_BRANCH`? This way we can simplify the code here a bit.

```
public static final ConfigOption BRANCH =
    ConfigOptions.key("branch").stringType().defaultValue(null);
```
Re: [PR] Flink: Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]
stevenzwu commented on code in PR #9547: URL: https://github.com/apache/iceberg/pull/9547#discussion_r1464036002

## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/StreamingMonitorFunction.java:

@@ -195,7 +192,10 @@ void monitorAndForwardSplits() {
     // Refresh the table to get the latest committed snapshot.
     table.refresh();
-    Snapshot snapshot = table.currentSnapshot();
+    Snapshot snapshot =

Review Comment: I also just found out that `FlinkWriteOptions` already defaults to the main branch, so we should probably make it consistent in `FlinkReadOptions`.
Re: [I] Query optimization fails after upgrading to 1.4.0+ with nullif in predicate [iceberg]
singhpk234 commented on issue #9518: URL: https://github.com/apache/iceberg/issues/9518#issuecomment-1907096291

> Apologies, I edited out a quick test I had done that I actually ran on a cluster that is on EMR 6.14 so it's expected that my test would succeed instead of hit the reported error

Sorry, I didn't fully get this. Do you mean it can pass as well? Or is it consistently reproducible?
Re: [PR] AWS: Support setting description for Glue table [iceberg]
singhpk234 commented on code in PR #9530: URL: https://github.com/apache/iceberg/pull/9530#discussion_r1464132457

## aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java:

@@ -59,7 +60,7 @@ private IcebergToGlueConverter() {}
   private static final Pattern GLUE_DB_PATTERN = Pattern.compile("^[a-z0-9_]{1,252}$");
   private static final Pattern GLUE_TABLE_PATTERN = Pattern.compile("^[a-z0-9_]{1,255}$");
   public static final String GLUE_DB_LOCATION_KEY = "location";
-  public static final String GLUE_DB_DESCRIPTION_KEY = "comment";
+  public static final String GLUE_DESCRIPTION_KEY = "comment";

Review Comment: Can you please add a comment noting that this is used for setting the description of both databases and tables?
Re: [I] Fix Parquet's fallback field id assignment [iceberg]
github-actions[bot] commented on issue #586: URL: https://github.com/apache/iceberg/issues/586#issuecomment-1907128245

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Don't use internal Spark classes to migrate existing tables to Iceberg [iceberg]
github-actions[bot] commented on issue #595: URL: https://github.com/apache/iceberg/issues/595#issuecomment-1907128274

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Broadcast Join Failure [iceberg]
github-actions[bot] commented on issue #621: URL: https://github.com/apache/iceberg/issues/621#issuecomment-1907128297

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Views on top of Iceberg tables [iceberg]
github-actions[bot] commented on issue #644: URL: https://github.com/apache/iceberg/issues/644#issuecomment-1907128329

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Fix Filter and Projection unit tests on vectorized-read branch [iceberg]
github-actions[bot] commented on issue #660: URL: https://github.com/apache/iceberg/issues/660#issuecomment-1907128385

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Use uncompressed page size to set initial size for Arrow data buffers [iceberg]
github-actions[bot] commented on issue #661: URL: https://github.com/apache/iceberg/issues/661#issuecomment-1907128408

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Prefetch parquet data pages [iceberg]
github-actions[bot] commented on issue #647: URL: https://github.com/apache/iceberg/issues/647#issuecomment-1907128362

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Add namespace support in Iceberg Catalog [iceberg]
github-actions[bot] commented on issue #672: URL: https://github.com/apache/iceberg/issues/672#issuecomment-1907128437

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464142918

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(
+    catalog: ViewCatalog,
+    ident: Identifier,
+    originalText: String,
+    query: LogicalPlan,
+    userSpecifiedColumns: Seq[(String, Option[String])],
+    comment: Option[String],
+    properties: Map[String, String],
+    allowExisting: Boolean,
+    replace: Boolean) extends LeafV2CommandExec {
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    val qe = session.sessionState.executePlan(query, CommandExecutionMode.SKIP)
+    qe.assertAnalyzed()
+    val analyzedPlan = qe.analyzed
+
+    val identifier = Spark3Util.toV1TableIdentifier(ident)
+
+    if (userSpecifiedColumns.nonEmpty) {
+      if (userSpecifiedColumns.length > analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewNotEnoughColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      } else if (userSpecifiedColumns.length < analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewTooManyColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      }
+    }
+
+    val queryColumnNames = analyzedPlan.schema.fieldNames
+    SchemaUtils.checkColumnNameDuplication(queryColumnNames, SQLConf.get.resolver)

Review Comment: After looking into this a bit more, `query` does correspond to `originalText`. That's good because we don't have to call the parser separately, and it ensures that the query can be analyzed correctly. We should keep this design for the logical plan. I think there are a few things that we can improve, though.

Since the query plan is linked in like this, it will be automatically analyzed, so this should already catch things like trying to project a missing column (by the way, we should have a test for that). It will also catch missing relations (tables and other views), but we probably need to handle resolution a bit differently: we want view resolution to happen just like it would when loading a view. I think that means that we should call our own code from `createViewRelation` to rewrite identifiers. (Ideally, we could alias here as well, but it may need some special handling if `GetColumnByOrdinal` doesn't work with an undetermined type.) Rewriting identifiers needs to happen immediately so that Spark doesn't substitute any temporary views. I think this needs to be done _before_ `ResolveViews`, so we should do it in `RewriteViewCommands`.

To summarize:
* `RewriteViewCommands` should call our code to rewrite identifiers in the parsed `query` (and alias columns, if possible)
* If `RewriteViewCommands` can't apply the aliases, then we will need a rule that applies them once the `query` is `resolved`
* `RewriteViewCommands` should also check for temporary view references and fail if any are found (and maybe temporary functions as well?)
* Checking for the right number of columns should be done in the checker (validation phase) if it isn't already
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464144880

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/views/CreateIcebergView.scala:

@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical.views
+
+import org.apache.spark.sql.catalyst.plans.logical.BinaryCommand
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+case class CreateIcebergView(

Review Comment: I think this should have:
* `child: LogicalPlan`
* `queryText: String` (should not be optional)
* `query: LogicalPlan`
* `columnAliases: Seq[String]` (translate from Spark's weirdness in conversion)
* `columnComments: Seq[Option[String]]`
* `comment: String`
* `properties: Map[String, String]`
* `allowExisting: Boolean`
* `replace: Boolean`
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464145346

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(

Review Comment: I think this should have:
* `catalog: ViewCatalog`
* `ident: Identifier`
* `queryText: String` (no need for query at this point, not optional)
* `schema: StructType` (from the logical plan's `query.schema`)
* `columnAliases: Seq[String]`
* `columnComments: Seq[Option[String]]`
* `comment: String`
* `properties: Map[String, String]`
* `allowExisting: Boolean`
* `replace: Boolean`
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464145827

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala:

@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.ViewAlreadyExistsException
+import org.apache.spark.sql.catalyst.expressions.Alias
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.plans.logical.Project
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.execution.CommandExecutionMode
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.MetadataBuilder
+import org.apache.spark.sql.util.SchemaUtils
+import scala.collection.JavaConverters._
+
+
+case class CreateV2ViewExec(
+    catalog: ViewCatalog,
+    ident: Identifier,
+    originalText: String,
+    query: LogicalPlan,
+    userSpecifiedColumns: Seq[(String, Option[String])],
+    comment: Option[String],
+    properties: Map[String, String],
+    allowExisting: Boolean,
+    replace: Boolean) extends LeafV2CommandExec {
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    val analyzedPlan = session.sessionState.executePlan(query, CommandExecutionMode.SKIP).analyzed
+    val identifier = Spark3Util.toV1TableIdentifier(ident)
+
+    if (userSpecifiedColumns.nonEmpty) {
+      if (userSpecifiedColumns.length > analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewNotEnoughColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      } else if (userSpecifiedColumns.length < analyzedPlan.output.length) {
+        throw QueryCompilationErrors.cannotCreateViewTooManyColumnsError(
+          identifier, userSpecifiedColumns.map(_._1), analyzedPlan)
+      }
+    }
+
+    val queryColumnNames = analyzedPlan.schema.fieldNames
+    SchemaUtils.checkColumnNameDuplication(queryColumnNames, SQLConf.get.resolver)
+
+    val viewSchema = aliasPlan(analyzedPlan, userSpecifiedColumns).schema
+    val columnAliases = userSpecifiedColumns.map(_._1).toArray
+    val columnComments = userSpecifiedColumns.map(_._2.orNull).toArray
+
+    val currentCatalogName = session.sessionState.catalogManager.currentCatalog.name
+    val currentCatalog = if (!catalog.name().equals(currentCatalogName)) currentCatalogName else null
+    val currentNamespace = session.sessionState.catalogManager.currentNamespace
+
+    val engineVersion = "Spark " + org.apache.spark.SPARK_VERSION
+    val newProperties = properties ++
+      comment.map(ViewCatalog.PROP_COMMENT -> _) +
+      (ViewCatalog.PROP_CREATE_ENGINE_VERSION -> engineVersion,
+        ViewCatalog.PROP_ENGINE_VERSION -> engineVersion)
+
+    if (replace) {
+      // CREATE OR REPLACE VIEW
+      if (catalog.viewExists(ident)) {
+        catalog.dropView(ident)
+      }
+      // FIXME: replaceView API doesn't exist in Spark 3.5
+      catalog.createView(
+        ident,
+        originalText,
+        currentCatalog,
+        currentNamespace,
+        viewSchema,
+        queryColumnNames,
+        columnAliases,
+        columnComments,
+        newProperties.asJava)
+    } else {
+      try {
+        // CREATE VIEW [IF NOT EXISTS]
+        catalog.createView(
+          ident,
+          originalText,
+          currentCatalog,
+          currentNamespace,
+          viewSchema,
+          queryColumnNames,
+          columnAliases,
+          columnComments,
+          newProperties.asJava)
+      } catch {
+        case _: ViewAlreadyExistsException if allowExisting => // Ignore
      }
+    }
+
+    Nil
+  }
+
+  override def simpleString(maxFields: Int): String = {
+    s"CreateV2ViewExec: ${ident}"
+  }
+
+  /**
Re: [PR] Spark: Support creating views via SQL [iceberg]
rdblue commented on code in PR #9423: URL: https://github.com/apache/iceberg/pull/9423#discussion_r1464146096

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

@@ -139,24 +139,9 @@ public void readFromMultipleViews() throws NoSuchTableException {
     insertRows(6);
     String viewName = "firstView";
     String secondView = "secondView";
-    String viewSQL = String.format("SELECT id FROM %s WHERE id <= 3", tableName);
-    String secondViewSQL = String.format("SELECT id FROM %s WHERE id > 3", tableName);
-    ViewCatalog viewCatalog = viewCatalog();
-
-    viewCatalog
-        .buildView(TableIdentifier.of(NAMESPACE, viewName))
-        .withQuery("spark", viewSQL)
-        .withDefaultNamespace(NAMESPACE)
-        .withSchema(schema(viewSQL))
-        .create();
-
-    viewCatalog
-        .buildView(TableIdentifier.of(NAMESPACE, secondView))
-        .withQuery("spark", secondViewSQL)
-        .withDefaultNamespace(NAMESPACE)
-        .withSchema(schema(secondViewSQL))
-        .create();
+    sql("CREATE VIEW %s AS SELECT id FROM %s WHERE id <= 3", viewName, tableName);
+    sql("CREATE VIEW %s AS SELECT id FROM %s WHERE id > 3", secondView, tableName);

Review Comment: I don't think this should be changed. I like that it was previously isolated so that we weren't relying on Spark for this.
Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]
wolfeidau commented on code in PR #51: URL: https://github.com/apache/iceberg-go/pull/51#discussion_r1464148047 ## catalog/glue.go: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package catalog + +import ( + "context" + "errors" + "fmt" + + "github.com/apache/iceberg-go/io" + "github.com/apache/iceberg-go/table" + "github.com/aws/aws-sdk-go-v2/aws" + "github.com/aws/aws-sdk-go-v2/service/glue" + "github.com/aws/aws-sdk-go-v2/service/glue/types" +) + +var ( + _ Catalog = (*GlueCatalog)(nil) +) + +type GlueAPI interface { + GetTable(ctx context.Context, params *glue.GetTableInput, optFns ...func(*glue.Options)) (*glue.GetTableOutput, error) + GetTables(ctx context.Context, params *glue.GetTablesInput, optFns ...func(*glue.Options)) (*glue.GetTablesOutput, error) +} + +type GlueCatalog struct { + glueSvc GlueAPI +} + +func NewGlueCatalog(awscfg aws.Config) *GlueCatalog { + return &GlueCatalog{ + glueSvc: glue.NewFromConfig(awscfg), + } +} Review Comment: @zeroshade The current model feels like it is following the factory method pattern https://en.wikipedia.org/wiki/Factory_method_pattern, which is great pattern for certain use cases, but in this case there are lots of other things to consider when creating these catalogs, like identity, custom platform specific configuration ect. I prefer the Go model which although using common interfaces, encourages a more explicit creation for these things as this avoids "magic", which is important for reliable, deterministic operation of systems. You still have common configuration/metrics/logging components, however these are instantiated based on the New routine in that catalog implementation. Really it comes down to migrations being more of a considered action in most Go libraries, this ensures the configuration and security of these implementations is respected. For example accessing an S3 data store could require specific role configuration for authentication, endpoints and event hooks, which may differ from the information used between different tables. A developer can do all this with the AWS SDK config, then pass it in when creating an IO for that particular store, and this is clearly specified in their code. That said the operation of the catalog using a common interface is great, this i where other common components interact and operate so it is nice to have. Again as I said before, this is just something I have observed is a idiomatic in Go libraries, like the AWS SDK. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [I] Query optimization fails after upgrading to 1.4.0+ with nullif in predicate [iceberg]
jakelong95 commented on issue #9518: URL: https://github.com/apache/iceberg/issues/9518#issuecomment-1907148175

The issue is reproducible on Iceberg 1.4.2 and Spark 3.5.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
mas-chen commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1464155134

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment: @pvary @stevenzwu @nastra I'm fine with that, and I can volunteer to migrate this class to JUnit5 after the PR is merged. I am limited in my bandwidth this week, so I can address it next week; meanwhile, I would like these metrics to land in the upcoming Iceberg release.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
stevenzwu commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1464178171

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment: I agree with @mas-chen that the JUnit5 migration can be tackled in a separate PR. This PR doesn't add any new test classes.
Re: [PR] Spec: add multi-arg transform support [iceberg]
advancedxy commented on PR #8579: URL: https://github.com/apache/iceberg/pull/8579#issuecomment-1907243141

@szehon-ho @aokolnychyi the `bucketV2` part is removed from this PR. Let me know if you have any more comments.
Re: [PR] Lock Ray on <2.8.0 [iceberg-python]
HonahX commented on PR #298: URL: https://github.com/apache/iceberg-python/pull/298#issuecomment-1907409790

OOPS. Seems there is a conflict.
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
HonahX commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1464342275

## tests/catalog/integration_test_glue.py:
```diff
@@ -279,6 +379,20 @@ def test_commit_table_update_schema(
     assert test_catalog._parse_metadata_version(table.metadata_location) == 0
     assert original_table_metadata.current_schema_id == 0

+    assert athena.get_query_results(f'SELECT * FROM "{database_name}"."{table_name}"') == [
```

Review Comment: > In other words, if the table was created and not queryable, then a table was not actually successfully created.

I hadn't considered this perspective initially. But indeed, this is very sound reasoning! Let's maintain the current state of these tests.
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
pvary commented on code in PR #9524: URL: https://github.com/apache/iceberg/pull/9524#discussion_r1464351088

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceContinuous.java:
```diff
@@ -58,9 +61,11 @@ public class TestIcebergSourceContinuous {

+  public static final InMemoryReporter METRIC_REPORTER = InMemoryReporter.create();
+
   @ClassRule
   public static final MiniClusterWithClientResource MINI_CLUSTER_RESOURCE =
-      MiniClusterResource.createWithClassloaderCheckDisabled();
+      MiniClusterResource.createWithClassloaderCheckDisabled(METRIC_REPORTER);
```

Review Comment: @mas-chen: It seems more complicated this way (modify, migrate, remove), but I am mostly interested in the final result 😀
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
HonahX commented on code in PR #288: URL: https://github.com/apache/iceberg-python/pull/288#discussion_r1464346417

## pyiceberg/catalog/glue.py:
```diff
@@ -84,17 +110,97 @@ def _construct_parameters(
     return new_parameters


+GLUE_PRIMITIVE_TYPES = {
+    BooleanType: "boolean",
+    IntegerType: "int",
+    LongType: "bigint",
+    FloatType: "float",
+    DoubleType: "double",
+    DateType: "date",
+    TimeType: "string",
+    StringType: "string",
+    UUIDType: "string",
+    TimestampType: "timestamp",
+    FixedType: "binary",
+    BinaryType: "binary",
+}
+
+
+class IcebergSchemaToGlueType(SchemaVisitor[str]):
+    def schema(self, schema: Schema, struct_result: str) -> str:
+        return struct_result
+
+    def struct(self, struct: StructType, field_results: List[str]) -> str:
+        return f"struct<{','.join(field_results)}>"
+
+    def field(self, field: NestedField, field_result: str) -> str:
+        return f"{field.name}:{field_result}"
+
+    def list(self, list_type: ListType, element_result: str) -> str:
+        return f"array<{element_result}>"
+
+    def map(self, map_type: MapType, key_result: str, value_result: str) -> str:
+        return f"map<{key_result},{value_result}>"
+
+    def primitive(self, primitive: PrimitiveType) -> str:
+        if isinstance(primitive, DecimalType):
+            return f"decimal({primitive.precision},{primitive.scale})"
+        if (primitive_type := type(primitive)) not in GLUE_PRIMITIVE_TYPES:
+            raise ValueError(f"Unknown primitive type: {primitive}")
+        return GLUE_PRIMITIVE_TYPES[primitive_type]
+
+
+def _to_columns(metadata: TableMetadata) -> List[ColumnTypeDef]:
+    results: Dict[str, ColumnTypeDef] = {}
+
+    def _append_to_results(field: NestedField, is_current: bool) -> None:
+        if field.name in results:
+            return
+
+        results[field.name] = cast(
+            ColumnTypeDef,
+            {
+                "Name": field.name,
+                "Type": visit(field.field_type, IcebergSchemaToGlueType()),
```

Review Comment (on the `"Type": visit(...)` line):
```suggestion
                "Type": visit(field.field_type, _IcebergSchemaToGlueType()),
```

Review Comment (on the `class IcebergSchemaToGlueType` line of the same hunk):
```suggestion
class _IcebergSchemaToGlueType(SchemaVisitor[str]):
```
Re: [I] `schema_id` not incremented during schema evolution [iceberg-python]
HonahX commented on issue #290: URL: https://github.com/apache/iceberg-python/issues/290#issuecomment-1907475405

@kevinjqliu Thanks for sharing these options. I think (3) is enough here, since this only affects tests. There are only a few places that require checking equality of both the fields and the schema_id. If you think (2) is more helpful, you can add a helper in `conftest.py`. Do you want to include these in https://github.com/apache/iceberg-python/pull/289/ ?
Re: [I] Add install extra for sqlite [iceberg-python]
HonahX closed issue #285: Add install extra for sqlite URL: https://github.com/apache/iceberg-python/issues/285
Re: [PR] Set Glue Table Information when creating/updating tables [iceberg-python]
mgmarino commented on PR #288: URL: https://github.com/apache/iceberg-python/pull/288#issuecomment-1907487756

> Overall LGTM! Thank you very much @mgmarino! Just adding one final request to make `IcebergSchemaToGlueType` internal by adding `_`. Sorry I missed that earlier.

Sure, np!

> I've added this to the 0.6.0 milestone to ensure that Athena users can query tables written by pyiceberg after write support is released.

🎉 Thanks, @HonahX and @nicor88, for the input and reviews!
Re: [PR] Spark 3.4, 3.5: Enable drop table with purge in tests [iceberg]
nastra merged PR #9548: URL: https://github.com/apache/iceberg/pull/9548
Re: [PR] Build: Don't run CI's on unrelated changes [iceberg]
nastra merged PR #9526: URL: https://github.com/apache/iceberg/pull/9526
Re: [PR] API, Spark: Fix aggregation pushdown on struct fields [iceberg]
nastra commented on code in PR #9176: URL: https://github.com/apache/iceberg/pull/9176#discussion_r1464437182

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:
```diff
@@ -249,6 +250,78 @@ public void testAggregateNotPushDownIfOneCantPushDown() {
     assertEquals("expected and actual should equal", expected, actual);
   }

+  @Test
+  public void testAggregationPushdownStructInteger() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT<c1: BIGINT>)",
+        "struct_with_int.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        "(2, named_struct(\"c1\", 2))",
+        "(3, named_struct(\"c1\", 3))");
+  }
+
+  @Test
+  public void testAggregationPushdownNestedStruct() {
+    testAggregationPushdownStruct(
+        2L,
+        3L,
+        2L,
+        "(id BIGINT, struct_with_int STRUCT<c1: STRUCT<c2: STRUCT<c3: STRUCT<c4: BIGINT>>>>)",
+        "struct_with_int.c1.c2.c3.c4",
+        "(1, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", NULL)))))",
+        "(2, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 2)))))",
+        "(3, named_struct(\"c1\", named_struct(\"c2\", named_struct(\"c3\", named_struct(\"c4\", 3)))))");
+  }
+
+  @Test
+  public void testAggregationPushdownStructTimestamp() {
+    long timestamp = System.currentTimeMillis();
+    long futureTimestamp = timestamp + 5000;
+    Timestamp expectedMax = new Timestamp(futureTimestamp / 1000 * 1000);
+    Timestamp expectedMin = new Timestamp(1000 * (timestamp / 1000));
+    testAggregationPushdownStruct(
+        2L,
+        expectedMax,
+        expectedMin,
+        "(id BIGINT, struct_with_ts STRUCT<c1: TIMESTAMP>)",
+        "struct_with_ts.c1",
+        "(1, named_struct(\"c1\", NULL))",
+        String.format(
+            "(2, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))", timestamp),
+        String.format(
+            "(3, named_struct(\"c1\", CAST(from_unixtime(%d/1000) AS TIMESTAMP)))",
+            timestamp + 5000));
+  }
+
+  private void testAggregationPushdownStruct(
+      Object expectedCount,
+      Object expectedMax,
+      Object expectedMin,
+      String schema,
+      String aggField,
+      String... rows) {
+    sql("CREATE TABLE %s %s USING iceberg", tableName, schema);
+    sql("INSERT INTO TABLE %s VALUES %s", tableName, String.join(",", rows));
+    List<Object[]> actual =
+        sql("SELECT COUNT(%s), MAX(%s), MIN(%s) FROM %s", aggField, aggField, aggField, tableName);
+    Object actualCount = actual.get(0)[0];
+    Object actualMax = actual.get(0)[1];
+    Object actualMin = actual.get(0)[2];
+    Assertions.assertThat(actualCount)
+        .withFailMessage("Expected and actual count should equal")
+        .isEqualTo(expectedCount);
+    Assertions.assertThat(actualMax)
+        .withFailMessage("Expected and actual max should equal")
+        .isEqualTo(expectedMax);
+    Assertions.assertThat(actualMin)
+        .withFailMessage("Expected and actual min should equal")
```

Review Comment: it's better to use `.as()` instead of `withFailMessage()`, as otherwise we'd lose the entire context of why the assertion failed and what values actual/expected had.
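For illustration, here is a small self-contained AssertJ sketch of the difference (the class name and values are hypothetical stand-ins for those computed in the test above, and the failure output in the comments is approximate):

```java
import static org.assertj.core.api.Assertions.assertThat;

class FailMessageVsAsDemo {
  public static void main(String[] args) {
    long expectedMax = 3L;
    long actualMax = 2L; // deliberately wrong so both assertions fail

    try {
      // withFailMessage() replaces the whole error: the output is only the
      // fixed string, dropping the actual and expected values.
      assertThat(actualMax)
          .withFailMessage("Expected and actual max should equal")
          .isEqualTo(expectedMax);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
      // -> Expected and actual max should equal
    }

    try {
      // as() only prepends a description; AssertJ still reports both values,
      // so the failure keeps its context.
      assertThat(actualMax).as("max aggregate pushed down").isEqualTo(expectedMax);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
      // -> [max aggregate pushed down] expected: 3L but was: 2L
    }
  }
}
```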
Re: [PR] API, Spark: Fix aggregation pushdown on struct fields [iceberg]
nastra commented on code in PR #9176: URL: https://github.com/apache/iceberg/pull/9176#discussion_r1464446258

## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestAggregatePushDown.java:
```diff
@@ -249,6 +250,78 @@ public void testAggregateNotPushDownIfOneCantPushDown() {
     assertEquals("expected and actual should equal", expected, actual);
   }

+  @Test
+  public void testAggregationPushdownStructInteger() {
```

Review Comment: An alternative to having multiple methods that effectively call the same test code is to have a single parameterized test method.
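As a sketch of that suggestion, using JUnit 5's `@ParameterizedTest`/`@MethodSource` (the class name is hypothetical, only the first case from the PR is shown, and the field type inside `STRUCT<...>` is assumed since it isn't legible in the quoted diff):

```java
import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.Arguments;
import org.junit.jupiter.params.provider.MethodSource;

class TestAggregatePushDownParameterized {

  // One Arguments row per former test method: expected count/max/min,
  // the table schema, the aggregated field, and the rows to insert.
  static Stream<Arguments> structAggregationCases() {
    return Stream.of(
        Arguments.of(
            2L,
            3L,
            2L,
            "(id BIGINT, struct_with_int STRUCT<c1: BIGINT>)",
            "struct_with_int.c1",
            new String[] {
              "(1, named_struct(\"c1\", NULL))",
              "(2, named_struct(\"c1\", 2))",
              "(3, named_struct(\"c1\", 3))"
            }));
  }

  @ParameterizedTest
  @MethodSource("structAggregationCases")
  void testAggregationPushdownStruct(
      Object expectedCount,
      Object expectedMax,
      Object expectedMin,
      String schema,
      String aggField,
      String[] rows) {
    // The body would be the existing private testAggregationPushdownStruct(...)
    // helper from the PR: create the table, insert the rows, run
    // COUNT/MAX/MIN on aggField, and assert the three expected values.
  }
}
```

This keeps one test body while each case still reports as its own test in the JUnit output.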