Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442596925

## website/src/reference/setup_gpg.md:

@@ -0,0 +1,161 @@

# Setup GPG key

> This section is a brief summary of the [Cryptography with OpenPGP](https://infra.apache.org/openpgp.html) guideline.

## Install GPG

For more details, please refer to the [GPG official website](https://www.gnupg.org/download/index.html). Here is one way to install GPG, using `apt`:

```shell
sudo apt install gnupg2
```

## Generate GPG Key

Notes:

- The name should match the full name associated with your Apache ID.
- The email should be your Apache email.
- Use only English characters in the name to avoid garbled output.

Run `gpg --full-gen-key` and complete the generation interactively:

```shell
gpg (GnuPG) 2.2.20; Copyright (C) 2020 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
  (14) Existing key from card
Your selection? 1 # input 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096 # input 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 0 # input 0
Key does not expire at all
Is this correct? (y/N) y # input y

GnuPG needs to construct a user ID to identify your key.

Real name: Hulk Lin # input your name
Email address: h...@apache.org # input your email
Comment: # input some annotations, optional
You selected this USER-ID:
    "Hulk Lin <h...@apache.org>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O # input O
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.

# Input the security key
┌──────────────────────────────────────────────┐
│ Please enter this passphrase                 │
│                                              │
│ Passphrase: _______________________________  │
│                                              │
└──────────────────────────────────────────────┘
# After you enter the passphrase, key generation completes with output like:
gpg: key A3008E49B00F626B marked as ultimately trusted
gpg: revocation certificate stored as '/Users/hulk/.gnupg/openpgp-revocs.d/F77B887A4F25A9468C513E9AA3008E49B00F626B.rev'
public and secret key created and signed.

pub   rsa4096 2022-07-12 [SC]
      F77B887A4F25A9468C513E9AA3008E49B00F626B
uid           [ultimate] hulk <h...@apache.org>
sub   rsa4096 2022-07-12 [E]
```

## Upload your key to public GPG keyserver

First, list your keys:

```shell
gpg --list-keys
```

The output looks like:

```shell
-------------------------------
pub   rsa4096 2022-07-12 [SC]
      F77B887A4F25A9468C513E9AA3008E49B00F626B
uid           [ultimate] hulk <h...@apache.org>
sub   rsa4096 2022-07-12 [E]
```

Then, send your key id to the key server:

```shell
gpg --keyserver keys.openpgp.org --send-key <key-id> # e.g., F77B887A4F25A9468C513E9AA3008E49B00F626B
```

Here, `keys.openpgp.org` is just one choice of keyserver; you can use `keyserver.ubuntu.com` or any other full-featured keyserver.
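As a quick sanity check (a hedged sketch, not part of the quoted guide), you can fetch the key back from the keyserver once it has propagated; the fingerprint below is the example one from this guide:

```shell
# Fetch the key back from the keyserver by fingerprint to confirm the upload worked.
# Propagation can take a minute or so.
gpg --keyserver keys.openpgp.org --recv-keys F77B887A4F25A9468C513E9AA3008E49B00F626B

# Confirm the key is now known locally.
gpg --list-keys F77B887A4F25A9468C513E9AA3008E49B00F626B
```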
## Check whether the key is created successfully

Uploading takes about one minute; after that, you can verify the upload by searching for your email on the corresponding keyserver.

Uploading keys to the keyserver is mainly for joining a [Web of Trust](https://infra.apache.org/release-signing.html#web-of-trust).

## Add your GPG public key to the KEYS document

:::info

`SVN` is required for this step.

:::

The svn repository of the release branch is: https://dist.apache.org/repos/dist/release/incubator/opendal

Review Comment: I think we want to `sed 's/opendal/iceberg/g'` this section
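For illustration, a minimal sketch of the KEYS update described above, assuming the Iceberg dist path (per the review comment, the opendal path should be replaced; the exact URL and commit message here are assumptions):

```shell
# Hypothetical flow; the dist URL is an assumption, not confirmed by this PR.
svn co https://dist.apache.org/repos/dist/release/iceberg iceberg-dist
cd iceberg-dist
# Append your public key block to the KEYS file (replace the email with your Apache email).
(gpg --list-sigs you@apache.org && gpg --export --armor you@apache.org) >> KEYS
svn commit -m "Add GPG key for release signing"
```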
Re: [I] Flink import debezium cdc record(delete type) to iceberg(0.13.2+) got IndexOutOfBoundsException [iceberg]
zinking commented on issue #5043: URL: https://github.com/apache/iceberg/issues/5043#issuecomment-1878282717

An incorrect data type mapping between MySQL and Iceberg can also lead to this, for example mapping `tinyint` to `boolean`.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442598483

## website/src/release.md:

@@ -0,0 +1,383 @@

This document describes how the release manager releases a new version in accordance with Apache requirements.

## Introduction

The `Source Release` is the artifact that Apache values most, and it is required for an ASF release.

Please remember that publishing software has legal consequences.

This guide complements the foundation-wide policies and guides:

- [Release Policy](https://www.apache.org/legal/release-policy.html)
- [Release Distribution Policy](https://infra.apache.org/release-distribution)
- [Release Creation Process](https://infra.apache.org/release-publishing.html)

## Release Terminology

In the context of our release, we use several terms to describe different stages of the release process.

Here's an explanation of these terms:

- `iceberg_version`: the version of Iceberg to be released, like `0.2.0`.
- `release_version`: the version of the release candidate, like `0.2.0-rc.1`.
- `rc_version`: the release-candidate suffix for the voting round, like `rc.1`.

## Preparation

This section lists the requirements for individuals who are new to the role of release manager.

Refer to [Setup GPG Key](reference/setup_gpg.md) to make sure the GPG key has been set up.

## Start a tracking issue about the next release

Start a tracking issue on GitHub for the upcoming release to track all tasks that need to be completed.

Title:

```
Tracking issue for the Iceberg Rust ${iceberg_version} release
```

Content:

```markdown
This issue is used to track tasks for the Iceberg Rust ${iceberg_version} release.

## Tasks

### Blockers

> Blockers are the tasks that must be completed before the release.

### Build Release

#### GitHub Side

- [ ] Bump version in project
- [ ] Update docs
- [ ] Generate dependencies list
- [ ] Push release candidate tag to GitHub

#### ASF Side

- [ ] Create an ASF Release
- [ ] Upload artifacts to the SVN dist repo

### Voting

- [ ] Start VOTE in the Iceberg community

### Official Release

- [ ] Push the release git tag
- [ ] Publish artifacts to the SVN RELEASE branch
- [ ] Change the Iceberg Rust website download link
- [ ] Send the announcement

For details of each step, please refer to: https://rust.iceberg.apache.org/release
```

## GitHub Side

### Bump version in project

Bump all components' versions in the project to the new Iceberg version.
Please note that this version is the exact version of the release, not the release candidate version.

- rust core: bump the version in `Cargo.toml`

### Update docs

- Update `CHANGELOG.md`; refer to [Generate Release Note](reference/generate_release_note.md) for more information.

### Generate dependencies list

Download and set up `cargo-deny`. You can refer to [cargo-deny](https://embarkstudios.github.io/cargo-deny/cli/index.html).

Run `python3 ./scripts/dependencies.py generate` to update the dependencies list of every package.

### Push release candidate tag

After the version-bump PR is merged, we can create a GitHub release for the release candidate:

- Create a tag on the `main` branch at the `Bump Version` / `Patch up version` commit: `git tag -s "v0.2.0-rc.1"`. Make sure to check out that exact commit instead of tagging whatever is at the tip of the main branch.
- Push tags to GitHub: `git push --tags`.

## ASF Side

If any step in the ASF release process fails and requires code changes,
we will abandon that version and prepare for the next one.
Our release page will only display ASF releases, not GitHub releases.

### Create an ASF Release

After the GitHub release has been created, we can start to create the ASF release.

- Check out the release tag (e.g. `git checkout v0.2.0-rc.1`; the tag was created in the previous step).
- Use the release script to create a new release: `ICEBERG_VERSION=<iceberg_version> ICEBERG_VERSION_RC=<rc_version> ./scripts/release.sh` (e.g. `ICEBERG_VERSION=0.2.0 ICEBERG_VERSION_RC=rc.1 ./scripts/release.sh`)
- This script will do the following things:
  - Create a new branch named `release-${release_version}` from the tag
  - Generate the release candidate artifacts under `dist`, including:
    - `apache-iceberg-rust-${release_version}-src.tar.gz`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.asc`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.sha512`
  - Check the license headers of the source code. This step requires Docker.
  - Push the newly created branch to GitHub

This script will create a new release under `dist`.

For example:

```shell
> tree dist
dist
├── apache-iceberg-rust-0.2.0-src.tar.gz
├── apache-iceberg-rust-0.2.0-src.tar.gz.asc
└── apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```

### Upload artifacts to the SVN dist repo

SVN is required for this step.
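The quoted hunk cuts off at the SVN upload step. As a hedged sketch of the usual ASF flow (the dev dist path and directory layout are assumptions, not taken from this PR), it typically looks like:

```shell
# Hypothetical upload flow; the dist path and directory name are assumptions.
svn co https://dist.apache.org/repos/dist/dev/iceberg iceberg-dist-dev
mkdir -p iceberg-dist-dev/iceberg-rust-0.2.0-rc.1
cp dist/* iceberg-dist-dev/iceberg-rust-0.2.0-rc.1/
cd iceberg-dist-dev
svn add iceberg-rust-0.2.0-rc.1
svn commit -m "Stage Apache Iceberg Rust 0.2.0-rc.1"

# Voters can then verify the staged artifacts, e.g.:
gpg --verify apache-iceberg-rust-0.2.0-src.tar.gz.asc apache-iceberg-rust-0.2.0-src.tar.gz
shasum -a 512 -c apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```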
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442599426

## website/src/release.md: (same quoted hunk as in the previous comment)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442598860

## website/src/release.md: (same quoted hunk as above)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442599810

## website/src/release.md: (same quoted hunk as above)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442600155

## website/src/release.md: (same quoted hunk as above)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442602143

## website/src/release.md: (same quoted hunk as above)
Re: [PR] Spark 3.5: Migrate tests that depend on SparkDistributedDataScanTestBase to JUnit5 [iceberg]
chinmay-bhat commented on PR #9416: URL: https://github.com/apache/iceberg/pull/9416#issuecomment-1878298683

Based on the test failures, older Spark versions depend on `core` files like `DeleteFileIndexTestBase`, `ParameterizedTableTestBase`, `DataTableScanTestBase`, and `ScanPlanningAndReportingTestBase`, which have been updated to JUnit5 in this PR. A possible solution would be to migrate the affected files in older versions to JUnit5 as well, but that would be a task bigger than this PR's scope. Another solution would be to create JUnit5 versions of the above `core` files, but I'm not keen on creating unnecessary duplicates. @nastra @Fokko I faced a similar problem in #9382. Any advice would be great! :)
Re: [PR] Spark: Support renaming views [iceberg]
nastra commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1442615912

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala:

@@ -53,6 +57,17 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look
     loadView(catalog, ident)
       .map(createViewRelation(parts, _))
       .getOrElse(u)
+
+    case u@UnresolvedTableOrView(CatalogAndIdentifier(catalog, ident), _, _) =>
+      loadView(catalog, ident)
+        .map(_ => ResolvedV2View(catalog.asViewCatalog, ident))
+        .getOrElse(u)
+
+    case RenameTable(ResolvedV2View(oldCatalog, oldIdent), CatalogAndIdentifier(newCatalog, newIdent), true) =>

Review Comment: I added this to `isView@true` so that it's clearer. I checked the Spark code, and the `to` identifier isn't resolved anywhere; it is just passed as-is when doing a rename.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442622182

## website/src/release.md: (same quoted hunk as above)
Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]
ajantha-bhat commented on code in PR #9298: URL: https://github.com/apache/iceberg/pull/9298#discussion_r1442638780

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseFileRewriteCoordinator.java:

@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {

   public Set<String> fetchSetIds(Table table) {
     return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.tableUUID(table)))

Review Comment: Yeah, I renamed it to `baseTableUUID`, and I am not sure about the guidelines on static imports. In some places we use them and in some places we don't, so I left it as is.

## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java:

@@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(
     return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database);
   }

+  static String tableUUID(org.apache.iceberg.Table table) {
+    if (table instanceof HasTableOperations) {
+      TableOperations ops = ((HasTableOperations) table).operations();
+      return ops.current().uuid();
+    } else if (table instanceof BaseMetadataTable) {
+      return ((BaseMetadataTable) table).table().operations().current().uuid();
+    } else {
+      return null;

Review Comment: Done
Re: [PR] Flink: Watermark read options [iceberg]
pvary commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1442661021

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java:

@@ -39,4 +62,76 @@ public void before() throws IOException {
         .getConfiguration()
         .set(TableConfigOptions.TABLE_DYNAMIC_TABLE_OPTIONS_ENABLED, true);
   }
+
+  private Record generateRecord(Instant t1) {
+    Record record = GenericRecord.create(SCHEMA_TS);
+    record.setField("t1", t1.atZone(ZoneId.systemDefault()).toLocalDateTime());
+    record.setField("t2", t1.getEpochSecond());
+    return record;
+  }
+
+  private List<Record> testWatermarkOptionsInternal(boolean ascending) throws Exception {
+    Table table = catalogResource.catalog().createTable(TestFixtures.TABLE_IDENTIFIER, SCHEMA_TS);
+    long baseTime = 1702382109000L;
+    GenericAppenderHelper helper =
+        new GenericAppenderHelper(table, FileFormat.PARQUET, TEMPORARY_FOLDER);
+    // File 1 - early timestamps
+    Record early1 = generateRecord(Instant.ofEpochMilli(baseTime));
+    Record early2 = generateRecord(Instant.ofEpochMilli(baseTime - 10 * 1000L));
+    List<Record> recordsDataFile1 = Lists.newArrayList();
+    recordsDataFile1.add(early1);
+    recordsDataFile1.add(early2);
+    DataFile dataFile1 = helper.writeFile(recordsDataFile1);
+    // File 2 - old timestamps
+    Record late1 = generateRecord(Instant.ofEpochMilli(baseTime + 14 * 1000L));
+    Record late2 = generateRecord(Instant.ofEpochMilli(baseTime + 12 * 1000L));
+    List<Record> recordsDataFile2 = Lists.newArrayList();
+    recordsDataFile2.add(late1);
+    recordsDataFile2.add(late2);
+    DataFile dataFile2 = helper.writeFile(recordsDataFile2);
+    helper.appendToTable(dataFile1, dataFile2);
+    List<Record> expected = Lists.newArrayList();
+    if (ascending) {
+      expected.addAll(recordsDataFile1);
+      expected.addAll(recordsDataFile2);
+    } else {
+      expected.addAll(recordsDataFile2);
+      expected.addAll(recordsDataFile1);
+    }
+    return expected;
+  }
+
+  /**
+   * Tests the order of splits returned when setting the watermark-column and watermark-timeunit
+   * options
+   */
+  @Test
+  public void testWatermarkOptionsAscending() throws Exception {
+    List<Record> expected = testWatermarkOptionsInternal(true);
+    TestHelpers.assertRecordsWithOrder(
+        run(
+            ImmutableMap.of("watermark-column", "t1", "split-file-open-cost", "12800"),
+            "",
+            "*"),
+        expected,
+        SCHEMA_TS);
+  }
+
+  @Test
+  public void testWatermarkOptionsDescending() throws Exception {
+    List<Record> expected = testWatermarkOptionsInternal(false);
+    TestHelpers.assertRecordsWithOrder(
+        run(
+            ImmutableMap.of(
+                "watermark-column",
+                "t2",
+                "watermark-time-unit",

Review Comment: Shouldn't this be `watermark-column-time-unit`?
Re: [PR] Flink: Watermark read options [iceberg]
pvary commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1442664555

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java: (same hunk as above, comment anchored at `record.setField("t2", t1.getEpochSecond());`)

Review Comment: What I do not like in this test is that if we make a mistake and switch `t1` with `t2` somewhere in the production code or in the test, we will not see a difference in the test results. That is why I originally suggested that the different columns should have different values, to make sure that the `watermark-column` value is propagated correctly.
[I] why Iceberg Spark add_files procedure does not support parquet file with parquet-mr 1.10 version? [iceberg]
hbpeng0115 opened a new issue, #9418: URL: https://github.com/apache/iceberg/issues/9418

### Query engine

Spark

### Question

Hello, we use Spark 3.2 + Iceberg 1.4.3 and call the add_files procedure to add S3 parquet files to an existing Iceberg table. However, we find that parquet files written with parquet-mr 1.10 cannot be queried after invoking the add_files procedure; they throw the following exception. Parquet files written with parquet-mr 1.13 can be queried. Does anyone know how to solve this issue? Many thanks~

```
Caused by: org.apache.iceberg.shaded.org.apache.parquet.io.ParquetDecodingException: Can't read value in column [request, context, profile_concrete_event_id, array] repeated int64 array = 170 at value 76034 out of 184067 in current page. repetition level: 1, definition level: 3
    at org.apache.iceberg.parquet.PageIterator.handleRuntimeException(PageIterator.java:220)
    at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:151)
    at org.apache.iceberg.parquet.ColumnIterator.nextLong(ColumnIterator.java:128)
    at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:49)
    at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:46)
    at org.apache.iceberg.parquet.ParquetValueReaders$UnboxedReader.read(ParquetValueReaders.java:246)
    at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedReader.read(ParquetValueReaders.java:467)
    at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:129)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at org.apache.iceberg.shaded.org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:117)
    at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:149)
    ... 33 more
```
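As a hedged debugging step (not from the original report), the parquet-mr version that wrote a file can be read from the file footer; a sketch using the parquet-cli tool, with an illustrative path:

```shell
# Copy the file locally (path illustrative), then inspect the footer;
# the creator field records the parquet-mr writer version (e.g. parquet-mr 1.10.x).
aws s3 cp s3://bucket/path/to/file.parquet /tmp/file.parquet
parquet meta /tmp/file.parquet | grep -i creat
```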
Re: [I] Pruning partitions using filter when reading file. [iceberg-rust]
liurenjie1024 commented on issue #126: URL: https://github.com/apache/iceberg-rust/issues/126#issuecomment-1878483283

Blocked by #153
Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]
pvary commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1878512144

> > Do we have a way to check if the new tests were running on the CI? With the correct parameters?
>
> I don't think so, but I was verifying them all locally

In this case, I am fine with these changes. In the next PR we should port these changes to 1.18 and 1.16 too.
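For local verification of the ported tests, invocations along these lines could work; the module paths and test-class pattern are assumptions based on Iceberg's Gradle layout, not taken from this thread:

```shell
# Hypothetical local runs of the migrated tests per Flink version module.
./gradlew :iceberg-flink:iceberg-flink-1.16:test --tests "org.apache.iceberg.flink.source.TestFlinkScan*"
./gradlew :iceberg-flink:iceberg-flink-1.18:test --tests "org.apache.iceberg.flink.source.TestFlinkScan*"
```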
[PR] API, Core: Fix errorprone warnings [iceberg]
ajantha-bhat opened a new pull request, #9419: URL: https://github.com/apache/iceberg/pull/9419

`./gradlew clean build -x test` is reporting some warnings, as seen in this screenshot: https://github.com/apache/iceberg/assets/5889404/e64c76c4-29d3-42cf-b3e0-c1d9f847200b

After this PR, `./gradlew clean build -x test` is green.
Re: [PR] API, Core: Fix errorprone warnings [iceberg]
ajantha-bhat commented on PR #9419: URL: https://github.com/apache/iceberg/pull/9419#issuecomment-1878529981

cc: @aokolnychyi
Re: [PR] API, Core: Fix errorprone warnings [iceberg]
ajantha-bhat commented on code in PR #9419: URL: https://github.com/apache/iceberg/pull/9419#discussion_r1442771024

## api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:

@@ -166,6 +166,7 @@ public void clear() {
     wrapperSet.clear();
   }

+  @SuppressWarnings("CollectionUndefinedEquality")

Review Comment: `containsAll` uses `CharSequenceWrapper` for equals to ensure consistent hashing and equals behaviour, similar to `CharSequenceMap` (the javadoc of `CharSequenceMap` explains it). But the tool is unable to figure that out, I guess. Hence, suppressing here and in two places below.

## api/src/main/java/org/apache/iceberg/util/CharSequenceWrapper.java:

@@ -54,7 +54,7 @@ public boolean equals(Object other) {
     CharSequenceWrapper that = (CharSequenceWrapper) other;
     if (wrapped instanceof String && that.wrapped instanceof String) {
-      return wrapped.equals(that.wrapped);
+      return wrapped.toString().contentEquals(that.wrapped);

Review Comment: see the UndefinedEquals errorprone check for CharSequence: https://errorprone.info/bugpattern/UndefinedEquals
Re: [PR] open-api: Use openapi-generator-gradle-plugin for validating specification [iceberg]
Fokko commented on code in PR #9344: URL: https://github.com/apache/iceberg/pull/9344#discussion_r1442774264 ## open-api/rest-catalog-open-api.yaml: ## @@ -2568,26 +2568,26 @@ components: propertyName: type mapping: assert-view-uuid: '#/components/schemas/AssertViewUUID' -type: object -required: - - type -properties: - type: -type: "string" + type: object Review Comment: Nice catch! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Allow snapshotting iceberg table (create new table based on certain Iceberg snapshot) [iceberg]
YehorKrivokon commented on issue #2481: URL: https://github.com/apache/iceberg/issues/2481#issuecomment-1878533755 Hi all, I'm trying to use Iceberg with Spark and SparkCatalog, and I have a repro of this issue. Is there any workaround to work with snapshots? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] AWS: Add Option to don't write non current columns in glue schema closes #7584 [iceberg]
Raphael-Vignes opened a new pull request, #9420: URL: https://github.com/apache/iceberg/pull/9420 This PR aims to close this [issue](https://github.com/apache/iceberg/issues/7584) (and would resolve this [issue](https://github.com/apache/iceberg/issues/6340) too). We want to provide an optional parameter to `IcebergToGlueConverter`'s `setTableInputInformation` method (set to false by default) named `glueCatalogWriteNonCurrentColumns`. It will allow skipping columns that were deleted in previous versions of the Glue schema, instead of writing them with the `iceberg.field.current` parameter set to false. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
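To make the proposal concrete, here is a minimal sketch of what the skip could look like, assuming the AWS SDK v2 Glue `Column` model; the `writeNonCurrentColumns` flag and the helper are hypothetical illustrations, not the actual PR diff:

```java
import java.util.List;
import java.util.stream.Collectors;
import software.amazon.awssdk.services.glue.model.Column;

class GlueColumnFilterSketch {
  // Hypothetical helper: when writeNonCurrentColumns is false, drop columns whose
  // "iceberg.field.current" parameter is "false" (columns deleted in earlier schemas).
  static List<Column> selectColumns(List<Column> allColumns, boolean writeNonCurrentColumns) {
    return allColumns.stream()
        .filter(
            col ->
                writeNonCurrentColumns
                    || !"false".equals(col.parameters().get("iceberg.field.current")))
        .collect(Collectors.toList());
  }
}
```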
Re: [PR] Spark: Support dropping views [iceberg]
nastra commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1442789176 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -111,8 +116,8 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look } private def qualifyFunctionIdentifiers( - plan: LogicalPlan, - catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions { +plan: LogicalPlan, +catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions { Review Comment: formatting was off here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.53.v20231009 to 11.0.19 [iceberg]
nastra closed pull request #9370: Build: Bump jetty from 9.4.53.v20231009 to 11.0.19 URL: https://github.com/apache/iceberg/pull/9370 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.53.v20231009 to 11.0.19 [iceberg]
dependabot[bot] commented on PR #9370: URL: https://github.com/apache/iceberg/pull/9370#issuecomment-1878554815 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.23.0 to 2.24.0 [iceberg]
nastra merged PR #9369: URL: https://github.com/apache/iceberg/pull/9369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]
nastra commented on PR #9371: URL: https://github.com/apache/iceberg/pull/9371#issuecomment-1878558251 Looks like `iceberg-aliyun` uses this for testing. I don't think we can upgrade to the newer version, since that requires JDK 11 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]
dependabot[bot] commented on PR #9371: URL: https://github.com/apache/iceberg/pull/9371#issuecomment-1878558298 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]
nastra closed pull request #9371: Build: Bump spring-boot from 2.5.4 to 3.2.1 URL: https://github.com/apache/iceberg/pull/9371 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]
nastra commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1878559904 @cgpoh can you please port these changes to other Flink versions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink 1.17: Create JUnit5 version of TestFlinkScan [iceberg]
nastra merged PR #9185: URL: https://github.com/apache/iceberg/pull/9185 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Create JUnit5-version of TestFlinkScan [iceberg]
nastra closed issue #9077: Create JUnit5-version of TestFlinkScan URL: https://github.com/apache/iceberg/issues/9077 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Nessie: Strip trailing slash for warehouse location [iceberg]
nastra merged PR #9415: URL: https://github.com/apache/iceberg/pull/9415 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Adding Junit 5 conversion and AssertJ style for TestFlinkCatalogTable… [iceberg]
nastra commented on PR #9381: URL: https://github.com/apache/iceberg/pull/9381#issuecomment-1878626062 @vinitpatni the issue with `GenericAppenderHelper` has been addressed in https://github.com/apache/iceberg/pull/9185, so this class should now be usable with JUnit4 + JUnit5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.31.0 [iceberg]
dependabot[bot] commented on PR #9041: URL: https://github.com/apache/iceberg/pull/9041#issuecomment-1878626952 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.31.0 [iceberg]
nastra closed pull request #9041: Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.31.0 URL: https://github.com/apache/iceberg/pull/9041 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442849817 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestChangelogIterator.java: ## @@ -196,10 +196,10 @@ public void testUpdatedRowsWithDuplication() { Iterator iterator = ChangelogIterator.computeUpdates(rowsWithDuplication.iterator(), SCHEMA, IDENTIFIER_FIELDS); -assertThrows( -"Cannot compute updates because there are multiple rows with the same identifier fields([id, name]). Please make sure the rows are unique.", -IllegalStateException.class, -() -> Lists.newArrayList(iterator)); +assertThatThrownBy(() -> Lists.newArrayList(iterator)) +.as( +"Cannot compute updates because there are multiple rows with the same identifier fields([id, name]). Please make sure the rows are unique.") +.isInstanceOf(IllegalStateException.class); Review Comment: can you please add a `.hasMessage()` check here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
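For reference, the requested check would presumably look like the following sketch, reusing the message string from the diff above (assumes a static import of AssertJ's `assertThatThrownBy`):

```java
// Inside testUpdatedRowsWithDuplication(); assumes
// import static org.assertj.core.api.Assertions.assertThatThrownBy;
assertThatThrownBy(() -> Lists.newArrayList(iterator))
    .isInstanceOf(IllegalStateException.class)
    .hasMessage(
        "Cannot compute updates because there are multiple rows with the same identifier "
            + "fields([id, name]). Please make sure the rows are unique.");
```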
[PR] Arrow: Use case instead of wrapping a map/list [iceberg-python]
Fokko opened a new pull request, #252: URL: https://github.com/apache/iceberg-python/pull/252 Wrapping the list seems to introduce an odd behavior where `null` values are converted to an empty list `[]`. Resolves #251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442860998 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/TestTableSerialization.java: ## @@ -36,29 +39,23 @@ import org.apache.iceberg.relocated.com.google.common.collect.Lists; import org.apache.iceberg.spark.source.SerializableTableWithSize; import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestTableSerialization { +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.io.TempDir; - public TestTableSerialization(String isObjectStoreEnabled) { -this.isObjectStoreEnabled = isObjectStoreEnabled; - } +@ExtendWith(ParameterizedTestExtension.class) +public class TestTableSerialization { - @Parameterized.Parameters(name = "isObjectStoreEnabled = {0}") - public static Object[] parameters() { -return new Object[] {"true", "false"}; + @Parameters(name = "isObjectStoreEnabled = {0}") + public static Object[][] parameters() { Review Comment: probably better to have it less complex: ``` public static List parameters() { return Arrays.asList("true", "false"); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442861281 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/TestTableSerialization.java: ## @@ -36,29 +39,23 @@ import org.apache.iceberg.relocated.com.google.common.collect.Lists; import org.apache.iceberg.spark.source.SerializableTableWithSize; import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestTableSerialization { +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.io.TempDir; - public TestTableSerialization(String isObjectStoreEnabled) { -this.isObjectStoreEnabled = isObjectStoreEnabled; - } +@ExtendWith(ParameterizedTestExtension.class) +public class TestTableSerialization { - @Parameterized.Parameters(name = "isObjectStoreEnabled = {0}") - public static Object[] parameters() { -return new Object[] {"true", "false"}; + @Parameters(name = "isObjectStoreEnabled = {0}") + public static Object[][] parameters() { +return new Object[][] {{"true"}, {"false"}}; } private static final HadoopTables TABLES = new HadoopTables(); - private final String isObjectStoreEnabled; + @Parameter(index = 0) Review Comment: ```suggestion @Parameter ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1878646665 No idea, to be honest, since I ran into this on production and there I don't have the ability to go deep and debug. I wouldn't say it is a wide table in terms of number of columns, and I don't know about column stats. Is there a way to fetch that from somewhere? But the metadata about the file that I do know is the following. One of the files: **File name:** `s3://data-furnishing-pipeline-eu-production/data-lake/dafu_pipeline.db/transactional_customer_ids_created/data/__key_bucket=0/createdAt_day=2018-10-04/country=NO/00021-0-ca186d77-6349-4e79-bb4f-874fac0ee123-00186.parquet` **Schema**: ```json [ { "Name": "__key", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "1", "iceberg.field.optional": "true" } }, { "Name": "eventid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "2", "iceberg.field.optional": "false" } }, { "Name": "authorizationid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "3", "iceberg.field.optional": "false" } }, { "Name": "authorizationkrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "4", "iceberg.field.optional": "true" } }, { "Name": "orderid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "5", "iceberg.field.optional": "false" } }, { "Name": "ordershortid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "6", "iceberg.field.optional": "true" } }, { "Name": "orderkrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "7", "iceberg.field.optional": "true" } }, { "Name": "nationalidentificationnumber", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "8", "iceberg.field.optional": "true" } }, { "Name": "personaid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "9", "iceberg.field.optional": "true" } }, { "Name": "personakrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "10", "iceberg.field.optional": "true" } }, { "Name": "accountid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "11", "iceberg.field.optional": "true" } }, { "Name": "accountkrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "12", "iceberg.field.optional": "true" } }, { "Name": "country", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "13", "iceberg.field.optional": "false" } }, { "Name": "createdat", "Type": "timestamp", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "14", "iceberg.field.optional": "false" } } ] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442867841 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestFunctionCatalog.java: ## @@ -27,107 +30,106 @@ import org.apache.spark.sql.connector.catalog.FunctionCatalog; import org.apache.spark.sql.connector.catalog.Identifier; import org.apache.spark.sql.connector.catalog.functions.UnboundFunction; -import org.assertj.core.api.Assertions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; -public class TestFunctionCatalog extends SparkTestBaseWithCatalog { +public class TestFunctionCatalog extends TestBaseWithCatalog { private static final String[] EMPTY_NAMESPACE = new String[] {}; private static final String[] SYSTEM_NAMESPACE = new String[] {"system"}; private static final String[] DEFAULT_NAMESPACE = new String[] {"default"}; private static final String[] DB_NAMESPACE = new String[] {"db"}; - private final FunctionCatalog asFunctionCatalog; + private FunctionCatalog asFunctionCatalog; - public TestFunctionCatalog() { + @BeforeEach + public void before() { +super.before(); this.asFunctionCatalog = castToFunctionCatalog(catalogName); +createDefaultNamespace(); } - @Before public void createDefaultNamespace() { sql("CREATE NAMESPACE IF NOT EXISTS %s", catalogName + ".default"); } - @After + @AfterEach public void dropDefaultNamespace() { sql("DROP NAMESPACE IF EXISTS %s", catalogName + ".default"); } - @Test + @TestTemplate public void testListFunctionsViaCatalog() throws NoSuchNamespaceException { -Assertions.assertThat(asFunctionCatalog.listFunctions(EMPTY_NAMESPACE)) Review Comment: these changes are adding lots of noise. Using static imports in new test files is preferable, but we typically don't want to add too many changes to new PRs by adding static imports. It's ok to keep these now, but just a reminder for any future PRs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442872510 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestFunctionCatalog.java: ## @@ -27,107 +30,106 @@ import org.apache.spark.sql.connector.catalog.FunctionCatalog; import org.apache.spark.sql.connector.catalog.Identifier; import org.apache.spark.sql.connector.catalog.functions.UnboundFunction; -import org.assertj.core.api.Assertions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; -public class TestFunctionCatalog extends SparkTestBaseWithCatalog { +public class TestFunctionCatalog extends TestBaseWithCatalog { private static final String[] EMPTY_NAMESPACE = new String[] {}; private static final String[] SYSTEM_NAMESPACE = new String[] {"system"}; private static final String[] DEFAULT_NAMESPACE = new String[] {"default"}; private static final String[] DB_NAMESPACE = new String[] {"db"}; - private final FunctionCatalog asFunctionCatalog; + private FunctionCatalog asFunctionCatalog; - public TestFunctionCatalog() { + @BeforeEach + public void before() { +super.before(); this.asFunctionCatalog = castToFunctionCatalog(catalogName); +createDefaultNamespace(); } - @Before public void createDefaultNamespace() { Review Comment: no need to have this separate method now. Can you just move `sql("CREATE NAMESPACE IF NOT EXISTS %s", catalogName + ".default");` to `before()`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
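A sketch of the consolidation being asked for, based on the `before()` method already shown in the diff above:

```java
@BeforeEach
public void before() {
  super.before();
  this.asFunctionCatalog = castToFunctionCatalog(catalogName);
  sql("CREATE NAMESPACE IF NOT EXISTS %s", catalogName + ".default");
}
```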
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on PR #9417: URL: https://github.com/apache/iceberg/pull/9417#issuecomment-1878660248 @chinmay-bhat this is almost ready to go, just a few minor things to address -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Migrate tests in SQL directory to JUnit5 [iceberg]
nastra commented on code in PR #9401: URL: https://github.com/apache/iceberg/pull/9401#discussion_r1442885901 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java: ## @@ -18,53 +18,49 @@ */ package org.apache.iceberg.spark.sql; +import static org.assertj.core.api.Assertions.assertThat; + import java.util.Arrays; import java.util.List; -import java.util.Map; import org.apache.iceberg.Table; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.spark.SparkCatalogTestBase; +import org.apache.iceberg.spark.CatalogTestBase; import org.apache.iceberg.spark.source.SimpleRecord; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.catalyst.analysis.NoSuchTableException; import org.apache.spark.sql.functions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -public abstract class PartitionedWritesTestBase extends SparkCatalogTestBase { - public PartitionedWritesTestBase( - String catalogName, String implementation, Map config) { -super(catalogName, implementation, config); - } +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; + +public abstract class /**/ PartitionedWritesTestBase extends CatalogTestBase { Review Comment: ```suggestion public abstract class PartitionedWritesTestBase extends CatalogTestBase { ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Set log level to WARN for rewrite task failure with partial progress [iceberg]
manuzhang commented on code in PR #9400: URL: https://github.com/apache/iceberg/pull/9400#discussion_r1442954829 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -345,7 +345,7 @@ private Result doExecuteWithPartialProgress( .noRetry() .onFailure( (fileGroup, exception) -> { - LOG.error("Failure during rewrite group {}", fileGroup.info(), exception); + LOG.warn("Failure during rewrite group {}", fileGroup.info(), exception); Review Comment: I agree ERROR is better for non-partial case. For partial progress, since failure is allowed, WARN sends the signal that failure is not fatal to the whole procedure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Fix parquet default compression codec [iceberg-docs]
manuzhang opened a new pull request, #305: URL: https://github.com/apache/iceberg-docs/pull/305 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
chinmay-bhat commented on PR #9417: URL: https://github.com/apache/iceberg/pull/9417#issuecomment-1878814101 Thanks for the review, I've updated the PR with the suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
pvary commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1878817631 @javrasya: This table should not be too wide, and the statistics should be limited as well (unless you did some specific tweaking there). My best guess is your first suggestion: > reduce the size of FileScanTasks per split to be one How did you try to achieve this? The best way to achieve this is to set `splitOpenFileCost` to the same value as the split size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
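One way to apply that advice is through the documented Iceberg table read properties; a minimal sketch (the 128 MB value is only illustrative, pick whatever split size fits the job):

```java
// With the open-file cost equal to the target split size, the planner costs each
// data file as a full split, so a split carries at most one file's FileScanTask.
table
    .updateProperties()
    .set("read.split.target-size", "134217728") // 128 MB target split size
    .set("read.split.open-file-cost", "134217728") // one open file "fills" a split
    .commit();
```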
Re: [PR] Spark 3.5: Migrate tests in SQL directory to JUnit5 [iceberg]
chinmay-bhat commented on code in PR #9401: URL: https://github.com/apache/iceberg/pull/9401#discussion_r1442989187 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java: ## @@ -18,53 +18,49 @@ */ package org.apache.iceberg.spark.sql; +import static org.assertj.core.api.Assertions.assertThat; + import java.util.Arrays; import java.util.List; -import java.util.Map; import org.apache.iceberg.Table; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.spark.SparkCatalogTestBase; +import org.apache.iceberg.spark.CatalogTestBase; import org.apache.iceberg.spark.source.SimpleRecord; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.catalyst.analysis.NoSuchTableException; import org.apache.spark.sql.functions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -public abstract class PartitionedWritesTestBase extends SparkCatalogTestBase { - public PartitionedWritesTestBase( - String catalogName, String implementation, Map config) { -super(catalogName, implementation, config); - } +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; + +public abstract class /**/ PartitionedWritesTestBase extends CatalogTestBase { Review Comment: my bad, fixed! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink 1.17: Create JUnit5 version of TestFlinkScan [iceberg]
cgpoh commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1878844200 > @cgpoh can you please port these changes to other Flink versions? Sure. Will try to port these changes next week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
nastra commented on code in PR #8988: URL: https://github.com/apache/iceberg/pull/8988#discussion_r1443018102 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -2181,20 +2181,28 @@ public void testTableWithInt96Timestamp() throws IOException { stagingLocation); // validate we get the expected results back -List expected = spark.table("parquet_table").select("tmp_col").collectAsList(); -List actual = -spark -.read() -.format("iceberg") -.load(loadLocation(tableIdentifier)) -.select("tmp_col") -.collectAsList(); -assertThat(actual).as("Rows must match").containsExactlyInAnyOrderElementsOf(expected); +testWithFilter("tmp_col < to_timestamp('2000-01-31 08:30:00')", tableIdentifier); Review Comment: these tests are all passing without the fix. Can you please add a test that reproduces the issue from https://github.com/apache/iceberg/issues/8990? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443020124 ## core/src/test/java/org/apache/iceberg/TestCreateSnapshotEvent.java: ## @@ -40,76 +41,76 @@ public TestCreateSnapshotEvent(int formatVersion) { Listeners.register(new MyListener(), CreateSnapshotEvent.class); } - @Test + @TestTemplate public void testAppendCommitEvent() { -Assert.assertEquals("Table should start empty", 0, listManifestFiles().size()); +Assertions.assertThat(listManifestFiles()).as("Table should start empty").isEmpty(); Review Comment: can you statically import `assertThat()` please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
manuzhang commented on code in PR #8988: URL: https://github.com/apache/iceberg/pull/8988#discussion_r1443027055 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -2181,20 +2181,28 @@ public void testTableWithInt96Timestamp() throws IOException { stagingLocation); // validate we get the expected results back -List expected = spark.table("parquet_table").select("tmp_col").collectAsList(); -List actual = -spark -.read() -.format("iceberg") -.load(loadLocation(tableIdentifier)) -.select("tmp_col") -.collectAsList(); -assertThat(actual).as("Rows must match").containsExactlyInAnyOrderElementsOf(expected); +testWithFilter("tmp_col < to_timestamp('2000-01-31 08:30:00')", tableIdentifier); Review Comment: Filter expression was no longer applied after last rebase. It's fixed now. Please try again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443033477 ## core/src/test/java/org/apache/iceberg/TestCreateSnapshotEvent.java: ## @@ -40,76 +41,76 @@ public TestCreateSnapshotEvent(int formatVersion) { Listeners.register(new MyListener(), CreateSnapshotEvent.class); } - @Test + @TestTemplate public void testAppendCommitEvent() { -Assert.assertEquals("Table should start empty", 0, listManifestFiles().size()); +Assertions.assertThat(listManifestFiles()).as("Table should start empty").isEmpty(); table.newAppend().appendFile(FILE_A).commit(); -Assert.assertNotNull(currentEvent); -Assert.assertEquals( -"Added records in the table should be 1", "1", currentEvent.summary().get("added-records")); -Assert.assertEquals( -"Added files in the table should be 1", -"1", -currentEvent.summary().get("added-data-files")); -Assert.assertEquals( -"Total records in the table should be 1", "1", currentEvent.summary().get("total-records")); -Assert.assertEquals( -"Total data files in the table should be 1", -"1", -currentEvent.summary().get("total-data-files")); +Assertions.assertThat(currentEvent).isNotNull(); +Assertions.assertThat(currentEvent.summary().get("added-records")) Review Comment: can be simplified to ``` assertThat(currentEvent.summary()) .containsEntry("added-records", "1") .containsEntry("added-data-files", "1") .containsEntry("total-records", "1") .containsEntry("total-data-files", "1"); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Adding Junit 5 conversion and AssertJ style for TestFlinkCatalogTable… [iceberg]
vinitpatni commented on PR #9381: URL: https://github.com/apache/iceberg/pull/9381#issuecomment-1878891716 I have made the changes in TestStreamScanSql locally, but running the test cases on my local machine is taking a lot of time. It is running indefinitely for this class. @nastra Do you have any idea? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra merged PR #9417: URL: https://github.com/apache/iceberg/pull/9417 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4: Add support for reading Iceberg views [iceberg]
amogh-jahagirdar commented on PR #9422: URL: https://github.com/apache/iceberg/pull/9422#issuecomment-1878930402 Since this was a clean backport, I'll go ahead and merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4: Add support for reading Iceberg views [iceberg]
amogh-jahagirdar merged PR #9422: URL: https://github.com/apache/iceberg/pull/9422 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
manuzhang commented on code in PR #8988: URL: https://github.com/apache/iceberg/pull/8988#discussion_r1443085007 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -2181,20 +2181,28 @@ public void testTableWithInt96Timestamp() throws IOException { stagingLocation); // validate we get the expected results back -List expected = spark.table("parquet_table").select("tmp_col").collectAsList(); -List actual = -spark -.read() -.format("iceberg") -.load(loadLocation(tableIdentifier)) -.select("tmp_col") -.collectAsList(); -assertThat(actual).as("Rows must match").containsExactlyInAnyOrderElementsOf(expected); +testWithFilter("tmp_col < to_timestamp('2000-01-31 08:30:00')", tableIdentifier); Review Comment: Test failed after applying the filter expression. Still working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443105597 ## core/src/test/java/org/apache/iceberg/TestManifestReader.java: ## @@ -32,17 +32,15 @@ import org.apache.iceberg.types.Types; import org.assertj.core.api.Assertions; import org.assertj.core.api.recursive.comparison.RecursiveComparisonConfiguration; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestManifestReader extends TableTestBase { - @Parameterized.Parameters(name = "formatVersion = {0}") - public static Object[] parameters() { -return new Object[] {1, 2}; +import org.junit.jupiter.api.Assumptions; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; + +@ExtendWith(ParameterizedTestExtension.class) Review Comment: this should be part of the base class. Also the base + subclass shouldn't have a constructor anymore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
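A sketch of the suggested shape, following the `ParameterizedTestExtension` pattern used elsewhere in this migration (class names and layout are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import org.apache.iceberg.Parameter;
import org.apache.iceberg.ParameterizedTestExtension;
import org.apache.iceberg.Parameters;
import org.junit.jupiter.api.extension.ExtendWith;

// The base class owns the extension, the parameters, and the injected field;
// subclasses keep only @TestTemplate methods, with no constructor of their own.
@ExtendWith(ParameterizedTestExtension.class)
public abstract class TestBase {
  @Parameters(name = "formatVersion = {0}")
  protected static List<Object> parameters() {
    return Arrays.asList(1, 2);
  }

  @Parameter(index = 0)
  protected int formatVersion;
}
```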
[PR] Core: Add JUnit5 version of TableTestBase [iceberg]
nastra opened a new pull request, #9424: URL: https://github.com/apache/iceberg/pull/9424 This fixes https://github.com/apache/iceberg/issues/9073 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443109201 ## core/src/test/java/org/apache/iceberg/TestManifestReader.java: ## @@ -32,17 +32,15 @@ import org.apache.iceberg.types.Types; import org.assertj.core.api.Assertions; import org.assertj.core.api.recursive.comparison.RecursiveComparisonConfiguration; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestManifestReader extends TableTestBase { - @Parameterized.Parameters(name = "formatVersion = {0}") - public static Object[] parameters() { -return new Object[] {1, 2}; +import org.junit.jupiter.api.Assumptions; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; + +@ExtendWith(ParameterizedTestExtension.class) Review Comment: @lisirrx I've opened https://github.com/apache/iceberg/pull/9424 to fix the parameterization as that should live in the base class. If you're ok with the changes I can push them to your branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
rdblue commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1878993066 @amogh-jahagirdar, I agree with not adding a `with` method to modify a schema. If we can avoid making that kind of API addition I would prefer it. I'm also not sure that we want to modify existing views by updating the schema. That's a lot of validation and complexity. The schema should be produced by fairly standard SQL syntax: ```sql CREATE OR REPLACE VIEW default.event_agg ( event_count COMMENT 'Count of events', event_date) COMMENT 'Daily event counts' AS SELECT COUNT(1), CAST(event_ts AS DATE) FROM prod.default.events GROUP BY 2 ``` If we allow making schema-only changes, then would we also allow modifying the view by -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
rdblue closed pull request #9414: API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs URL: https://github.com/apache/iceberg/pull/9414 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Watermark read options [iceberg]
stevenzwu commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1443084956 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/FlinkReadOptions.java: ## @@ -109,4 +110,13 @@ private FlinkReadOptions() {} public static final String MAX_ALLOWED_PLANNING_FAILURES = "max-allowed-planning-failures"; public static final ConfigOption MAX_ALLOWED_PLANNING_FAILURES_OPTION = ConfigOptions.key(PREFIX + MAX_ALLOWED_PLANNING_FAILURES).intType().defaultValue(3); + + public static final String WATERMARK_COLUMN = "watermark-column"; + public static final ConfigOption WATERMARK_COLUMN_OPTION = + ConfigOptions.key(PREFIX + WATERMARK_COLUMN).stringType().noDefaultValue(); + public static final String WATERMARK_TIME_COLUMN_UNIT = "watermark-column-time-unit"; Review Comment: nit: it seems the style is to have an empty line btw a config ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -482,24 +478,25 @@ public IcebergSource build() { } contextBuilder.resolveConfig(table, readOptions, flinkConfig); - Schema icebergSchema = table.schema(); if (projectedFlinkSchema != null) { contextBuilder.project(FlinkSchemaUtil.convert(icebergSchema, projectedFlinkSchema)); } SerializableRecordEmitter emitter = SerializableRecordEmitter.defaultEmitter(); + FlinkReadConf flinkReadConf = new FlinkReadConf(table, readOptions, flinkConfig); + String watermarkColumn = flinkReadConf.watermarkColumn(); + TimeUnit watermarkTimeUnit = flinkReadConf.watermarkTimeUnit(); + if (watermarkColumn != null) { // Column statistics is needed for watermark generation contextBuilder.includeColumnStats(Sets.newHashSet(watermarkColumn)); - Review Comment: nit: revert unnecessary empty line change ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceBoundedSql.java: ## @@ -29,11 +31,17 @@ import org.apache.iceberg.Schema; import org.apache.iceberg.expressions.Expression; import org.apache.iceberg.flink.FlinkConfigOptions; +import org.apache.iceberg.types.Types; import org.junit.Before; public class TestIcebergSourceBoundedSql extends TestIcebergSourceBounded { private volatile TableEnvironment tEnv; + private static final Schema SCHEMA_TS = Review Comment: all changes in this file should be reverted, right? ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java: ## @@ -39,4 +62,76 @@ public void before() throws IOException { .getConfiguration() .set(TableConfigOptions.TABLE_DYNAMIC_TABLE_OPTIONS_ENABLED, true); } + + private Record generateRecord(Instant t1) { +Record record = GenericRecord.create(SCHEMA_TS); +record.setField("t1", t1.atZone(ZoneId.systemDefault()).toLocalDateTime()); +record.setField("t2", t1.getEpochSecond()); +return record; + } + + private List testWatermarkOptionsInternal(boolean ascending) throws Exception { Review Comment: this is not actually `test`. it is just preparing the table by appending records/files to the table. 
maybe also add a Javadoc to explain the return list is the expected order of results ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -482,24 +478,25 @@ public IcebergSource build() { } contextBuilder.resolveConfig(table, readOptions, flinkConfig); - Schema icebergSchema = table.schema(); if (projectedFlinkSchema != null) { contextBuilder.project(FlinkSchemaUtil.convert(icebergSchema, projectedFlinkSchema)); } SerializableRecordEmitter emitter = SerializableRecordEmitter.defaultEmitter(); + FlinkReadConf flinkReadConf = new FlinkReadConf(table, readOptions, flinkConfig); + String watermarkColumn = flinkReadConf.watermarkColumn(); + TimeUnit watermarkTimeUnit = flinkReadConf.watermarkTimeUnit(); + if (watermarkColumn != null) { // Column statistics is needed for watermark generation contextBuilder.includeColumnStats(Sets.newHashSet(watermarkColumn)); - SplitWatermarkExtractor watermarkExtractor = new ColumnStatsWatermarkExtractor(icebergSchema, watermarkColumn, watermarkTimeUnit); emitter = SerializableRecordEmitter.emitterWithWatermark(watermarkExtractor); splitAssignerFactory = new OrderedSplitAssignerFactory(SplitComparators.watermark(watermarkExtractor)); } - Review Comment: nit: unnecessary empty line change. Iceberg style has an empty line after a control block ends with `}` ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java: ## @@ -126,7 +126,7 @@ public static List convertRowDataToRow(List rowDataList, RowType r .col
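Applied to the `FlinkReadOptions` hunk at the top of this review, the blank-line style being asked for would look roughly like this. This is a sketch only; the `PREFIX` value and the `<String>` type parameter are assumptions, since the digest stripped the generics from the diff:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Sketch of the requested layout: a blank line between each config pair.
public final class FlinkReadOptionsSketch {
  // assumed to match the PREFIX constant already defined in FlinkReadOptions
  private static final String PREFIX = "connector.iceberg.";

  public static final String WATERMARK_COLUMN = "watermark-column";

  public static final ConfigOption<String> WATERMARK_COLUMN_OPTION =
      ConfigOptions.key(PREFIX + WATERMARK_COLUMN).stringType().noDefaultValue();

  public static final String WATERMARK_TIME_COLUMN_UNIT = "watermark-column-time-unit";

  private FlinkReadOptionsSketch() {}
}
```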
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
rdblue commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1879017514 Oops, I didn't intend to close this!
Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]
amogh-jahagirdar commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1443148176 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT32: + return Optional.of( + new ParquetValueReaders.IntegerAsDecimalReader(desc, decimalLogicalType.getScale())); +default: + throw new UnsupportedOperationException( + "Unsupported base type for decimal: " + primitive.getPrimitiveTypeName()); + } +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.DateLogicalTypeAnnotation dateLogicalType) { + return Optional.of(new DateReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeLogicalType) { + if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +return Optional.of(new TimeReader(desc)); + } else if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) { +return Optional.of(new TimeMillisReader(desc)); + } + + return Optional.empty(); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) { + if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +Types.TimestampType tsMicrosType = (Types.TimestampType) expected; +return tsMicrosType.shouldAdjustToUTC() +? Optional.of(new TimestamptzReader(desc)) +: Optional.of(new TimestampReader(desc)); + } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { Review Comment: I think this should be == LogicalTypeAnnotation.TimeUnit.MILLIS? 
## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalT
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443152002 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala: ## @@ -122,7 +128,27 @@ class IcebergSparkSqlExtensionsParser(delegate: ParserInterface) extends ParserI if (isIcebergCommand(sqlTextAfterSubstitution)) { parse(sqlTextAfterSubstitution) { parser => astBuilder.visit(parser.singleStatement()) }.asInstanceOf[LogicalPlan] } else { - delegate.parsePlan(sqlText) + ViewSubstitutionExecutor.execute(delegate.parsePlan(sqlText)) +} + } + + private object ViewSubstitutionExecutor extends RuleExecutor[LogicalPlan] { +private val fixedPoint = FixedPoint( + maxIterations, + errorOnExceed = true, + maxIterationsSetting = SQLConf.ANALYZER_MAX_ITERATIONS.key) + +override protected def batches: Seq[Batch] = Seq(Batch("pre-substitution", fixedPoint, V2ViewSubstitution)) Review Comment: Do we want to call this pre-substitution still? I originally used that because I thought we wanted substitution rules in it. But it turns out that we don't need this for view substitution, only for command hijacking. Maybe a "Hijack Commands" batch?
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443153917 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala: ## @@ -122,7 +128,27 @@ class IcebergSparkSqlExtensionsParser(delegate: ParserInterface) extends ParserI if (isIcebergCommand(sqlTextAfterSubstitution)) { parse(sqlTextAfterSubstitution) { parser => astBuilder.visit(parser.singleStatement()) }.asInstanceOf[LogicalPlan] } else { - delegate.parsePlan(sqlText) + ViewSubstitutionExecutor.execute(delegate.parsePlan(sqlText)) +} + } + + private object ViewSubstitutionExecutor extends RuleExecutor[LogicalPlan] { +private val fixedPoint = FixedPoint( + maxIterations, + errorOnExceed = true, + maxIterationsSetting = SQLConf.ANALYZER_MAX_ITERATIONS.key) + +override protected def batches: Seq[Batch] = Seq(Batch("pre-substitution", fixedPoint, V2ViewSubstitution)) Review Comment: And it just occurred to me that we may not need an executor at all if we don't need to run to a fixed point. Can we just apply a command hijacking rule by itself instead?
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443156571 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.NoSuchViewException +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.connector.catalog.Identifier +import org.apache.spark.sql.connector.catalog.ViewCatalog + + +case class DropV2ViewExec( + catalog: ViewCatalog, + ident: Identifier, + ifExists: Boolean) extends LeafV2CommandExec { + + override lazy val output: Seq[Attribute] = Nil + + override protected def run(): Seq[InternalRow] = { +if (catalog.viewExists(ident)) { + catalog.dropView(ident) Review Comment: Doesn't this return a boolean? If so, we could avoid loading or checking existence and just call `dropView` directly. Then if that returns `false` and `ifExists` is false then we could throw `NoSuchViewException`.
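A minimal standalone sketch of the suggested control flow, expressed in Java for brevity (the real exec is Scala); `ViewCatalogLike` and the exception here are stand-ins for Spark's `ViewCatalog` and `NoSuchViewException`:

```java
import java.util.Collections;
import java.util.List;

// Stand-in for org.apache.spark.sql.connector.catalog.ViewCatalog#dropView,
// which returns true only when a view was actually dropped.
interface ViewCatalogLike {
  boolean dropView(String ident);
}

final class DropViewSketch {
  // Let dropView's boolean result drive the IF EXISTS handling, avoiding the
  // separate viewExists() round trip.
  static List<Object> run(ViewCatalogLike catalog, String ident, boolean ifExists) {
    if (!catalog.dropView(ident) && !ifExists) {
      // NoSuchViewException in the actual Spark code
      throw new IllegalStateException("View not found: " + ident);
    }
    return Collections.emptyList();
  }

  private DropViewSketch() {}
}
```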
Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]
amogh-jahagirdar commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1443148385 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT32: + return Optional.of( + new ParquetValueReaders.IntegerAsDecimalReader(desc, decimalLogicalType.getScale())); +default: + throw new UnsupportedOperationException( + "Unsupported base type for decimal: " + primitive.getPrimitiveTypeName()); + } +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.DateLogicalTypeAnnotation dateLogicalType) { + return Optional.of(new DateReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeLogicalType) { + if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +return Optional.of(new TimeReader(desc)); + } else if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) { +return Optional.of(new TimeMillisReader(desc)); + } + + return Optional.empty(); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) { + if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +Types.TimestampType tsMicrosType = (Types.TimestampType) expected; +return tsMicrosType.shouldAdjustToUTC() +? Optional.of(new TimestamptzReader(desc)) +: Optional.of(new TimestampReader(desc)); + } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { Review Comment: I think this should be == LogicalTypeAnnotation.TimeUnit.MILLIS? Not Micros
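For context, the corrected branch would read something like the sketch below. The micros arm is taken from the diff above; the millis reader names (`TimestampMillisReader`, `TimestamptzMillisReader`) are assumptions based on the naming pattern of the other readers in this file:

```java
@Override
public Optional<ParquetValueReader<?>> visit(
    LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) {
  if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) {
    Types.TimestampType tsMicrosType = (Types.TimestampType) expected;
    return tsMicrosType.shouldAdjustToUTC()
        ? Optional.of(new TimestamptzReader(desc))
        : Optional.of(new TimestampReader(desc));
  } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) {
    // the original second branch re-tested MICROS, leaving this path unreachable
    Types.TimestampType tsMillisType = (Types.TimestampType) expected;
    return tsMillisType.shouldAdjustToUTC()
        ? Optional.of(new TimestamptzMillisReader(desc))
        : Optional.of(new TimestampMillisReader(desc));
  }

  return Optional.empty();
}
```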
Re: [I] Spark: Session level Iceberg table config defaults [iceberg]
kuldeepsinghchauhan commented on issue #2270: URL: https://github.com/apache/iceberg/issues/2270#issuecomment-1879091855 This is a very critical feature. In an actual production scenario where we are planning to roll out the Iceberg format for operations and analytics users, we are currently stuck and cannot decide how to set up Iceberg table-level property defaults in a central config file. I know the discussion is about setting this up at the Spark session level, but it would be best if Iceberg could leverage properties from spark-defaults.conf.
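For reference, a rough sketch of how central defaults can be approximated today, assuming the deployed Iceberg version supports the `table-default.` catalog property prefix; the catalog name and property values here are illustrative, and the same keys could live in spark-defaults.conf instead of code:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class IcebergCatalogDefaults {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .set("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .set("spark.sql.catalog.my_catalog.type", "hive")
        // applied as a default to every table created through this catalog
        .set("spark.sql.catalog.my_catalog.table-default.write.parquet.compression-codec", "zstd");

    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
    spark.sql("CREATE TABLE my_catalog.db.t (id BIGINT) USING iceberg");
  }
}
```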
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443226670 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -635,6 +633,51 @@ private Catalog tableCatalog() { return Spark3Util.loadIcebergCatalog(spark, catalogName); } + @Test + public void dropView() { +String viewName = "viewToBeDropped"; +String sql = String.format("SELECT id FROM %s", tableName); + +ViewCatalog viewCatalog = viewCatalog(); + +TableIdentifier identifier = TableIdentifier.of(NAMESPACE, viewName); +viewCatalog +.buildView(identifier) +.withQuery("spark", sql) +.withDefaultNamespace(NAMESPACE) +.withDefaultCatalog(catalogName) +.withSchema(schema(sql)) +.create(); + +assertThat(viewCatalog.viewExists(identifier)).isTrue(); + +sql("DROP VIEW %s", viewName); +assertThat(viewCatalog.viewExists(identifier)).isFalse(); + } + + @Test + public void dropNonExistingView() { +assertThatThrownBy(() -> sql("DROP VIEW non_existing")) +.isInstanceOf(AnalysisException.class) +.hasMessageContaining("The view %s.%s cannot be found", NAMESPACE, "non_existing"); + +assertThatNoException().isThrownBy(() -> sql("DROP VIEW IF EXISTS non_existing")); Review Comment: Nit: I think that `IF EXISTS` is a separate option and probably deserves a separate test case.
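The split being suggested might look like this, a sketch meant to drop into TestViews.java and reusing the `sql` helper and AssertJ assertions already present in the diff:

```java
// IF EXISTS exercised on its own, separate from the error-path assertions
// in dropNonExistingView
@Test
public void dropViewIfExists() {
  assertThatNoException().isThrownBy(() -> sql("DROP VIEW IF EXISTS non_existing"));
}
```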
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443227383 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -635,6 +633,51 @@ private Catalog tableCatalog() { return Spark3Util.loadIcebergCatalog(spark, catalogName); } + @Test + public void dropView() { +String viewName = "viewToBeDropped"; +String sql = String.format("SELECT id FROM %s", tableName); + +ViewCatalog viewCatalog = viewCatalog(); + +TableIdentifier identifier = TableIdentifier.of(NAMESPACE, viewName); +viewCatalog +.buildView(identifier) +.withQuery("spark", sql) +.withDefaultNamespace(NAMESPACE) +.withDefaultCatalog(catalogName) +.withSchema(schema(sql)) +.create(); + +assertThat(viewCatalog.viewExists(identifier)).isTrue(); + +sql("DROP VIEW %s", viewName); +assertThat(viewCatalog.viewExists(identifier)).isFalse(); + } + + @Test + public void dropNonExistingView() { +assertThatThrownBy(() -> sql("DROP VIEW non_existing")) +.isInstanceOf(AnalysisException.class) +.hasMessageContaining("The view %s.%s cannot be found", NAMESPACE, "non_existing"); + +assertThatNoException().isThrownBy(() -> sql("DROP VIEW IF EXISTS non_existing")); + } + + @Test + public void dropGlobalTempView() { +String globalTempView = "globalViewToBeDropped"; Review Comment: Are these cases just to validate that we don't mess up normal view creation/deletion? If so, can you add a comment? Do we also want to have one for v1 views in a different catalog?
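The clarifying comment being asked for might read like this (a sketch; the CREATE and DROP statements are assumptions inferred from the test name, not taken from the PR):

```java
@Test
public void dropGlobalTempView() {
  // This case only verifies that hijacking DROP VIEW for Iceberg catalogs does
  // not break Spark's built-in global temp views, which never touch the
  // Iceberg view catalog.
  String globalTempView = "globalViewToBeDropped";
  sql("CREATE GLOBAL TEMPORARY VIEW %s AS SELECT id FROM %s", globalTempView, tableName);
  sql("DROP VIEW global_temp.%s", globalTempView);
}
```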
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443275368 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -45,14 +47,17 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look protected lazy val catalogManager: CatalogManager = spark.sessionState.catalogManager override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { -case u@UnresolvedRelation(nameParts, _, _) - if catalogManager.v1SessionCatalog.isTempView(nameParts) => +case u@UnresolvedRelation(nameParts, _, _) if isTempView(nameParts) => u case u@UnresolvedRelation(parts@CatalogAndIdentifier(catalog, ident), _, _) => loadView(catalog, ident) .map(createViewRelation(parts, _)) .getOrElse(u) + +case DropIcebergView(r@ResolvedIdentifier(catalog, ident), ifExists) + if isTempView(ident.asMultipartIdentifier) || !isViewCatalog(catalog) => + DropView(r, ifExists) Review Comment: Why do we need to convert back to a non-Iceberg plan? Is it that we always convert in the pre-substitution batch? If so, is it reasonable to detect whether it is an Iceberg `ViewCatalog` before converting rather than converting back?
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1879135064 @pvary I wasn't aware of `splitOpenFileCost`, thank you for sharing that. The way I did it is that I introduced my own SplitAssignerFactory and SplitAssigner and passed them down to the source. Here is the code for that custom SplitAssigner: https://gist.github.com/javrasya/98cfe90bd1a2585c56c4c3346a518477 But the thing is that even though I managed to reduce the number of tasks per split, it was still big enough to raise the same error. So it did not solve my problem. How do you limit the statistics? I didn't do anything apart from creating my table and ingesting the data into it. Also, do you think a table with 14 columns is too wide?
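A short sketch of the knob pvary refers to; the builder method name is assumed to mirror the `split-open-file-cost` read option in `FlinkReadOptions`:

```java
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.IcebergSource;

public class SplitTuningSketch {
  // Raise the per-file planning cost so many small files get packed into
  // fewer, larger splits instead of inflating the serialized split list.
  public static IcebergSource<RowData> build(TableLoader tableLoader) {
    return IcebergSource.forRowData()
        .tableLoader(tableLoader)
        .splitOpenFileCost(128L * 1024 * 1024) // treat each file as at least 128 MB
        .build();
  }
}
```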
Re: [PR] Fix parquet default compression codec in 1.4.0 [iceberg-docs]
Fokko merged PR #301: URL: https://github.com/apache/iceberg-docs/pull/301
Re: [PR] Fix parquet default compression codec in 1.4.1 [iceberg-docs]
Fokko merged PR #302: URL: https://github.com/apache/iceberg-docs/pull/302
Re: [PR] Fix parquet default compression codec in 1.4.3 [iceberg-docs]
Fokko merged PR #304: URL: https://github.com/apache/iceberg-docs/pull/304
Re: [PR] Fix parquet default compression codec in 1.4.2 [iceberg-docs]
Fokko merged PR #303: URL: https://github.com/apache/iceberg-docs/pull/303
Re: [PR] Fix parquet default compression codec [iceberg-docs]
Fokko merged PR #305: URL: https://github.com/apache/iceberg-docs/pull/305
Re: [PR] Fix PR link in 1.4.3 release notes [iceberg-docs]
Fokko merged PR #299: URL: https://github.com/apache/iceberg-docs/pull/299
Re: [PR] docs: Add link for iceberg rust [iceberg-docs]
Fokko merged PR #300: URL: https://github.com/apache/iceberg-docs/pull/300
Re: [PR] API: Fix day partition transform result type [iceberg]
tdcmeehan commented on PR #9345: URL: https://github.com/apache/iceberg/pull/9345#issuecomment-1879158054 It seems the consensus is it's more straightforward to keep this as `int`, so I will close this. Thanks folks!
Re: [PR] API: Fix day partition transform result type [iceberg]
tdcmeehan closed pull request #9345: API: Fix day partition transform result type URL: https://github.com/apache/iceberg/pull/9345
Re: [PR] Adding Snowflake's public documentation [iceberg-docs]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-docs/pull/297#issuecomment-1879159046 @scottteal sorry for the late reply here, could you fix the conflicts once more?
[I] How can we build self-defined iceberg-spark-runtime jar package? [iceberg]
gaoshihang opened a new issue, #9425: URL: https://github.com/apache/iceberg/issues/9425 ### Query engine Spark ### Question How can we build a self-defined iceberg-spark-runtime jar package?
Re: [PR] Adding Snowflake's public documentation [iceberg-docs]
scottteal commented on PR #297: URL: https://github.com/apache/iceberg-docs/pull/297#issuecomment-1879185914 @Fokko I think I've now resolved the conflicts. I tried to preserve the ordering to match the current website as much as possible, although it's still unclear to me what the ordering logic is. They aren't sorted alphabetically, or grouped in a logical manner that I can tell.
Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]
Fokko commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1443316711 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT32: + return Optional.of( + new ParquetValueReaders.IntegerAsDecimalReader(desc, decimalLogicalType.getScale())); +default: + throw new UnsupportedOperationException( + "Unsupported base type for decimal: " + primitive.getPrimitiveTypeName()); + } +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.DateLogicalTypeAnnotation dateLogicalType) { + return Optional.of(new DateReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeLogicalType) { + if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +return Optional.of(new TimeReader(desc)); + } else if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) { +return Optional.of(new TimeMillisReader(desc)); + } + + return Optional.empty(); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) { + if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +Types.TimestampType tsMicrosType = (Types.TimestampType) expected; +return tsMicrosType.shouldAdjustToUTC() +? Optional.of(new TimestamptzReader(desc)) +: Optional.of(new TimestampReader(desc)); + } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { Review Comment: Oof, that's a good one! I would like this part to be thoroughly tested as well.
[I] Add the homepage to the `Cargo.toml` [iceberg-rust]
Fokko opened a new issue, #154: URL: https://github.com/apache/iceberg-rust/issues/154 Looks like we can add the homepage as well: https://doc.rust-lang.org/cargo/reference/manifest.html#the-homepage-field To our newly created https://rust.iceberg.apache.org/
Re: [PR] Replace black by Ruff Formatter [iceberg-python]
Fokko merged PR #127: URL: https://github.com/apache/iceberg-python/pull/127
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
Fokko commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443340509 ## site/docs/community.md: ## @@ -40,13 +40,13 @@ Issues are tracked in GitHub: ## Slack -We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg). +We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A). Review Comment: I checked and this is the latest one: https://github.com/apache/iceberg-docs/commit/87149e3da04eafad1c8766132f0bf1e822e4d075 ## .github/workflows/site-ci.yml: ## @@ -1,3 +1,4 @@ +# Review Comment: Looks like git sees this as a move 🤔 Not a real problem. ## site/nav.yml: ## @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +nav: + - Home: index.md + - Quickstart: +- Spark: spark-quickstart.md +- Hive: hive-quickstart.md + - Docs: +#- nightly: '!include docs/docs/nightly/mkdocs.yml' +- latest: '!include docs/docs/latest/mkdocs.yml' +- 1.4.2: '!include docs/docs/1.4.2/mkdocs.yml' Review Comment: I think 1.4.3 is missing here. ## site/docs/spec.md: ## @@ -1128,6 +1125,41 @@ Example ] } ] ``` +### Content File (Data and Delete) Serialization Review Comment: Not for this PR, but it would be good at some point to just include the spec from the repository itself. ## site/docs/roadmap.md: ## @@ -20,28 +20,37 @@ title: "Roadmap" # Roadmap Overview -This roadmap outlines projects that the Iceberg community is working on, their priority, and a rough size estimate. -This is based on the latest [community priority discussion](https://lists.apache.org/thread.html/r84e80216c259c81f824c6971504c321cd8c785774c489d52d4fc123f%40%3Cdev.iceberg.apache.org%3E). +This roadmap outlines projects that the Iceberg community is working on. Each high-level item links to a Github project board that tracks the current status. Related design docs will be linked on the planning boards. 
-# Priority 1 - -* API: [Iceberg 1.0.0](https://github.com/apache/iceberg/projects/3) [medium] -* Python: [Pythonic refactor](https://github.com/apache/iceberg/projects/7) [medium] -* Spec: [Z-ordering / Space-filling curves](https://github.com/apache/iceberg/projects/16) [medium] -* Spec: [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4) [small] -* Views: [Spec](https://github.com/apache/iceberg/projects/6) [medium] -* Puffin: [Implement statistics information in table snapshot](https://github.com/apache/iceberg/pull/4741) [medium] -* Flink: [FLIP-27 based Iceberg source](https://github.com/apache/iceberg/projects/23) [large] - -# Priority 2 - -* ORC: [Support delete files stored as ORC](https://github.com/apache/iceberg/projects/13) [small] -* Spark: [DSv2 streaming improvements](https://github.com/apache/iceberg/projects/2) [small] -* Flink: [Inline file compaction](https://github.com/apache/iceberg/projects/14) [small] -* Flink: [Support UPSERT](https://github.com/apache/iceberg/projects/15) [small] -* Spec: [Secondary indexes](https://github.com/apache/iceberg/projects/17) [large] -* Spec v3: [Encryption](https://github.com/apache/iceberg/projects/5) [large] -* Spec v3: [Relative paths](https://github.com/apache/iceberg/projects/18) [large] -* Spec v3: [Default field values](https://github.com/apache/iceberg/projects/19) [medium] +# General + +* [Multi-table transaction support](https://github.com/apache/iceberg/projects/30) +* [Views Support](https://github.com/apache/iceberg/projects/29) +* [Change Data Capture (CDC) Support](https://github.com/apache/iceberg/projects/26) +* [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4) +* [Inline file compaction](https://github.com/apache/iceberg/projects/14) +* [Delete File compaction](https://github.com/apache/iceberg/projects/10) +* [Z-ordering / Space-filling curves](https://github.
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443356487 ## site/docs/community.md: ## @@ -40,13 +40,13 @@ Issues are tracked in GitHub: ## Slack -We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg). +We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A). Review Comment: Oh yeah, I still need to figure out how to embed jitsi stuff there, another day.
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
amogh-jahagirdar commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1879243149 > That line of reasoning makes me think that any change (even comments) should just produce a new version of the view. Yes, that's the approach in this PR currently. A new version is produced via ReplaceViewVersion (view.updateColumnDoc()) returns ReplaceViewVersion. In my next iteration, I'm going to just put the logic in the existing Replace API. That fits the operation pattern that exists throughout Iceberg better. I think creating a new version is required, since we are producing a new schema and there *must* be a new version to point to the new schema. The issue is that currently the only API that exists for producing a new version is `replace` which requires users to pass in query/dialect pair(s), default namespace, schema, etc. The case for a new API is so that engine integrations don't need to determine all the queries/dialects/default namespace/catalog when the replace is performed for just column doc updates. More concretely, take the Trino case, which allows the following syntax:https://trino.io/docs/current/sql/comment.html ``` COMMENT ON view.col IS 'some col' ``` Currently, just using the replace API we'd have to implement creating the new Schema with the updated doc (the code in Schema#withUpdatedColumnDoc) and then we'd also want to preserve all the queries/dialects, default namespace, and default catalog so the integration has to pass the current state of all of those to the replace API. ``` public void updateColumn(String col, String doc) { View view = loadView(...) Schema updatedSchema = newSchemaWithColumnDoc(view.schema(), col, doc); view.replaceVersion() .withSchema(updatedSchema) .withQuery(allCurrentDialects, allTheirQueries) .withDefaultCatalog(view.currentVersion().defaultCatalog()) .withNamespace(view.currentVersion().namespace()) .commit() } private Schema newSchemaWithColumnDoc(Schema schema, String col, String doc) { NestedField fieldToUpdate = findField(name); Preconditions.checkArgument(fieldToUpdate != null, "Field %s does not exist", name); NestedField updatedField = NestedField.of( fieldToUpdate.fieldId(), fieldToUpdate.isOptional(), fieldToUpdate.name(), fieldToUpdate.type(), doc); List newFields = Lists.newArrayList(); newFields.add(updatedField); for (NestedField field : columns()) { if (field.fieldId() != updatedField.fieldId()) { newFields.add(field); } } return new Schema(newFields, getAliases(), identifierFieldIds()); } ``` So the integration has to pass in a bunch of mostly irrelevant details into the replace just for updating the docs and if an engine integration misses a dialect to include in the replace, then updating the column doc has removed the other dialect unintentionally for the new version. Having an API in the library which takes care of these details would avoid any of these weird issues. With the new API the integration should look something like: ``` public void updateColumnDoc(String col, String doc) { View view = loadView(...) view.replaceVersion() .withColumnDoc(col, doc) .commit() } ``` >If we allow making schema changes, then would we also allow modifying the view by widening view columns? I would advocate against this. These kinds of schema evolution capabilities on views don't seem that useful to me at the cost of a lot of complexity. I think there's a semantic distinction in those kinds of capabilities and adding docs. 
That's why I don't think it makes sense to have an UpdateViewSchema implementation for views, and I discarded it in favor of the current approach.
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443359408 ## site/nav.yml: ## @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +nav: + - Home: index.md + - Quickstart: +- Spark: spark-quickstart.md +- Hive: hive-quickstart.md + - Docs: +#- nightly: '!include docs/docs/nightly/mkdocs.yml' +- latest: '!include docs/docs/latest/mkdocs.yml' +- 1.4.2: '!include docs/docs/1.4.2/mkdocs.yml' Review Comment: I will update 1.4.3 on another PR
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443361277 ## site/docs/community.md: ## @@ -40,13 +40,13 @@ Issues are tracked in GitHub: ## Slack -We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg). +We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A). Review Comment: wait yeah this is the latest one haha and that's what's currently in there.
Re: [I] How can we build self-defined iceberg-spark-runtime jar package? [iceberg]
gaoshihang closed issue #9425: How can we build self-defined iceberg-spark-runtime jar package? URL: https://github.com/apache/iceberg/issues/9425