Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442596925

## website/src/reference/setup_gpg.md:

@@ -0,0 +1,161 @@

# Setup GPG key

> This section is a brief summary of the [Cryptography with OpenPGP](https://infra.apache.org/openpgp.html) guideline.

## Install GPG

For more details, please refer to the [GPG official website](https://www.gnupg.org/download/index.html). Here is one way to install GPG, using `apt`:

```shell
sudo apt install gnupg2
```

## Generate GPG Key

Notes:

- The name should match the full name associated with your Apache ID.
- The email should be your Apache email.
- Use only English characters in the name to avoid garbled output.

Run `gpg --full-gen-key` and complete the generation interactively:

```shell
gpg (GnuPG) 2.2.20; Copyright (C) 2020 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
  (14) Existing key from card
Your selection? 1 # input 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096 # input 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 0 # input 0
Key does not expire at all
Is this correct? (y/N) y # input y

GnuPG needs to construct a user ID to identify your key.

Real name: Hulk Lin # input your name
Email address: h...@apache.org # input your email
Comment: # input some annotations, optional
You selected this USER-ID:
    "Hulk Lin <h...@apache.org>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O # input O
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.

# Input the security key
┌──────────────────────────────────────────────┐
│ Please enter this passphrase                 │
│                                              │
│ Passphrase: _______________________________  │
│                                              │
└──────────────────────────────────────────────┘
# After you enter the passphrase, key generation completes with output like:
gpg: key A3008E49B00F626B marked as ultimately trusted
gpg: revocation certificate stored as '/Users/hulk/.gnupg/openpgp-revocs.d/F77B887A4F25A9468C513E9AA3008E49B00F626B.rev'
public and secret key created and signed.

pub   rsa4096 2022-07-12 [SC]
      F77B887A4F25A9468C513E9AA3008E49B00F626B
uid           [ultimate] hulk <h...@apache.org>
sub   rsa4096 2022-07-12 [E]
```

## Upload your key to public GPG keyserver

First, list your keys:

```shell
gpg --list-keys
```

The output looks like:

```shell
-------------------------------
pub   rsa4096 2022-07-12 [SC]
      F77B887A4F25A9468C513E9AA3008E49B00F626B
uid           [ultimate] hulk <h...@apache.org>
sub   rsa4096 2022-07-12 [E]
```

Then, send your key id to the key server:

```shell
gpg --keyserver keys.openpgp.org --send-key <key-id> # e.g., F77B887A4F25A9468C513E9AA3008E49B00F626B
```

Here, `keys.openpgp.org` is just one choice of keyserver; you can use `keyserver.ubuntu.com` or any other full-featured keyserver.
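As a quick sanity check (a hedged sketch, not part of the quoted guide), you can fetch the key back from the keyserver once it has propagated; the fingerprint below is the example one from this guide:

```shell
# Fetch the key back from the keyserver by fingerprint to confirm the upload worked.
# Propagation can take a minute or so.
gpg --keyserver keys.openpgp.org --recv-keys F77B887A4F25A9468C513E9AA3008E49B00F626B

# Confirm the key is now known locally.
gpg --list-keys F77B887A4F25A9468C513E9AA3008E49B00F626B
```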
## Check whether the key is created successfully

Uploading takes about one minute; after that, you can verify the upload by searching for your email on the corresponding keyserver.

Uploading keys to the keyserver is mainly for joining a [Web of Trust](https://infra.apache.org/release-signing.html#web-of-trust).

## Add your GPG public key to the KEYS document

:::info

`SVN` is required for this step.

:::

The svn repository of the release branch is: https://dist.apache.org/repos/dist/release/incubator/opendal

Review Comment: I think we want to `sed 's/opendal/iceberg/g'` this section
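For illustration, a minimal sketch of the KEYS update described above, assuming the Iceberg dist path (per the review comment, the opendal path should be replaced; the exact URL and commit message here are assumptions):

```shell
# Hypothetical flow; the dist URL is an assumption, not confirmed by this PR.
svn co https://dist.apache.org/repos/dist/release/iceberg iceberg-dist
cd iceberg-dist
# Append your public key block to the KEYS file (replace the email with your Apache email).
(gpg --list-sigs you@apache.org && gpg --export --armor you@apache.org) >> KEYS
svn commit -m "Add GPG key for release signing"
```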
Re: [I] Flink import debezium cdc record(delete type) to iceberg(0.13.2+) got IndexOutOfBoundsException [iceberg]
zinking commented on issue #5043: URL: https://github.com/apache/iceberg/issues/5043#issuecomment-1878282717

An incorrect data type mapping between MySQL and Iceberg can also lead to this, for example mapping `tinyint` to `boolean`.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442598483

## website/src/release.md:

@@ -0,0 +1,383 @@

This document describes how the release manager releases a new version in accordance with Apache requirements.

## Introduction

The `Source Release` is the artifact that Apache values most, and it is required for an ASF release.

Please remember that publishing software has legal consequences.

This guide complements the foundation-wide policies and guides:

- [Release Policy](https://www.apache.org/legal/release-policy.html)
- [Release Distribution Policy](https://infra.apache.org/release-distribution)
- [Release Creation Process](https://infra.apache.org/release-publishing.html)

## Release Terminology

In the context of our release, we use several terms to describe different stages of the release process.

Here's an explanation of these terms:

- `iceberg_version`: the version of Iceberg to be released, like `0.2.0`.
- `release_version`: the version of the release candidate, like `0.2.0-rc.1`.
- `rc_version`: the release-candidate suffix for the voting round, like `rc.1`.

## Preparation

This section lists the requirements for individuals who are new to the role of release manager.

Refer to [Setup GPG Key](reference/setup_gpg.md) to make sure the GPG key has been set up.

## Start a tracking issue about the next release

Start a tracking issue on GitHub for the upcoming release to track all tasks that need to be completed.

Title:

```
Tracking issue for the Iceberg Rust ${iceberg_version} release
```

Content:

```markdown
This issue is used to track tasks for the Iceberg Rust ${iceberg_version} release.

## Tasks

### Blockers

> Blockers are the tasks that must be completed before the release.

### Build Release

#### GitHub Side

- [ ] Bump version in project
- [ ] Update docs
- [ ] Generate dependencies list
- [ ] Push release candidate tag to GitHub

#### ASF Side

- [ ] Create an ASF Release
- [ ] Upload artifacts to the SVN dist repo

### Voting

- [ ] Start VOTE in the Iceberg community

### Official Release

- [ ] Push the release git tag
- [ ] Publish artifacts to the SVN RELEASE branch
- [ ] Change the Iceberg Rust website download link
- [ ] Send the announcement

For details of each step, please refer to: https://rust.iceberg.apache.org/release
```

## GitHub Side

### Bump version in project

Bump all components' versions in the project to the new Iceberg version.
Please note that this version is the exact version of the release, not the release candidate version.

- rust core: bump the version in `Cargo.toml`

### Update docs

- Update `CHANGELOG.md`; refer to [Generate Release Note](reference/generate_release_note.md) for more information.

### Generate dependencies list

Download and set up `cargo-deny`. You can refer to [cargo-deny](https://embarkstudios.github.io/cargo-deny/cli/index.html).

Run `python3 ./scripts/dependencies.py generate` to update the dependencies list of every package.

### Push release candidate tag

After the version-bump PR is merged, we can create a GitHub release for the release candidate:

- Create a tag on the `main` branch at the `Bump Version` / `Patch up version` commit: `git tag -s "v0.2.0-rc.1"`. Make sure to check out that exact commit instead of tagging whatever is at the tip of the main branch.
- Push tags to GitHub: `git push --tags`.

## ASF Side

If any step in the ASF release process fails and requires code changes,
we will abandon that version and prepare for the next one.
Our release page will only display ASF releases, not GitHub releases.

### Create an ASF Release

After the GitHub release has been created, we can start to create the ASF release.

- Check out the release tag (e.g. `git checkout v0.2.0-rc.1`; the tag was created in the previous step).
- Use the release script to create a new release: `ICEBERG_VERSION=<iceberg_version> ICEBERG_VERSION_RC=<rc_version> ./scripts/release.sh` (e.g. `ICEBERG_VERSION=0.2.0 ICEBERG_VERSION_RC=rc.1 ./scripts/release.sh`)
- This script will do the following things:
  - Create a new branch named `release-${release_version}` from the tag
  - Generate the release candidate artifacts under `dist`, including:
    - `apache-iceberg-rust-${release_version}-src.tar.gz`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.asc`
    - `apache-iceberg-rust-${release_version}-src.tar.gz.sha512`
  - Check the license headers of the source code. This step requires Docker.
  - Push the newly created branch to GitHub

This script will create a new release under `dist`.

For example:

```shell
> tree dist
dist
├── apache-iceberg-rust-0.2.0-src.tar.gz
├── apache-iceberg-rust-0.2.0-src.tar.gz.asc
└── apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```

### Upload artifacts to the SVN dist repo

SVN is required for this step.
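The quoted hunk cuts off at the SVN upload step. As a hedged sketch of the usual ASF flow (the dev dist path and directory layout are assumptions, not taken from this PR), it typically looks like:

```shell
# Hypothetical upload flow; the dist path and directory name are assumptions.
svn co https://dist.apache.org/repos/dist/dev/iceberg iceberg-dist-dev
mkdir -p iceberg-dist-dev/iceberg-rust-0.2.0-rc.1
cp dist/* iceberg-dist-dev/iceberg-rust-0.2.0-rc.1/
cd iceberg-dist-dev
svn add iceberg-rust-0.2.0-rc.1
svn commit -m "Stage Apache Iceberg Rust 0.2.0-rc.1"

# Voters can then verify the staged artifacts, e.g.:
gpg --verify apache-iceberg-rust-0.2.0-src.tar.gz.asc apache-iceberg-rust-0.2.0-src.tar.gz
shasum -a 512 -c apache-iceberg-rust-0.2.0-src.tar.gz.sha512
```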
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442599426

## website/src/release.md: (same quoted hunk as in the previous comment)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442598860

## website/src/release.md: (same quoted hunk as above)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442599810

## website/src/release.md: (same quoted hunk as above)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442600155

## website/src/release.md: (same quoted hunk as above)
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
Fokko commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442602143

## website/src/release.md: (same quoted hunk as above)
Re: [PR] Spark 3.5: Migrate tests that depend on SparkDistributedDataScanTestBase to JUnit5 [iceberg]
chinmay-bhat commented on PR #9416: URL: https://github.com/apache/iceberg/pull/9416#issuecomment-1878298683

Based on the test failures, older Spark versions depend on `core` files like `DeleteFileIndexTestBase`, `ParameterizedTableTestBase`, `DataTableScanTestBase`, and `ScanPlanningAndReportingTestBase`, which have been updated to JUnit5 in this PR. A possible solution would be to migrate the affected files in older versions to JUnit5 as well, but that would be a task bigger than this PR's scope. Another solution would be to create JUnit5 versions of the above `core` files, but I'm not keen on creating unnecessary duplicates. @nastra @Fokko I faced a similar problem in #9382. Any advice would be great! :)
Re: [PR] Spark: Support renaming views [iceberg]
nastra commented on code in PR #9343: URL: https://github.com/apache/iceberg/pull/9343#discussion_r1442615912

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala:

@@ -53,6 +57,17 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look
     loadView(catalog, ident)
       .map(createViewRelation(parts, _))
       .getOrElse(u)
+
+    case u@UnresolvedTableOrView(CatalogAndIdentifier(catalog, ident), _, _) =>
+      loadView(catalog, ident)
+        .map(_ => ResolvedV2View(catalog.asViewCatalog, ident))
+        .getOrElse(u)
+
+    case RenameTable(ResolvedV2View(oldCatalog, oldIdent), CatalogAndIdentifier(newCatalog, newIdent), true) =>

Review Comment: I added this to `isView@true` so that it's clearer. I checked the Spark code, and the `to` identifier isn't resolved anywhere; it is just passed as-is when doing a rename.
Re: [PR] docs: Add release guide for iceberg-rust [iceberg-rust]
liurenjie1024 commented on code in PR #147: URL: https://github.com/apache/iceberg-rust/pull/147#discussion_r1442622182

## website/src/release.md: (same quoted hunk as above)
Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]
ajantha-bhat commented on code in PR #9298: URL: https://github.com/apache/iceberg/pull/9298#discussion_r1442638780

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseFileRewriteCoordinator.java:

@@ -72,18 +70,12 @@ public void clearRewrite(Table table, String fileSetId) {

   public Set<String> fetchSetIds(Table table) {
     return resultMap.keySet().stream()
-        .filter(e -> e.first().equals(tableUUID(table)))
+        .filter(e -> e.first().equals(Spark3Util.tableUUID(table)))

Review Comment: Yeah, I renamed it to `baseTableUUID`, and I am not sure about the guidelines on static imports. In some places we use them and in some places we don't, so I left it as is.

## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java:

@@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(
     return org.apache.spark.sql.catalyst.TableIdentifier.apply(table, database);
   }

+  static String tableUUID(org.apache.iceberg.Table table) {
+    if (table instanceof HasTableOperations) {
+      TableOperations ops = ((HasTableOperations) table).operations();
+      return ops.current().uuid();
+    } else if (table instanceof BaseMetadataTable) {
+      return ((BaseMetadataTable) table).table().operations().current().uuid();
+    } else {
+      return null;

Review Comment: Done
Re: [PR] Flink: Watermark read options [iceberg]
pvary commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1442661021

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java:

@@ -39,4 +62,76 @@ public void before() throws IOException {
         .getConfiguration()
         .set(TableConfigOptions.TABLE_DYNAMIC_TABLE_OPTIONS_ENABLED, true);
   }
+
+  private Record generateRecord(Instant t1) {
+    Record record = GenericRecord.create(SCHEMA_TS);
+    record.setField("t1", t1.atZone(ZoneId.systemDefault()).toLocalDateTime());
+    record.setField("t2", t1.getEpochSecond());
+    return record;
+  }
+
+  private List<Record> testWatermarkOptionsInternal(boolean ascending) throws Exception {
+    Table table = catalogResource.catalog().createTable(TestFixtures.TABLE_IDENTIFIER, SCHEMA_TS);
+    long baseTime = 1702382109000L;
+    GenericAppenderHelper helper =
+        new GenericAppenderHelper(table, FileFormat.PARQUET, TEMPORARY_FOLDER);
+    // File 1 - early timestamps
+    Record early1 = generateRecord(Instant.ofEpochMilli(baseTime));
+    Record early2 = generateRecord(Instant.ofEpochMilli(baseTime - 10 * 1000L));
+    List<Record> recordsDataFile1 = Lists.newArrayList();
+    recordsDataFile1.add(early1);
+    recordsDataFile1.add(early2);
+    DataFile dataFile1 = helper.writeFile(recordsDataFile1);
+    // File 2 - old timestamps
+    Record late1 = generateRecord(Instant.ofEpochMilli(baseTime + 14 * 1000L));
+    Record late2 = generateRecord(Instant.ofEpochMilli(baseTime + 12 * 1000L));
+    List<Record> recordsDataFile2 = Lists.newArrayList();
+    recordsDataFile2.add(late1);
+    recordsDataFile2.add(late2);
+    DataFile dataFile2 = helper.writeFile(recordsDataFile2);
+    helper.appendToTable(dataFile1, dataFile2);
+    List<Record> expected = Lists.newArrayList();
+    if (ascending) {
+      expected.addAll(recordsDataFile1);
+      expected.addAll(recordsDataFile2);
+    } else {
+      expected.addAll(recordsDataFile2);
+      expected.addAll(recordsDataFile1);
+    }
+    return expected;
+  }
+
+  /**
+   * Tests the order of splits returned when setting the watermark-column and watermark-timeunit
+   * options
+   */
+  @Test
+  public void testWatermarkOptionsAscending() throws Exception {
+    List<Record> expected = testWatermarkOptionsInternal(true);
+    TestHelpers.assertRecordsWithOrder(
+        run(
+            ImmutableMap.of("watermark-column", "t1", "split-file-open-cost", "12800"),
+            "",
+            "*"),
+        expected,
+        SCHEMA_TS);
+  }
+
+  @Test
+  public void testWatermarkOptionsDescending() throws Exception {
+    List<Record> expected = testWatermarkOptionsInternal(false);
+    TestHelpers.assertRecordsWithOrder(
+        run(
+            ImmutableMap.of(
+                "watermark-column",
+                "t2",
+                "watermark-time-unit",

Review Comment: Shouldn't this be `watermark-column-time-unit`?
Re: [PR] Flink: Watermark read options [iceberg]
pvary commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1442664555

## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java: (same hunk as above, comment anchored at `record.setField("t2", t1.getEpochSecond());`)

Review Comment: What I do not like in this test is that if we make a mistake and switch `t1` with `t2` somewhere in the production code or in the test, we will not see a difference in the test results. That is why I originally suggested that the different columns should have different values, to make sure that the `watermark-column` value is propagated correctly.
[I] why Iceberg Spark add_files procedure does not support parquet file with parquet-mr 1.10 version? [iceberg]
hbpeng0115 opened a new issue, #9418: URL: https://github.com/apache/iceberg/issues/9418

### Query engine

Spark

### Question

Hello, we use Spark 3.2 + Iceberg 1.4.3 and call the add_files procedure to add S3 parquet files to an existing Iceberg table. However, we find that parquet files written with parquet-mr 1.10 cannot be queried after invoking the add_files procedure; they throw the following exception. Parquet files written with parquet-mr 1.13 can be queried. Does anyone know how to solve this issue? Many thanks~

```
Caused by: org.apache.iceberg.shaded.org.apache.parquet.io.ParquetDecodingException: Can't read value in column [request, context, profile_concrete_event_id, array] repeated int64 array = 170 at value 76034 out of 184067 in current page. repetition level: 1, definition level: 3
    at org.apache.iceberg.parquet.PageIterator.handleRuntimeException(PageIterator.java:220)
    at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:151)
    at org.apache.iceberg.parquet.ColumnIterator.nextLong(ColumnIterator.java:128)
    at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:49)
    at org.apache.iceberg.parquet.ColumnIterator$3.next(ColumnIterator.java:46)
    at org.apache.iceberg.parquet.ParquetValueReaders$UnboxedReader.read(ParquetValueReaders.java:246)
    at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedReader.read(ParquetValueReaders.java:467)
    at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetValueReaders$OptionReader.read(ParquetValueReaders.java:419)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.read(ParquetValueReaders.java:745)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:130)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:129)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:93)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:130)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
    at org.apache.iceberg.shaded.org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
    at org.apache.iceberg.shaded.org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readLong(DictionaryValuesReader.java:117)
    at org.apache.iceberg.parquet.PageIterator.nextLong(PageIterator.java:149)
    ... 33 more
```
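As a hedged debugging step (not from the original report), the parquet-mr version that wrote a file can be read from the file footer; a sketch using the parquet-cli tool, with an illustrative path:

```shell
# Copy the file locally (path illustrative), then inspect the footer;
# the creator field records the parquet-mr writer version (e.g. parquet-mr 1.10.x).
aws s3 cp s3://bucket/path/to/file.parquet /tmp/file.parquet
parquet meta /tmp/file.parquet | grep -i creat
```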
Re: [I] Pruning partitions using filter when reading file. [iceberg-rust]
liurenjie1024 commented on issue #126: URL: https://github.com/apache/iceberg-rust/issues/126#issuecomment-1878483283

Blocked by #153
Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]
pvary commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1878512144

> > Do we have a way to check if the new tests were running on the CI? With the correct parameters?
>
> I don't think so, but I was verifying them all locally

In this case, I am fine with these changes. In the next PR we should port these changes to 1.18 and 1.16 too.
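For local verification of the ported tests, invocations along these lines could work; the module paths and test-class pattern are assumptions based on Iceberg's Gradle layout, not taken from this thread:

```shell
# Hypothetical local runs of the migrated tests per Flink version module.
./gradlew :iceberg-flink:iceberg-flink-1.16:test --tests "org.apache.iceberg.flink.source.TestFlinkScan*"
./gradlew :iceberg-flink:iceberg-flink-1.18:test --tests "org.apache.iceberg.flink.source.TestFlinkScan*"
```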
[PR] API, Core: Fix errorprone warnings [iceberg]
ajantha-bhat opened a new pull request, #9419: URL: https://github.com/apache/iceberg/pull/9419

`./gradlew clean build -x test` is reporting some warnings, as seen in this screenshot: https://github.com/apache/iceberg/assets/5889404/e64c76c4-29d3-42cf-b3e0-c1d9f847200b

After this PR, `./gradlew clean build -x test` is green.
Re: [PR] API, Core: Fix errorprone warnings [iceberg]
ajantha-bhat commented on PR #9419: URL: https://github.com/apache/iceberg/pull/9419#issuecomment-1878529981

cc: @aokolnychyi
Re: [PR] API, Core: Fix errorprone warnings [iceberg]
ajantha-bhat commented on code in PR #9419: URL: https://github.com/apache/iceberg/pull/9419#discussion_r1442771024

## api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:

@@ -166,6 +166,7 @@ public void clear() {
     wrapperSet.clear();
   }

+  @SuppressWarnings("CollectionUndefinedEquality")

Review Comment: `containsAll` uses `CharSequenceWrapper` for equals to ensure consistent hashing and equals behaviour, similar to `CharSequenceMap` (the javadoc of `CharSequenceMap` explains it). But the tool is unable to figure that out, I guess. Hence, suppressing here and in two places below.

## api/src/main/java/org/apache/iceberg/util/CharSequenceWrapper.java:

@@ -54,7 +54,7 @@ public boolean equals(Object other) {
     CharSequenceWrapper that = (CharSequenceWrapper) other;
     if (wrapped instanceof String && that.wrapped instanceof String) {
-      return wrapped.equals(that.wrapped);
+      return wrapped.toString().contentEquals(that.wrapped);

Review Comment: see the UndefinedEquals errorprone check for CharSequence: https://errorprone.info/bugpattern/UndefinedEquals
Re: [PR] open-api: Use openapi-generator-gradle-plugin for validating specification [iceberg]
Fokko commented on code in PR #9344: URL: https://github.com/apache/iceberg/pull/9344#discussion_r1442774264 ## open-api/rest-catalog-open-api.yaml: ## @@ -2568,26 +2568,26 @@ components: propertyName: type mapping: assert-view-uuid: '#/components/schemas/AssertViewUUID' -type: object -required: - - type -properties: - type: -type: "string" + type: object Review Comment: Nice catch! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Allow snapshotting iceberg table (create new table based on certain Iceberg snapshot) [iceberg]
YehorKrivokon commented on issue #2481: URL: https://github.com/apache/iceberg/issues/2481#issuecomment-1878533755 Hi all, I'm trying to use Iceberg with Spark and SparkCatalog, and I have a repro of this issue. Is there any workaround to work with snapshots? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] AWS: Add Option to don't write non current columns in glue schema closes #7584 [iceberg]
Raphael-Vignes opened a new pull request, #9420: URL: https://github.com/apache/iceberg/pull/9420 This PR aims to close this [issue](https://github.com/apache/iceberg/issues/7584) (and would resolve this [issue](https://github.com/apache/iceberg/issues/6340) too). We want to provide an optional parameter to `IcebergToGlueConverter`'s `setTableInputInformation` method (set to false by default) named `glueCatalogWriteNonCurrentColumns`. It will allow skipping columns that were deleted in previous versions of the Glue schema, instead of writing them with the `iceberg.field.current` parameter set to false. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
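To make the proposal concrete, here is a minimal sketch of what the skip could look like, assuming the AWS SDK v2 Glue `Column` model; the `writeNonCurrentColumns` flag and the helper are hypothetical illustrations, not the actual PR diff:

```java
import java.util.List;
import java.util.stream.Collectors;
import software.amazon.awssdk.services.glue.model.Column;

class GlueColumnFilterSketch {
  // Hypothetical helper: when writeNonCurrentColumns is false, drop columns whose
  // "iceberg.field.current" parameter is "false" (columns deleted in earlier schemas).
  static List<Column> selectColumns(List<Column> allColumns, boolean writeNonCurrentColumns) {
    return allColumns.stream()
        .filter(
            col ->
                writeNonCurrentColumns
                    || !"false".equals(col.parameters().get("iceberg.field.current")))
        .collect(Collectors.toList());
  }
}
```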
Re: [PR] Spark: Support dropping views [iceberg]
nastra commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1442789176 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -111,8 +116,8 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look } private def qualifyFunctionIdentifiers( - plan: LogicalPlan, - catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions { +plan: LogicalPlan, +catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions { Review Comment: formatting was off here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.53.v20231009 to 11.0.19 [iceberg]
nastra closed pull request #9370: Build: Bump jetty from 9.4.53.v20231009 to 11.0.19 URL: https://github.com/apache/iceberg/pull/9370 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.53.v20231009 to 11.0.19 [iceberg]
dependabot[bot] commented on PR #9370: URL: https://github.com/apache/iceberg/pull/9370#issuecomment-1878554815 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.errorprone:error_prone_annotations from 2.23.0 to 2.24.0 [iceberg]
nastra merged PR #9369: URL: https://github.com/apache/iceberg/pull/9369 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]
nastra commented on PR #9371: URL: https://github.com/apache/iceberg/pull/9371#issuecomment-1878558251 Looks like `iceberg-aliyun` uses this for testing. I don't think we can upgrade to the newer version, since that requires JDK 11 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]
dependabot[bot] commented on PR #9371: URL: https://github.com/apache/iceberg/pull/9371#issuecomment-1878558298 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]
nastra closed pull request #9371: Build: Bump spring-boot from 2.5.4 to 3.2.1 URL: https://github.com/apache/iceberg/pull/9371 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Create JUnit5 version of TestFlinkScan [iceberg]
nastra commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1878559904 @cgpoh can you please port these changes to other Flink versions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink 1.17: Create JUnit5 version of TestFlinkScan [iceberg]
nastra merged PR #9185: URL: https://github.com/apache/iceberg/pull/9185 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Create JUnit5-version of TestFlinkScan [iceberg]
nastra closed issue #9077: Create JUnit5-version of TestFlinkScan URL: https://github.com/apache/iceberg/issues/9077 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Nessie: Strip trailing slash for warehouse location [iceberg]
nastra merged PR #9415: URL: https://github.com/apache/iceberg/pull/9415 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Adding Junit 5 conversion and AssertJ style for TestFlinkCatalogTable… [iceberg]
nastra commented on PR #9381: URL: https://github.com/apache/iceberg/pull/9381#issuecomment-1878626062 @vinitpatni the issue with `GenericAppenderHelper` has been addressed in https://github.com/apache/iceberg/pull/9185, so this class should now be usable with JUnit4 + JUnit5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.31.0 [iceberg]
dependabot[bot] commented on PR #9041: URL: https://github.com/apache/iceberg/pull/9041#issuecomment-1878626952 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.31.0 [iceberg]
nastra closed pull request #9041: Build: Bump com.palantir.baseline:gradle-baseline-java from 4.42.0 to 5.31.0 URL: https://github.com/apache/iceberg/pull/9041 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442849817 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestChangelogIterator.java: ## @@ -196,10 +196,10 @@ public void testUpdatedRowsWithDuplication() { Iterator iterator = ChangelogIterator.computeUpdates(rowsWithDuplication.iterator(), SCHEMA, IDENTIFIER_FIELDS); -assertThrows( -"Cannot compute updates because there are multiple rows with the same identifier fields([id, name]). Please make sure the rows are unique.", -IllegalStateException.class, -() -> Lists.newArrayList(iterator)); +assertThatThrownBy(() -> Lists.newArrayList(iterator)) +.as( +"Cannot compute updates because there are multiple rows with the same identifier fields([id, name]). Please make sure the rows are unique.") +.isInstanceOf(IllegalStateException.class); Review Comment: can you please add a `.hasMessage()` check here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
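For reference, the requested check would presumably look like the following sketch, reusing the message string from the diff above (assumes a static import of AssertJ's `assertThatThrownBy`):

```java
// Inside testUpdatedRowsWithDuplication(); assumes
// import static org.assertj.core.api.Assertions.assertThatThrownBy;
assertThatThrownBy(() -> Lists.newArrayList(iterator))
    .isInstanceOf(IllegalStateException.class)
    .hasMessage(
        "Cannot compute updates because there are multiple rows with the same identifier "
            + "fields([id, name]). Please make sure the rows are unique.");
```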
[PR] Arrow: Use case instead of wrapping a map/list [iceberg-python]
Fokko opened a new pull request, #252: URL: https://github.com/apache/iceberg-python/pull/252 Wrapping the list seems to introduce an odd behavior where `null` values are converted to an empty list `[]`. Resolves #251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442860998 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/TestTableSerialization.java: ## @@ -36,29 +39,23 @@ import org.apache.iceberg.relocated.com.google.common.collect.Lists; import org.apache.iceberg.spark.source.SerializableTableWithSize; import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestTableSerialization { +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.io.TempDir; - public TestTableSerialization(String isObjectStoreEnabled) { -this.isObjectStoreEnabled = isObjectStoreEnabled; - } +@ExtendWith(ParameterizedTestExtension.class) +public class TestTableSerialization { - @Parameterized.Parameters(name = "isObjectStoreEnabled = {0}") - public static Object[] parameters() { -return new Object[] {"true", "false"}; + @Parameters(name = "isObjectStoreEnabled = {0}") + public static Object[][] parameters() { Review Comment: probably better to have it less complex: ``` public static List parameters() { return Arrays.asList("true", "false"); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442861281 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/TestTableSerialization.java: ## @@ -36,29 +39,23 @@ import org.apache.iceberg.relocated.com.google.common.collect.Lists; import org.apache.iceberg.spark.source.SerializableTableWithSize; import org.apache.iceberg.types.Types; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Rule; -import org.junit.Test; -import org.junit.rules.TemporaryFolder; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestTableSerialization { +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.api.io.TempDir; - public TestTableSerialization(String isObjectStoreEnabled) { -this.isObjectStoreEnabled = isObjectStoreEnabled; - } +@ExtendWith(ParameterizedTestExtension.class) +public class TestTableSerialization { - @Parameterized.Parameters(name = "isObjectStoreEnabled = {0}") - public static Object[] parameters() { -return new Object[] {"true", "false"}; + @Parameters(name = "isObjectStoreEnabled = {0}") + public static Object[][] parameters() { +return new Object[][] {{"true"}, {"false"}}; } private static final HadoopTables TABLES = new HadoopTables(); - private final String isObjectStoreEnabled; + @Parameter(index = 0) Review Comment: ```suggestion @Parameter ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1878646665 No idea, to be honest, since I ran into this on production and there I don't have the ability to go deep and debug. I wouldn't say it is a wide table in terms of number of columns, and I don't know about column stats. Is there a way to fetch that from somewhere? But the metadata about the file that I do know is the following. One of the files: **File name:** `s3://data-furnishing-pipeline-eu-production/data-lake/dafu_pipeline.db/transactional_customer_ids_created/data/__key_bucket=0/createdAt_day=2018-10-04/country=NO/00021-0-ca186d77-6349-4e79-bb4f-874fac0ee123-00186.parquet` **Schema**: ```json [ { "Name": "__key", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "1", "iceberg.field.optional": "true" } }, { "Name": "eventid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "2", "iceberg.field.optional": "false" } }, { "Name": "authorizationid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "3", "iceberg.field.optional": "false" } }, { "Name": "authorizationkrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "4", "iceberg.field.optional": "true" } }, { "Name": "orderid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "5", "iceberg.field.optional": "false" } }, { "Name": "ordershortid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "6", "iceberg.field.optional": "true" } }, { "Name": "orderkrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "7", "iceberg.field.optional": "true" } }, { "Name": "nationalidentificationnumber", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "8", "iceberg.field.optional": "true" } }, { "Name": "personaid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "9", "iceberg.field.optional": "true" } }, { "Name": "personakrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "10", "iceberg.field.optional": "true" } }, { "Name": "accountid", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "11", "iceberg.field.optional": "true" } }, { "Name": "accountkrn", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "12", "iceberg.field.optional": "true" } }, { "Name": "country", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "13", "iceberg.field.optional": "false" } }, { "Name": "createdat", "Type": "timestamp", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "14", "iceberg.field.optional": "false" } } ] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442867841 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestFunctionCatalog.java: ## @@ -27,107 +30,106 @@ import org.apache.spark.sql.connector.catalog.FunctionCatalog; import org.apache.spark.sql.connector.catalog.Identifier; import org.apache.spark.sql.connector.catalog.functions.UnboundFunction; -import org.assertj.core.api.Assertions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; -public class TestFunctionCatalog extends SparkTestBaseWithCatalog { +public class TestFunctionCatalog extends TestBaseWithCatalog { private static final String[] EMPTY_NAMESPACE = new String[] {}; private static final String[] SYSTEM_NAMESPACE = new String[] {"system"}; private static final String[] DEFAULT_NAMESPACE = new String[] {"default"}; private static final String[] DB_NAMESPACE = new String[] {"db"}; - private final FunctionCatalog asFunctionCatalog; + private FunctionCatalog asFunctionCatalog; - public TestFunctionCatalog() { + @BeforeEach + public void before() { +super.before(); this.asFunctionCatalog = castToFunctionCatalog(catalogName); +createDefaultNamespace(); } - @Before public void createDefaultNamespace() { sql("CREATE NAMESPACE IF NOT EXISTS %s", catalogName + ".default"); } - @After + @AfterEach public void dropDefaultNamespace() { sql("DROP NAMESPACE IF EXISTS %s", catalogName + ".default"); } - @Test + @TestTemplate public void testListFunctionsViaCatalog() throws NoSuchNamespaceException { -Assertions.assertThat(asFunctionCatalog.listFunctions(EMPTY_NAMESPACE)) Review Comment: these changes are adding lots of noise. Using static imports in new test files is preferable, but we typically don't want to add too many changes to new PRs by adding static imports. It's ok to keep these now, but just a reminder for any future PRs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on code in PR #9417: URL: https://github.com/apache/iceberg/pull/9417#discussion_r1442872510 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestFunctionCatalog.java: ## @@ -27,107 +30,106 @@ import org.apache.spark.sql.connector.catalog.FunctionCatalog; import org.apache.spark.sql.connector.catalog.Identifier; import org.apache.spark.sql.connector.catalog.functions.UnboundFunction; -import org.assertj.core.api.Assertions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; -public class TestFunctionCatalog extends SparkTestBaseWithCatalog { +public class TestFunctionCatalog extends TestBaseWithCatalog { private static final String[] EMPTY_NAMESPACE = new String[] {}; private static final String[] SYSTEM_NAMESPACE = new String[] {"system"}; private static final String[] DEFAULT_NAMESPACE = new String[] {"default"}; private static final String[] DB_NAMESPACE = new String[] {"db"}; - private final FunctionCatalog asFunctionCatalog; + private FunctionCatalog asFunctionCatalog; - public TestFunctionCatalog() { + @BeforeEach + public void before() { +super.before(); this.asFunctionCatalog = castToFunctionCatalog(catalogName); +createDefaultNamespace(); } - @Before public void createDefaultNamespace() { Review Comment: no need to have this separate method now. Can you just move `sql("CREATE NAMESPACE IF NOT EXISTS %s", catalogName + ".default");` to `before()`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
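A sketch of the consolidation being asked for, based on the `before()` method already shown in the diff above:

```java
@BeforeEach
public void before() {
  super.before();
  this.asFunctionCatalog = castToFunctionCatalog(catalogName);
  sql("CREATE NAMESPACE IF NOT EXISTS %s", catalogName + ".default");
}
```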
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra commented on PR #9417: URL: https://github.com/apache/iceberg/pull/9417#issuecomment-1878660248 @chinmay-bhat this is almost ready to go, just a few minor things to address -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Migrate tests in SQL directory to JUnit5 [iceberg]
nastra commented on code in PR #9401: URL: https://github.com/apache/iceberg/pull/9401#discussion_r1442885901 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java: ## @@ -18,53 +18,49 @@ */ package org.apache.iceberg.spark.sql; +import static org.assertj.core.api.Assertions.assertThat; + import java.util.Arrays; import java.util.List; -import java.util.Map; import org.apache.iceberg.Table; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.spark.SparkCatalogTestBase; +import org.apache.iceberg.spark.CatalogTestBase; import org.apache.iceberg.spark.source.SimpleRecord; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.catalyst.analysis.NoSuchTableException; import org.apache.spark.sql.functions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -public abstract class PartitionedWritesTestBase extends SparkCatalogTestBase { - public PartitionedWritesTestBase( - String catalogName, String implementation, Map config) { -super(catalogName, implementation, config); - } +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; + +public abstract class /**/ PartitionedWritesTestBase extends CatalogTestBase { Review Comment: ```suggestion public abstract class PartitionedWritesTestBase extends CatalogTestBase { ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Set log level to WARN for rewrite task failure with partial progress [iceberg]
manuzhang commented on code in PR #9400: URL: https://github.com/apache/iceberg/pull/9400#discussion_r1442954829 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -345,7 +345,7 @@ private Result doExecuteWithPartialProgress( .noRetry() .onFailure( (fileGroup, exception) -> { - LOG.error("Failure during rewrite group {}", fileGroup.info(), exception); + LOG.warn("Failure during rewrite group {}", fileGroup.info(), exception); Review Comment: I agree ERROR is better for non-partial case. For partial progress, since failure is allowed, WARN sends the signal that failure is not fatal to the whole procedure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Fix parquet default compression codec [iceberg-docs]
manuzhang opened a new pull request, #305: URL: https://github.com/apache/iceberg-docs/pull/305 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
chinmay-bhat commented on PR #9417: URL: https://github.com/apache/iceberg/pull/9417#issuecomment-1878814101 Thanks for the review, I've updated the PR with the suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
pvary commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1878817631 @javrasya: This table should not be too wide, and the statistics should be limited as well (unless you did some specific tweaking there). My best guess is your first suggestion: > reduce the size of FileScanTasks per split to be one How did you try to achieve this? The best way to achieve this is to set `splitOpenFileCost` to the same value as the split size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
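One way to apply that advice is through the documented Iceberg table read properties; a minimal sketch (the 128 MB value is only illustrative, pick whatever split size fits the job):

```java
// With the open-file cost equal to the target split size, the planner costs each
// data file as a full split, so a split carries at most one file's FileScanTask.
table
    .updateProperties()
    .set("read.split.target-size", "134217728") // 128 MB target split size
    .set("read.split.open-file-cost", "134217728") // one open file "fills" a split
    .commit();
```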
Re: [PR] Spark 3.5: Migrate tests in SQL directory to JUnit5 [iceberg]
chinmay-bhat commented on code in PR #9401: URL: https://github.com/apache/iceberg/pull/9401#discussion_r1442989187 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java: ## @@ -18,53 +18,49 @@ */ package org.apache.iceberg.spark.sql; +import static org.assertj.core.api.Assertions.assertThat; + import java.util.Arrays; import java.util.List; -import java.util.Map; import org.apache.iceberg.Table; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; -import org.apache.iceberg.spark.SparkCatalogTestBase; +import org.apache.iceberg.spark.CatalogTestBase; import org.apache.iceberg.spark.source.SimpleRecord; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.catalyst.analysis.NoSuchTableException; import org.apache.spark.sql.functions; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -public abstract class PartitionedWritesTestBase extends SparkCatalogTestBase { - public PartitionedWritesTestBase( - String catalogName, String implementation, Map config) { -super(catalogName, implementation, config); - } +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.TestTemplate; + +public abstract class /**/ PartitionedWritesTestBase extends CatalogTestBase { Review Comment: my bad, fixed! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink 1.17: Create JUnit5 version of TestFlinkScan [iceberg]
cgpoh commented on PR #9185: URL: https://github.com/apache/iceberg/pull/9185#issuecomment-1878844200 > @cgpoh can you please port these changes to other Flink versions? Sure. Will try to port these changes next week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
nastra commented on code in PR #8988: URL: https://github.com/apache/iceberg/pull/8988#discussion_r1443018102 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -2181,20 +2181,28 @@ public void testTableWithInt96Timestamp() throws IOException { stagingLocation); // validate we get the expected results back -List expected = spark.table("parquet_table").select("tmp_col").collectAsList(); -List actual = -spark -.read() -.format("iceberg") -.load(loadLocation(tableIdentifier)) -.select("tmp_col") -.collectAsList(); -assertThat(actual).as("Rows must match").containsExactlyInAnyOrderElementsOf(expected); +testWithFilter("tmp_col < to_timestamp('2000-01-31 08:30:00')", tableIdentifier); Review Comment: these tests are all passing without the fix. Can you please add a test that reproduces the issue from https://github.com/apache/iceberg/issues/8990? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443020124 ## core/src/test/java/org/apache/iceberg/TestCreateSnapshotEvent.java: ## @@ -40,76 +41,76 @@ public TestCreateSnapshotEvent(int formatVersion) { Listeners.register(new MyListener(), CreateSnapshotEvent.class); } - @Test + @TestTemplate public void testAppendCommitEvent() { -Assert.assertEquals("Table should start empty", 0, listManifestFiles().size()); +Assertions.assertThat(listManifestFiles()).as("Table should start empty").isEmpty(); Review Comment: can you statically import `assertThat()` please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
manuzhang commented on code in PR #8988: URL: https://github.com/apache/iceberg/pull/8988#discussion_r1443027055 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -2181,20 +2181,28 @@ public void testTableWithInt96Timestamp() throws IOException { stagingLocation); // validate we get the expected results back -List expected = spark.table("parquet_table").select("tmp_col").collectAsList(); -List actual = -spark -.read() -.format("iceberg") -.load(loadLocation(tableIdentifier)) -.select("tmp_col") -.collectAsList(); -assertThat(actual).as("Rows must match").containsExactlyInAnyOrderElementsOf(expected); +testWithFilter("tmp_col < to_timestamp('2000-01-31 08:30:00')", tableIdentifier); Review Comment: Filter expression was no longer applied after last rebase. It's fixed now. Please try again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443033477 ## core/src/test/java/org/apache/iceberg/TestCreateSnapshotEvent.java: ## @@ -40,76 +41,76 @@ public TestCreateSnapshotEvent(int formatVersion) { Listeners.register(new MyListener(), CreateSnapshotEvent.class); } - @Test + @TestTemplate public void testAppendCommitEvent() { -Assert.assertEquals("Table should start empty", 0, listManifestFiles().size()); +Assertions.assertThat(listManifestFiles()).as("Table should start empty").isEmpty(); table.newAppend().appendFile(FILE_A).commit(); -Assert.assertNotNull(currentEvent); -Assert.assertEquals( -"Added records in the table should be 1", "1", currentEvent.summary().get("added-records")); -Assert.assertEquals( -"Added files in the table should be 1", -"1", -currentEvent.summary().get("added-data-files")); -Assert.assertEquals( -"Total records in the table should be 1", "1", currentEvent.summary().get("total-records")); -Assert.assertEquals( -"Total data files in the table should be 1", -"1", -currentEvent.summary().get("total-data-files")); +Assertions.assertThat(currentEvent).isNotNull(); +Assertions.assertThat(currentEvent.summary().get("added-records")) Review Comment: can be simplified to ``` assertThat(currentEvent.summary()) .containsEntry("added-records", "1") .containsEntry("added-data-files", "1") .containsEntry("total-records", "1") .containsEntry("total-data-files", "1"); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Adding Junit 5 conversion and AssertJ style for TestFlinkCatalogTable… [iceberg]
vinitpatni commented on PR #9381: URL: https://github.com/apache/iceberg/pull/9381#issuecomment-1878891716 I have made the changes in TestStreamScanSql locally, but running the test cases on my local machine is taking a lot of time. It is running indefinitely for this class. @nastra Do you have any idea? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.5: Remaining tests migrated to Junit5 [iceberg]
nastra merged PR #9417: URL: https://github.com/apache/iceberg/pull/9417 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4: Add support for reading Iceberg views [iceberg]
amogh-jahagirdar commented on PR #9422: URL: https://github.com/apache/iceberg/pull/9422#issuecomment-1878930402 Since this was a clean backport, I'll go ahead and merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark 3.4: Add support for reading Iceberg views [iceberg]
amogh-jahagirdar merged PR #9422: URL: https://github.com/apache/iceberg/pull/9422 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Support reading INT96 column in row group filter [iceberg]
manuzhang commented on code in PR #8988: URL: https://github.com/apache/iceberg/pull/8988#discussion_r1443085007 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -2181,20 +2181,28 @@ public void testTableWithInt96Timestamp() throws IOException { stagingLocation); // validate we get the expected results back -List expected = spark.table("parquet_table").select("tmp_col").collectAsList(); -List actual = -spark -.read() -.format("iceberg") -.load(loadLocation(tableIdentifier)) -.select("tmp_col") -.collectAsList(); -assertThat(actual).as("Rows must match").containsExactlyInAnyOrderElementsOf(expected); +testWithFilter("tmp_col < to_timestamp('2000-01-31 08:30:00')", tableIdentifier); Review Comment: Test failed after applying the filter expression. Still working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443105597 ## core/src/test/java/org/apache/iceberg/TestManifestReader.java: ## @@ -32,17 +32,15 @@ import org.apache.iceberg.types.Types; import org.assertj.core.api.Assertions; import org.assertj.core.api.recursive.comparison.RecursiveComparisonConfiguration; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestManifestReader extends TableTestBase { - @Parameterized.Parameters(name = "formatVersion = {0}") - public static Object[] parameters() { -return new Object[] {1, 2}; +import org.junit.jupiter.api.Assumptions; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; + +@ExtendWith(ParameterizedTestExtension.class) Review Comment: this should be part of the base class. Also the base + subclass shouldn't have a constructor anymore -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
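A sketch of the suggested shape, following the `ParameterizedTestExtension` pattern used elsewhere in this migration (class names and layout are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import org.apache.iceberg.Parameter;
import org.apache.iceberg.ParameterizedTestExtension;
import org.apache.iceberg.Parameters;
import org.junit.jupiter.api.extension.ExtendWith;

// The base class owns the extension, the parameters, and the injected field;
// subclasses keep only @TestTemplate methods, with no constructor of their own.
@ExtendWith(ParameterizedTestExtension.class)
public abstract class TestBase {
  @Parameters(name = "formatVersion = {0}")
  protected static List<Object> parameters() {
    return Arrays.asList(1, 2);
  }

  @Parameter(index = 0)
  protected int formatVersion;
}
```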
[PR] Core: Add JUnit5 version of TableTestBase [iceberg]
nastra opened a new pull request, #9424: URL: https://github.com/apache/iceberg/pull/9424 This fixes https://github.com/apache/iceberg/issues/9073 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Create JUnit5 version of TableTestBase [iceberg]
nastra commented on code in PR #9217: URL: https://github.com/apache/iceberg/pull/9217#discussion_r1443109201 ## core/src/test/java/org/apache/iceberg/TestManifestReader.java: ## @@ -32,17 +32,15 @@ import org.apache.iceberg.types.Types; import org.assertj.core.api.Assertions; import org.assertj.core.api.recursive.comparison.RecursiveComparisonConfiguration; -import org.junit.Assert; -import org.junit.Assume; -import org.junit.Test; -import org.junit.runner.RunWith; -import org.junit.runners.Parameterized; - -@RunWith(Parameterized.class) -public class TestManifestReader extends TableTestBase { - @Parameterized.Parameters(name = "formatVersion = {0}") - public static Object[] parameters() { -return new Object[] {1, 2}; +import org.junit.jupiter.api.Assumptions; +import org.junit.jupiter.api.TestTemplate; +import org.junit.jupiter.api.extension.ExtendWith; + +@ExtendWith(ParameterizedTestExtension.class) Review Comment: @lisirrx I've opened https://github.com/apache/iceberg/pull/9424 to fix the parameterization as that should live in the base class. If you're ok with the changes I can push them to your branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
rdblue commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1878993066 @amogh-jahagirdar, I agree with not adding a `with` method to modify a schema. If we can avoid making that kind of API addition I would prefer it. I'm also not sure that we want to modify existing views by updating the schema. That's a lot of validation and complexity. The schema should be produced by fairly standard SQL syntax: ```sql CREATE OR REPLACE VIEW default.event_agg ( event_count COMMENT 'Count of events', event_date) COMMENT 'Daily event counts' AS SELECT COUNT(1), CAST(event_ts AS DATE) FROM prod.default.events GROUP BY 2 ``` If we allow making schema-only changes, then would we also allow modifying the view by -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
rdblue closed pull request #9414: API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs URL: https://github.com/apache/iceberg/pull/9414 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Watermark read options [iceberg]
stevenzwu commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1443084956 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/FlinkReadOptions.java: ## @@ -109,4 +110,13 @@ private FlinkReadOptions() {} public static final String MAX_ALLOWED_PLANNING_FAILURES = "max-allowed-planning-failures"; public static final ConfigOption MAX_ALLOWED_PLANNING_FAILURES_OPTION = ConfigOptions.key(PREFIX + MAX_ALLOWED_PLANNING_FAILURES).intType().defaultValue(3); + + public static final String WATERMARK_COLUMN = "watermark-column"; + public static final ConfigOption WATERMARK_COLUMN_OPTION = + ConfigOptions.key(PREFIX + WATERMARK_COLUMN).stringType().noDefaultValue(); + public static final String WATERMARK_TIME_COLUMN_UNIT = "watermark-column-time-unit"; Review Comment: nit: it seems the style is to have an empty line btw a config ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -482,24 +478,25 @@ public IcebergSource build() { } contextBuilder.resolveConfig(table, readOptions, flinkConfig); - Schema icebergSchema = table.schema(); if (projectedFlinkSchema != null) { contextBuilder.project(FlinkSchemaUtil.convert(icebergSchema, projectedFlinkSchema)); } SerializableRecordEmitter emitter = SerializableRecordEmitter.defaultEmitter(); + FlinkReadConf flinkReadConf = new FlinkReadConf(table, readOptions, flinkConfig); + String watermarkColumn = flinkReadConf.watermarkColumn(); + TimeUnit watermarkTimeUnit = flinkReadConf.watermarkTimeUnit(); + if (watermarkColumn != null) { // Column statistics is needed for watermark generation contextBuilder.includeColumnStats(Sets.newHashSet(watermarkColumn)); - Review Comment: nit: revert unnecessary empty line change ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceBoundedSql.java: ## @@ -29,11 +31,17 @@ import org.apache.iceberg.Schema; import org.apache.iceberg.expressions.Expression; import org.apache.iceberg.flink.FlinkConfigOptions; +import org.apache.iceberg.types.Types; import org.junit.Before; public class TestIcebergSourceBoundedSql extends TestIcebergSourceBounded { private volatile TableEnvironment tEnv; + private static final Schema SCHEMA_TS = Review Comment: all changes in this file should be reverted, right? ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceSql.java: ## @@ -39,4 +62,76 @@ public void before() throws IOException { .getConfiguration() .set(TableConfigOptions.TABLE_DYNAMIC_TABLE_OPTIONS_ENABLED, true); } + + private Record generateRecord(Instant t1) { +Record record = GenericRecord.create(SCHEMA_TS); +record.setField("t1", t1.atZone(ZoneId.systemDefault()).toLocalDateTime()); +record.setField("t2", t1.getEpochSecond()); +return record; + } + + private List testWatermarkOptionsInternal(boolean ascending) throws Exception { Review Comment: this is not actually `test`. it is just preparing the table by appending records/files to the table. 
maybe also add a Javadoc to explain the return list is the expected order of results ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -482,24 +478,25 @@ public IcebergSource build() { } contextBuilder.resolveConfig(table, readOptions, flinkConfig); - Schema icebergSchema = table.schema(); if (projectedFlinkSchema != null) { contextBuilder.project(FlinkSchemaUtil.convert(icebergSchema, projectedFlinkSchema)); } SerializableRecordEmitter emitter = SerializableRecordEmitter.defaultEmitter(); + FlinkReadConf flinkReadConf = new FlinkReadConf(table, readOptions, flinkConfig); + String watermarkColumn = flinkReadConf.watermarkColumn(); + TimeUnit watermarkTimeUnit = flinkReadConf.watermarkTimeUnit(); + if (watermarkColumn != null) { // Column statistics is needed for watermark generation contextBuilder.includeColumnStats(Sets.newHashSet(watermarkColumn)); - SplitWatermarkExtractor watermarkExtractor = new ColumnStatsWatermarkExtractor(icebergSchema, watermarkColumn, watermarkTimeUnit); emitter = SerializableRecordEmitter.emitterWithWatermark(watermarkExtractor); splitAssignerFactory = new OrderedSplitAssignerFactory(SplitComparators.watermark(watermarkExtractor)); } - Review Comment: nit: unnecessary empty line change. Iceberg style has an empty line after a control block ends with `}` ## flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/TestHelpers.java: ## @@ -126,7 +126,7 @@ public static List convertRowDataToRow(List rowDataList, RowType r .col
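Applied to the `FlinkReadOptions` hunk at the top of this review, the blank-line style being asked for would look roughly like this. This is a sketch only; the `PREFIX` value and the `<String>` type parameter are assumptions, since the digest stripped the generics from the diff:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Sketch of the requested layout: a blank line between each config pair.
public final class FlinkReadOptionsSketch {
  // assumed to match the PREFIX constant already defined in FlinkReadOptions
  private static final String PREFIX = "connector.iceberg.";

  public static final String WATERMARK_COLUMN = "watermark-column";

  public static final ConfigOption<String> WATERMARK_COLUMN_OPTION =
      ConfigOptions.key(PREFIX + WATERMARK_COLUMN).stringType().noDefaultValue();

  public static final String WATERMARK_TIME_COLUMN_UNIT = "watermark-column-time-unit";

  private FlinkReadOptionsSketch() {}
}
```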
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
rdblue commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1879017514 Oops, I didn't intend to close this!
Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]
amogh-jahagirdar commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1443148176 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT32: + return Optional.of( + new ParquetValueReaders.IntegerAsDecimalReader(desc, decimalLogicalType.getScale())); +default: + throw new UnsupportedOperationException( + "Unsupported base type for decimal: " + primitive.getPrimitiveTypeName()); + } +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.DateLogicalTypeAnnotation dateLogicalType) { + return Optional.of(new DateReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeLogicalType) { + if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +return Optional.of(new TimeReader(desc)); + } else if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) { +return Optional.of(new TimeMillisReader(desc)); + } + + return Optional.empty(); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) { + if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +Types.TimestampType tsMicrosType = (Types.TimestampType) expected; +return tsMicrosType.shouldAdjustToUTC() +? Optional.of(new TimestamptzReader(desc)) +: Optional.of(new TimestampReader(desc)); + } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { Review Comment: I think this should be == LogicalTypeAnnotation.TimeUnit.MILLIS? 
## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalT
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443152002 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala: ## @@ -122,7 +128,27 @@ class IcebergSparkSqlExtensionsParser(delegate: ParserInterface) extends ParserI if (isIcebergCommand(sqlTextAfterSubstitution)) { parse(sqlTextAfterSubstitution) { parser => astBuilder.visit(parser.singleStatement()) }.asInstanceOf[LogicalPlan] } else { - delegate.parsePlan(sqlText) + ViewSubstitutionExecutor.execute(delegate.parsePlan(sqlText)) +} + } + + private object ViewSubstitutionExecutor extends RuleExecutor[LogicalPlan] { +private val fixedPoint = FixedPoint( + maxIterations, + errorOnExceed = true, + maxIterationsSetting = SQLConf.ANALYZER_MAX_ITERATIONS.key) + +override protected def batches: Seq[Batch] = Seq(Batch("pre-substitution", fixedPoint, V2ViewSubstitution)) Review Comment: Do we want to call this pre-substitution still? I originally used that because I thought we wanted substitution rules in it. But it turns out that we don't need this for view substitution, only for command hijacking. Maybe a "Hijack Commands" batch?
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443153917 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala: ## @@ -122,7 +128,27 @@ class IcebergSparkSqlExtensionsParser(delegate: ParserInterface) extends ParserI if (isIcebergCommand(sqlTextAfterSubstitution)) { parse(sqlTextAfterSubstitution) { parser => astBuilder.visit(parser.singleStatement()) }.asInstanceOf[LogicalPlan] } else { - delegate.parsePlan(sqlText) + ViewSubstitutionExecutor.execute(delegate.parsePlan(sqlText)) +} + } + + private object ViewSubstitutionExecutor extends RuleExecutor[LogicalPlan] { +private val fixedPoint = FixedPoint( + maxIterations, + errorOnExceed = true, + maxIterationsSetting = SQLConf.ANALYZER_MAX_ITERATIONS.key) + +override protected def batches: Seq[Batch] = Seq(Batch("pre-substitution", fixedPoint, V2ViewSubstitution)) Review Comment: And it just occurred to me that we may not need an executor at all if we don't need to run to a fixed point. Can we just apply a command hijacking rule by itself instead?
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443156571 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.NoSuchViewException +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.connector.catalog.Identifier +import org.apache.spark.sql.connector.catalog.ViewCatalog + + +case class DropV2ViewExec( + catalog: ViewCatalog, + ident: Identifier, + ifExists: Boolean) extends LeafV2CommandExec { + + override lazy val output: Seq[Attribute] = Nil + + override protected def run(): Seq[InternalRow] = { +if (catalog.viewExists(ident)) { + catalog.dropView(ident) Review Comment: Doesn't this return a boolean? If so, we could avoid loading or checking existence and just call `dropView` directly. Then if that returns `false` and `ifExists` is false then we could throw `NoSuchViewException`.
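A minimal standalone sketch of the suggested control flow, expressed in Java for brevity (the real exec is Scala); `ViewCatalogLike` and the exception here are stand-ins for Spark's `ViewCatalog` and `NoSuchViewException`:

```java
import java.util.Collections;
import java.util.List;

// Stand-in for org.apache.spark.sql.connector.catalog.ViewCatalog#dropView,
// which returns true only when a view was actually dropped.
interface ViewCatalogLike {
  boolean dropView(String ident);
}

final class DropViewSketch {
  // Let dropView's boolean result drive the IF EXISTS handling, avoiding the
  // separate viewExists() round trip.
  static List<Object> run(ViewCatalogLike catalog, String ident, boolean ifExists) {
    if (!catalog.dropView(ident) && !ifExists) {
      // NoSuchViewException in the actual Spark code
      throw new IllegalStateException("View not found: " + ident);
    }
    return Collections.emptyList();
  }

  private DropViewSketch() {}
}
```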
Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]
amogh-jahagirdar commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1443148385 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT32: + return Optional.of( + new ParquetValueReaders.IntegerAsDecimalReader(desc, decimalLogicalType.getScale())); +default: + throw new UnsupportedOperationException( + "Unsupported base type for decimal: " + primitive.getPrimitiveTypeName()); + } +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.DateLogicalTypeAnnotation dateLogicalType) { + return Optional.of(new DateReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeLogicalType) { + if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +return Optional.of(new TimeReader(desc)); + } else if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) { +return Optional.of(new TimeMillisReader(desc)); + } + + return Optional.empty(); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) { + if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +Types.TimestampType tsMicrosType = (Types.TimestampType) expected; +return tsMicrosType.shouldAdjustToUTC() +? Optional.of(new TimestamptzReader(desc)) +: Optional.of(new TimestampReader(desc)); + } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { Review Comment: I think this should be == LogicalTypeAnnotation.TimeUnit.MILLIS? Not Micros
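For context, the corrected branch would read something like the sketch below. The micros arm is taken from the diff above; the millis reader names (`TimestampMillisReader`, `TimestamptzMillisReader`) are assumptions based on the naming pattern of the other readers in this file:

```java
@Override
public Optional<ParquetValueReader<?>> visit(
    LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) {
  if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) {
    Types.TimestampType tsMicrosType = (Types.TimestampType) expected;
    return tsMicrosType.shouldAdjustToUTC()
        ? Optional.of(new TimestamptzReader(desc))
        : Optional.of(new TimestampReader(desc));
  } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) {
    // the original second branch re-tested MICROS, leaving this path unreachable
    Types.TimestampType tsMillisType = (Types.TimestampType) expected;
    return tsMillisType.shouldAdjustToUTC()
        ? Optional.of(new TimestamptzMillisReader(desc))
        : Optional.of(new TimestampMillisReader(desc));
  }

  return Optional.empty();
}
```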
Re: [I] Spark: Session level Iceberg table config defaults [iceberg]
kuldeepsinghchauhan commented on issue #2270: URL: https://github.com/apache/iceberg/issues/2270#issuecomment-1879091855 This is a very critical feature. In an actual production scenario where we are planning to roll out the Iceberg format for operations and analytics users, we are currently stuck and cannot decide how to set up Iceberg table-level property defaults in a central config file. I know the discussion is about setting this up at the Spark session level, but it would be best if Iceberg could leverage properties from spark-defaults.conf.
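For reference, a rough sketch of how central defaults can be approximated today, assuming the deployed Iceberg version supports the `table-default.` catalog property prefix; the catalog name and property values here are illustrative, and the same keys could live in spark-defaults.conf instead of code:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class IcebergCatalogDefaults {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .set("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .set("spark.sql.catalog.my_catalog.type", "hive")
        // applied as a default to every table created through this catalog
        .set("spark.sql.catalog.my_catalog.table-default.write.parquet.compression-codec", "zstd");

    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
    spark.sql("CREATE TABLE my_catalog.db.t (id BIGINT) USING iceberg");
  }
}
```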
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443226670 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -635,6 +633,51 @@ private Catalog tableCatalog() { return Spark3Util.loadIcebergCatalog(spark, catalogName); } + @Test + public void dropView() { +String viewName = "viewToBeDropped"; +String sql = String.format("SELECT id FROM %s", tableName); + +ViewCatalog viewCatalog = viewCatalog(); + +TableIdentifier identifier = TableIdentifier.of(NAMESPACE, viewName); +viewCatalog +.buildView(identifier) +.withQuery("spark", sql) +.withDefaultNamespace(NAMESPACE) +.withDefaultCatalog(catalogName) +.withSchema(schema(sql)) +.create(); + +assertThat(viewCatalog.viewExists(identifier)).isTrue(); + +sql("DROP VIEW %s", viewName); +assertThat(viewCatalog.viewExists(identifier)).isFalse(); + } + + @Test + public void dropNonExistingView() { +assertThatThrownBy(() -> sql("DROP VIEW non_existing")) +.isInstanceOf(AnalysisException.class) +.hasMessageContaining("The view %s.%s cannot be found", NAMESPACE, "non_existing"); + +assertThatNoException().isThrownBy(() -> sql("DROP VIEW IF EXISTS non_existing")); Review Comment: Nit: I think that `IF EXISTS` is a separate option and probably deserves a separate test case.
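The split being suggested might look like this, a sketch meant to drop into TestViews.java and reusing the `sql` helper and AssertJ assertions already present in the diff:

```java
// IF EXISTS exercised on its own, separate from the error-path assertions
// in dropNonExistingView
@Test
public void dropViewIfExists() {
  assertThatNoException().isThrownBy(() -> sql("DROP VIEW IF EXISTS non_existing"));
}
```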
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443227383 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -635,6 +633,51 @@ private Catalog tableCatalog() { return Spark3Util.loadIcebergCatalog(spark, catalogName); } + @Test + public void dropView() { +String viewName = "viewToBeDropped"; +String sql = String.format("SELECT id FROM %s", tableName); + +ViewCatalog viewCatalog = viewCatalog(); + +TableIdentifier identifier = TableIdentifier.of(NAMESPACE, viewName); +viewCatalog +.buildView(identifier) +.withQuery("spark", sql) +.withDefaultNamespace(NAMESPACE) +.withDefaultCatalog(catalogName) +.withSchema(schema(sql)) +.create(); + +assertThat(viewCatalog.viewExists(identifier)).isTrue(); + +sql("DROP VIEW %s", viewName); +assertThat(viewCatalog.viewExists(identifier)).isFalse(); + } + + @Test + public void dropNonExistingView() { +assertThatThrownBy(() -> sql("DROP VIEW non_existing")) +.isInstanceOf(AnalysisException.class) +.hasMessageContaining("The view %s.%s cannot be found", NAMESPACE, "non_existing"); + +assertThatNoException().isThrownBy(() -> sql("DROP VIEW IF EXISTS non_existing")); + } + + @Test + public void dropGlobalTempView() { +String globalTempView = "globalViewToBeDropped"; Review Comment: Are these cases just to validate that we don't mess up normal view creation/deletion? If so, can you add a comment? Do we also want to have one for v1 views in a different catalog?
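The clarifying comment being asked for might read like this (a sketch; the CREATE and DROP statements are assumptions inferred from the test name, not taken from the PR):

```java
@Test
public void dropGlobalTempView() {
  // This case only verifies that hijacking DROP VIEW for Iceberg catalogs does
  // not break Spark's built-in global temp views, which never touch the
  // Iceberg view catalog.
  String globalTempView = "globalViewToBeDropped";
  sql("CREATE GLOBAL TEMPORARY VIEW %s AS SELECT id FROM %s", globalTempView, tableName);
  sql("DROP VIEW global_temp.%s", globalTempView);
}
```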
Re: [PR] Spark: Support dropping views [iceberg]
rdblue commented on code in PR #9421: URL: https://github.com/apache/iceberg/pull/9421#discussion_r1443275368 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -45,14 +47,17 @@ case class ResolveViews(spark: SparkSession) extends Rule[LogicalPlan] with Look protected lazy val catalogManager: CatalogManager = spark.sessionState.catalogManager override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { -case u@UnresolvedRelation(nameParts, _, _) - if catalogManager.v1SessionCatalog.isTempView(nameParts) => +case u@UnresolvedRelation(nameParts, _, _) if isTempView(nameParts) => u case u@UnresolvedRelation(parts@CatalogAndIdentifier(catalog, ident), _, _) => loadView(catalog, ident) .map(createViewRelation(parts, _)) .getOrElse(u) + +case DropIcebergView(r@ResolvedIdentifier(catalog, ident), ifExists) + if isTempView(ident.asMultipartIdentifier) || !isViewCatalog(catalog) => + DropView(r, ifExists) Review Comment: Why do we need to convert back to a non-Iceberg plan? Is it that we always convert in the pre-substitution batch? If so, is it reasonable to detect whether it is an Iceberg `ViewCatalog` before converting rather than converting back?
Re: [I] Failed to assign splits due to the serialized split size [iceberg]
javrasya commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1879135064 @pvary I wasn't aware of `splitOpenFileCost`, thank you for sharing that. The way I did it is that I introduced my own SplitAssignerFactory and SplitAssigner and passed them down to the source. Here is the code for that custom SplitAssigner: https://gist.github.com/javrasya/98cfe90bd1a2585c56c4c3346a518477 But the thing is that even though I managed to reduce the number of tasks per split, it was still big enough to raise the same error. So it did not solve my problem. How do you limit the statistics? I didn't do anything apart from creating my table and ingesting the data into it. Also, do you think a table with 14 columns is too wide?
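A short sketch of the knob pvary refers to; the builder method name is assumed to mirror the `split-open-file-cost` read option in `FlinkReadOptions`:

```java
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.IcebergSource;

public class SplitTuningSketch {
  // Raise the per-file planning cost so many small files get packed into
  // fewer, larger splits instead of inflating the serialized split list.
  public static IcebergSource<RowData> build(TableLoader tableLoader) {
    return IcebergSource.forRowData()
        .tableLoader(tableLoader)
        .splitOpenFileCost(128L * 1024 * 1024) // treat each file as at least 128 MB
        .build();
  }
}
```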
Re: [PR] Fix parquet default compression codec in 1.4.0 [iceberg-docs]
Fokko merged PR #301: URL: https://github.com/apache/iceberg-docs/pull/301
Re: [PR] Fix parquet default compression codec in 1.4.1 [iceberg-docs]
Fokko merged PR #302: URL: https://github.com/apache/iceberg-docs/pull/302
Re: [PR] Fix parquet default compression codec in 1.4.3 [iceberg-docs]
Fokko merged PR #304: URL: https://github.com/apache/iceberg-docs/pull/304
Re: [PR] Fix parquet default compression codec in 1.4.2 [iceberg-docs]
Fokko merged PR #303: URL: https://github.com/apache/iceberg-docs/pull/303
Re: [PR] Fix parquet default compression codec [iceberg-docs]
Fokko merged PR #305: URL: https://github.com/apache/iceberg-docs/pull/305
Re: [PR] Fix PR link in 1.4.3 release notes [iceberg-docs]
Fokko merged PR #299: URL: https://github.com/apache/iceberg-docs/pull/299
Re: [PR] docs: Add link for iceberg rust [iceberg-docs]
Fokko merged PR #300: URL: https://github.com/apache/iceberg-docs/pull/300
Re: [PR] API: Fix day partition transform result type [iceberg]
tdcmeehan commented on PR #9345: URL: https://github.com/apache/iceberg/pull/9345#issuecomment-1879158054 It seems the consensus is it's more straightforward to keep this as `int`, so I will close this. Thanks folks!
Re: [PR] API: Fix day partition transform result type [iceberg]
tdcmeehan closed pull request #9345: API: Fix day partition transform result type URL: https://github.com/apache/iceberg/pull/9345
Re: [PR] Adding Snowflake's public documentation [iceberg-docs]
Fokko commented on PR #297: URL: https://github.com/apache/iceberg-docs/pull/297#issuecomment-1879159046 @scottteal sorry for the late reply here, could you fix the conflicts once more?
[I] How can we build self-defined iceberg-spark-runtime jar package? [iceberg]
gaoshihang opened a new issue, #9425: URL: https://github.com/apache/iceberg/issues/9425 ### Query engine Spark ### Question How can we build a self-defined iceberg-spark-runtime jar package?
Re: [PR] Adding Snowflake's public documentation [iceberg-docs]
scottteal commented on PR #297: URL: https://github.com/apache/iceberg-docs/pull/297#issuecomment-1879185914 @Fokko I think I've now resolved the conflicts. I tried to preserve the ordering to match the current website as much as possible, although it's still unclear to me what the ordering logic is. They aren't sorted alphabetically, or grouped in a logical manner that I can tell.
Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]
Fokko commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1443316711 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,113 @@ public ParquetValueReader struct( } } + private class LogicalTypeAnnotationParquetValueReaderVisitor + implements LogicalTypeAnnotation.LogicalTypeAnnotationVisitor> { + +private final ColumnDescriptor desc; +private final org.apache.iceberg.types.Type.PrimitiveType expected; +private final PrimitiveType primitive; + +LogicalTypeAnnotationParquetValueReaderVisitor( +ColumnDescriptor desc, +org.apache.iceberg.types.Type.PrimitiveType expected, +PrimitiveType primitive) { + this.desc = desc; + this.expected = expected; + this.primitive = primitive; +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.StringLogicalTypeAnnotation stringLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.EnumLogicalTypeAnnotation enumLogicalType) { + return Optional.of(new ParquetValueReaders.StringReader(desc)); +} + +@Override +public Optional> visit(DecimalLogicalTypeAnnotation decimalLogicalType) { + switch (primitive.getPrimitiveTypeName()) { +case BINARY: +case FIXED_LEN_BYTE_ARRAY: + return Optional.of( + new ParquetValueReaders.BinaryAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT64: + return Optional.of( + new ParquetValueReaders.LongAsDecimalReader(desc, decimalLogicalType.getScale())); +case INT32: + return Optional.of( + new ParquetValueReaders.IntegerAsDecimalReader(desc, decimalLogicalType.getScale())); +default: + throw new UnsupportedOperationException( + "Unsupported base type for decimal: " + primitive.getPrimitiveTypeName()); + } +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.DateLogicalTypeAnnotation dateLogicalType) { + return Optional.of(new DateReader(desc)); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimeLogicalTypeAnnotation timeLogicalType) { + if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +return Optional.of(new TimeReader(desc)); + } else if (timeLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MILLIS) { +return Optional.of(new TimeMillisReader(desc)); + } + + return Optional.empty(); +} + +@Override +public Optional> visit( +LogicalTypeAnnotation.TimestampLogicalTypeAnnotation timestampLogicalType) { + if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { +Types.TimestampType tsMicrosType = (Types.TimestampType) expected; +return tsMicrosType.shouldAdjustToUTC() +? Optional.of(new TimestamptzReader(desc)) +: Optional.of(new TimestampReader(desc)); + } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) { Review Comment: Oof, that's a good one! I would like this part to be thoroughly tested as well.
[I] Add the homepage to the `Cargo.toml` [iceberg-rust]
Fokko opened a new issue, #154: URL: https://github.com/apache/iceberg-rust/issues/154 Looks like we can add the homepage as well: https://doc.rust-lang.org/cargo/reference/manifest.html#the-homepage-field To our newly created https://rust.iceberg.apache.org/
Re: [PR] Replace black by Ruff Formatter [iceberg-python]
Fokko merged PR #127: URL: https://github.com/apache/iceberg-python/pull/127
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
Fokko commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443340509 ## site/docs/community.md: ## @@ -40,13 +40,13 @@ Issues are tracked in GitHub: ## Slack -We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg). +We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A). Review Comment: I checked and this is the latest one: https://github.com/apache/iceberg-docs/commit/87149e3da04eafad1c8766132f0bf1e822e4d075 ## .github/workflows/site-ci.yml: ## @@ -1,3 +1,4 @@ +# Review Comment: Looks like git sees this as a move 🤔 Not a real problem. ## site/nav.yml: ## @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +nav: + - Home: index.md + - Quickstart: +- Spark: spark-quickstart.md +- Hive: hive-quickstart.md + - Docs: +#- nightly: '!include docs/docs/nightly/mkdocs.yml' +- latest: '!include docs/docs/latest/mkdocs.yml' +- 1.4.2: '!include docs/docs/1.4.2/mkdocs.yml' Review Comment: I think 1.4.3 is missing here. ## site/docs/spec.md: ## @@ -1128,6 +1125,41 @@ Example ] } ] ``` +### Content File (Data and Delete) Serialization Review Comment: Not for this PR, but it would be good at some point to just include the spec from the repository itself. ## site/docs/roadmap.md: ## @@ -20,28 +20,37 @@ title: "Roadmap" # Roadmap Overview -This roadmap outlines projects that the Iceberg community is working on, their priority, and a rough size estimate. -This is based on the latest [community priority discussion](https://lists.apache.org/thread.html/r84e80216c259c81f824c6971504c321cd8c785774c489d52d4fc123f%40%3Cdev.iceberg.apache.org%3E). +This roadmap outlines projects that the Iceberg community is working on. Each high-level item links to a Github project board that tracks the current status. Related design docs will be linked on the planning boards. 
-# Priority 1 - -* API: [Iceberg 1.0.0](https://github.com/apache/iceberg/projects/3) [medium] -* Python: [Pythonic refactor](https://github.com/apache/iceberg/projects/7) [medium] -* Spec: [Z-ordering / Space-filling curves](https://github.com/apache/iceberg/projects/16) [medium] -* Spec: [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4) [small] -* Views: [Spec](https://github.com/apache/iceberg/projects/6) [medium] -* Puffin: [Implement statistics information in table snapshot](https://github.com/apache/iceberg/pull/4741) [medium] -* Flink: [FLIP-27 based Iceberg source](https://github.com/apache/iceberg/projects/23) [large] - -# Priority 2 - -* ORC: [Support delete files stored as ORC](https://github.com/apache/iceberg/projects/13) [small] -* Spark: [DSv2 streaming improvements](https://github.com/apache/iceberg/projects/2) [small] -* Flink: [Inline file compaction](https://github.com/apache/iceberg/projects/14) [small] -* Flink: [Support UPSERT](https://github.com/apache/iceberg/projects/15) [small] -* Spec: [Secondary indexes](https://github.com/apache/iceberg/projects/17) [large] -* Spec v3: [Encryption](https://github.com/apache/iceberg/projects/5) [large] -* Spec v3: [Relative paths](https://github.com/apache/iceberg/projects/18) [large] -* Spec v3: [Default field values](https://github.com/apache/iceberg/projects/19) [medium] +# General + +* [Multi-table transaction support](https://github.com/apache/iceberg/projects/30) +* [Views Support](https://github.com/apache/iceberg/projects/29) +* [Change Data Capture (CDC) Support](https://github.com/apache/iceberg/projects/26) +* [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4) +* [Inline file compaction](https://github.com/apache/iceberg/projects/14) +* [Delete File compaction](https://github.com/apache/iceberg/projects/10) +* [Z-ordering / Space-filling curves](https://github.
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443356487 ## site/docs/community.md: ## @@ -40,13 +40,13 @@ Issues are tracked in GitHub: ## Slack -We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg). +We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A). Review Comment: Oh yeah, I still need to figure out how to embed jitsi stuff there, another day.
Re: [PR] API, Core: Add Schema#withUpdatedDoc and View#updateColumnDoc APIs [iceberg]
amogh-jahagirdar commented on PR #9414: URL: https://github.com/apache/iceberg/pull/9414#issuecomment-1879243149 > That line of reasoning makes me think that any change (even comments) should just produce a new version of the view. Yes, that's the approach in this PR currently. A new version is produced via ReplaceViewVersion (view.updateColumnDoc()) returns ReplaceViewVersion. In my next iteration, I'm going to just put the logic in the existing Replace API. That fits the operation pattern that exists throughout Iceberg better. I think creating a new version is required, since we are producing a new schema and there *must* be a new version to point to the new schema. The issue is that currently the only API that exists for producing a new version is `replace` which requires users to pass in query/dialect pair(s), default namespace, schema, etc. The case for a new API is so that engine integrations don't need to determine all the queries/dialects/default namespace/catalog when the replace is performed for just column doc updates. More concretely, take the Trino case, which allows the following syntax:https://trino.io/docs/current/sql/comment.html ``` COMMENT ON view.col IS 'some col' ``` Currently, just using the replace API we'd have to implement creating the new Schema with the updated doc (the code in Schema#withUpdatedColumnDoc) and then we'd also want to preserve all the queries/dialects, default namespace, and default catalog so the integration has to pass the current state of all of those to the replace API. ``` public void updateColumn(String col, String doc) { View view = loadView(...) Schema updatedSchema = newSchemaWithColumnDoc(view.schema(), col, doc); view.replaceVersion() .withSchema(updatedSchema) .withQuery(allCurrentDialects, allTheirQueries) .withDefaultCatalog(view.currentVersion().defaultCatalog()) .withNamespace(view.currentVersion().namespace()) .commit() } private Schema newSchemaWithColumnDoc(Schema schema, String col, String doc) { NestedField fieldToUpdate = findField(name); Preconditions.checkArgument(fieldToUpdate != null, "Field %s does not exist", name); NestedField updatedField = NestedField.of( fieldToUpdate.fieldId(), fieldToUpdate.isOptional(), fieldToUpdate.name(), fieldToUpdate.type(), doc); List newFields = Lists.newArrayList(); newFields.add(updatedField); for (NestedField field : columns()) { if (field.fieldId() != updatedField.fieldId()) { newFields.add(field); } } return new Schema(newFields, getAliases(), identifierFieldIds()); } ``` So the integration has to pass in a bunch of mostly irrelevant details into the replace just for updating the docs and if an engine integration misses a dialect to include in the replace, then updating the column doc has removed the other dialect unintentionally for the new version. Having an API in the library which takes care of these details would avoid any of these weird issues. With the new API the integration should look something like: ``` public void updateColumnDoc(String col, String doc) { View view = loadView(...) view.replaceVersion() .withColumnDoc(col, doc) .commit() } ``` >If we allow making schema changes, then would we also allow modifying the view by widening view columns? I would advocate against this. These kinds of schema evolution capabilities on views don't seem that useful to me at the cost of a lot of complexity. I think there's a semantic distinction in those kinds of capabilities and adding docs. 
That's why I don't think it makes sense to have an UpdateViewSchema implementation for views, and I discarded it in favor of the current approach.
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443359408 ## site/nav.yml: ## @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +nav: + - Home: index.md + - Quickstart: +- Spark: spark-quickstart.md +- Hive: hive-quickstart.md + - Docs: +#- nightly: '!include docs/docs/nightly/mkdocs.yml' +- latest: '!include docs/docs/latest/mkdocs.yml' +- 1.4.2: '!include docs/docs/1.4.2/mkdocs.yml' Review Comment: I will update 1.4.3 on another PR
Re: [PR] Shift site build to use monorepo and gh-pages [iceberg]
bitsondatadev commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443361277 ## site/docs/community.md: ## @@ -40,13 +40,13 @@ Issues are tracked in GitHub: ## Slack -We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg). +We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A). Review Comment: wait yeah this is the latest one haha and that's what's currently in there.
Re: [I] How can we build self-defined iceberg-spark-runtime jar package? [iceberg]
gaoshihang closed issue #9425: How can we build self-defined iceberg-spark-runtime jar package? URL: https://github.com/apache/iceberg/issues/9425