Fokko commented on code in PR #8919: URL: https://github.com/apache/iceberg/pull/8919#discussion_r1443340509
##########
site/docs/community.md:
##########
@@ -40,13 +40,13 @@ Issues are tracked in GitHub:
 ## Slack
-We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg).
+We use the [Apache Iceberg workspace](https://apache-iceberg.slack.com/) on Slack. To be invited, follow [this invite link](https://join.slack.com/t/apache-iceberg/shared_invite/zt-287g3akar-K9Oe_En5j1UL7Y_Ikpai3A).

Review Comment:
   I checked and this is the latest one: https://github.com/apache/iceberg-docs/commit/87149e3da04eafad1c8766132f0bf1e822e4d075

##########
.github/workflows/site-ci.yml:
##########
@@ -1,3 +1,4 @@
+#

Review Comment:
   Looks like git sees this as a move 🤔 Not a real problem.

##########
site/nav.yml:
##########
@@ -0,0 +1,49 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+nav:
+  - Home: index.md
+  - Quickstart:
+    - Spark: spark-quickstart.md
+    - Hive: hive-quickstart.md
+  - Docs:
+    #- nightly: '!include docs/docs/nightly/mkdocs.yml'
+    - latest: '!include docs/docs/latest/mkdocs.yml'
+    - 1.4.2: '!include docs/docs/1.4.2/mkdocs.yml'

Review Comment:
   I think 1.4.3 is missing here.
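
A missing version entry like this could also be caught by a small shell check run from `site/`. This is only a sketch under stated assumptions: the function name `missing_nav_versions` is hypothetical and not part of the PR, the versioned docs are assumed to be checked out as numeric directories under `docs/docs/` (the same layout `get_latest_version` in `site/dev/common.sh` relies on), and each version is assumed to appear in `nav.yml` as a `- <version>:` list item.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): print every numeric version
# directory under $2 that has no "- <version>:" entry in the nav file $1.
missing_nav_versions () {
  local nav_file="$1"
  local docs_dir="$2"
  local dir version

  for dir in "${docs_dir}"/[0-9]*/; do
    # Skip the unexpanded glob when no version directory exists
    [ -d "${dir}" ] || continue
    version="$(basename "${dir}")"

    # Fixed-string match; a commented-out entry such as "#- 1.4.3:" also counts
    grep -qF -- "- ${version}:" "${nav_file}" || echo "${version}"
  done
}
```

Called as `missing_nav_versions nav.yml docs/docs`, it prints nothing when every version directory has an entry, so a CI step could fail on non-empty output.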

##########
site/docs/spec.md:
##########
@@ -1128,6 +1125,41 @@ Example
 ] } ]
 ```
+### Content File (Data and Delete) Serialization

Review Comment:
   Not for this PR, but it would be good at some point to just include the spec from the repository itself.

##########
site/docs/roadmap.md:
##########
@@ -20,28 +20,37 @@ title: "Roadmap"
 # Roadmap Overview
-This roadmap outlines projects that the Iceberg community is working on, their priority, and a rough size estimate.
-This is based on the latest [community priority discussion](https://lists.apache.org/thread.html/r84e80216c259c81f824c6971504c321cd8c785774c489d52d4fc123f%40%3Cdev.iceberg.apache.org%3E).
+This roadmap outlines projects that the Iceberg community is working on. Each high-level item links to a Github project board that tracks the current status. Related design docs will be linked on the planning boards.
-# Priority 1
-
-* API: [Iceberg 1.0.0](https://github.com/apache/iceberg/projects/3) [medium]
-* Python: [Pythonic refactor](https://github.com/apache/iceberg/projects/7) [medium]
-* Spec: [Z-ordering / Space-filling curves](https://github.com/apache/iceberg/projects/16) [medium]
-* Spec: [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4) [small]
-* Views: [Spec](https://github.com/apache/iceberg/projects/6) [medium]
-* Puffin: [Implement statistics information in table snapshot](https://github.com/apache/iceberg/pull/4741) [medium]
-* Flink: [FLIP-27 based Iceberg source](https://github.com/apache/iceberg/projects/23) [large]
-
-# Priority 2
-
-* ORC: [Support delete files stored as ORC](https://github.com/apache/iceberg/projects/13) [small]
-* Spark: [DSv2 streaming improvements](https://github.com/apache/iceberg/projects/2) [small]
-* Flink: [Inline file compaction](https://github.com/apache/iceberg/projects/14) [small]
-* Flink: [Support UPSERT](https://github.com/apache/iceberg/projects/15) [small]
-* Spec: [Secondary indexes](https://github.com/apache/iceberg/projects/17) [large]
-* Spec v3: [Encryption](https://github.com/apache/iceberg/projects/5) [large]
-* Spec v3: [Relative paths](https://github.com/apache/iceberg/projects/18) [large]
-* Spec v3: [Default field values](https://github.com/apache/iceberg/projects/19) [medium]
+# General
+
+* [Multi-table transaction support](https://github.com/apache/iceberg/projects/30)
+* [Views Support](https://github.com/apache/iceberg/projects/29)
+* [Change Data Capture (CDC) Support](https://github.com/apache/iceberg/projects/26)
+* [Snapshot tagging and branching](https://github.com/apache/iceberg/projects/4)
+* [Inline file compaction](https://github.com/apache/iceberg/projects/14)
+* [Delete File compaction](https://github.com/apache/iceberg/projects/10)
+* [Z-ordering / Space-filling curves](https://github.com/apache/iceberg/projects/16)
+* [Support UPSERT](https://github.com/apache/iceberg/projects/15)
+
+# Clients
+_Rust and Go projects are pointing to their respective repositories which include

Review Comment:
   ```suggestion
   _Python, Rust and Go projects are pointing to their respective repositories which include
   ```

##########
site/dev/common.sh:
##########
@@ -0,0 +1,228 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+set -e
+
+REMOTE="iceberg_docs"
+
+# Ensures the presence of a specified remote repository for documentation.
+# If the remote doesn't exist, it adds it using the provided URL.
+# Then, it fetches updates from the remote repository.
+create_or_update_docs_remote () {
+  echo " --> create or update docs remote"
+
+  # Check if the remote exists before attempting to add it
+  git config "remote.${REMOTE}.url" >/dev/null ||
+    git remote add "${REMOTE}" https://github.com/apache/iceberg.git
+
+  # Fetch updates from the remote repository
+  git fetch "${REMOTE}"
+}
+
+
+# Pulls updates from a specified branch of a remote repository.
+# Arguments:
+#   $1: Branch name to pull updates from
+pull_remote () {
+  echo " --> pull remote"
+
+  local BRANCH="$1"
+
+  # Ensure the branch argument is not empty
+  assert_not_empty "${BRANCH}"
+
+  # Perform a pull from the specified branch of the remote repository
+  git pull "${REMOTE}" "${BRANCH}"
+}
+
+# Pushes changes from a local branch to a specified branch of a remote repository.
+# Arguments:
+#   $1: Branch name to push changes to
+push_remote () {
+  echo " --> push remote"
+
+  local BRANCH="$1"
+
+  # Ensure the branch argument is not empty
+  assert_not_empty "${BRANCH}"
+
+  # Push changes to the specified branch of the remote repository
+  git push "${REMOTE}" "${BRANCH}"
+}
+
+# Installs or upgrades dependencies specified in the 'requirements.txt' file using pip.
+install_deps () {
+  echo " --> install deps"
+
+  # Use pip to install or upgrade dependencies from the 'requirements.txt' file quietly
+  pip -q install -r requirements.txt --upgrade
+}
+
+# Checks if a provided argument is not empty. If empty, displays an error message and exits with a status code 1.
+# Arguments:
+#   $1: Argument to check for emptiness
+assert_not_empty () {
+
+  if [ -z "$1" ]; then
+    echo "No argument supplied"
+
+    # Exit with an error code if no argument is provided
+    exit 1
+  fi
+}
+
+# Finds and retrieves the latest version of the documentation based on the directory structure.
+# Assumes the documentation versions are numeric folders within 'docs/docs/'.
+get_latest_version () {
+  # Find the latest numeric folder within 'docs/docs/' structure
+  local latest=$(ls -d docs/docs/[0-9]* | sort -V | tail -1)
+
+  # Extract the version number from the latest directory path
+  local latest_version=$(basename "${latest}")
+
+  # Output the latest version number
+  echo "${latest_version}"
+}
+
+# Creates a symbolic link for a 'nightly' version of the documentation.
+create_nightly () {
+  echo " --> create nightly"
+
+  # Remove any existing 'nightly' symbolic link to prevent conflicts
+  rm -f docs/docs/nightly/
+
+  # Create a symbolic link pointing to the 'nightly' documentation
+  ln -s ../nightly ../docs
+}
+
+# Creates a 'latest' version of the documentation based on a specified ICEBERG_VERSION.
+# Arguments:
+#   $1: ICEBERG_VERSION - The version number of the documentation to be treated as the latest.
+create_latest () {
+  echo " --> create latest"
+
+  local ICEBERG_VERSION="$1"
+
+  # Ensure ICEBERG_VERSION is not empty
+  assert_not_empty "${ICEBERG_VERSION}"
+
+  # Output the provided ICEBERG_VERSION for verification
+  echo "${ICEBERG_VERSION}"
+
+  # Remove any existing 'latest' directory and recreate it
+  rm -rf docs/docs/latest/
+  mkdir docs/docs/latest/
+
+  # Create symbolic links and copy configuration files for the 'latest' documentation
+  ln -s "../${ICEBERG_VERSION}/docs" docs/docs/latest/docs
+  cp "docs/docs/${ICEBERG_VERSION}/mkdocs.yml" docs/docs/latest/
+
+  cd docs/docs/
+
+  # Update version information within the 'latest' documentation
+  update_version "latest"
+  cd -
+}
+
+# Updates version information within the mkdocs.yml file for a specified ICEBERG_VERSION.
+# Arguments:
+#   $1: ICEBERG_VERSION - The version number used for updating the mkdocs.yml file.
+update_version () {
+  echo " --> update version"
+
+  local ICEBERG_VERSION="$1"
+
+  # Ensure ICEBERG_VERSION is not empty
+  assert_not_empty "${ICEBERG_VERSION}"
+
+  # Update version information within the mkdocs.yml file using sed commands
+  if [ "$(uname)" == "Darwin" ]
+  then
+    sed -i '' -E "s/(^site\_name:[[:space:]]+docs\/).*$/\1${ICEBERG_VERSION}/" ${ICEBERG_VERSION}/mkdocs.yml
+    sed -i '' -E "s/(^[[:space:]]*-[[:space:]]+Javadoc:.*\/javadoc\/).*$/\1${ICEBERG_VERSION}/" ${ICEBERG_VERSION}/mkdocs.yml
+  elif [ "$(expr substr $(uname -s) 1 5)" == "Linux" ]
+  then
+    sed -i'' -E "s/(^site_name:[[:space:]]+docs\/)[^[:space:]]+/\1${ICEBERG_VERSION}/" "${ICEBERG_VERSION}/mkdocs.yml"
+    sed -i'' -E "s/(^[[:space:]]*-[[:space:]]+Javadoc:.*\/javadoc\/).*$/\1${ICEBERG_VERSION}/" "${ICEBERG_VERSION}/mkdocs.yml"
+  fi
+
+}
+
+# Excludes versioned documentation from search indexing by modifying .md files.
+# Arguments:
+#   $1: ICEBERG_VERSION - The version number of the documentation to exclude from search indexing.
+search_exclude_versioned_docs () {
+  echo " --> search exclude version docs"
+  local ICEBERG_VERSION="$1"
+
+  # Ensure ICEBERG_VERSION is not empty
+  assert_not_empty "${ICEBERG_VERSION}"
+
+  cd "${ICEBERG_VERSION}/docs/"
+
+  # Modify .md files to exclude versioned documentation from search indexing
+  python3 -c "import os
+for f in filter(lambda x: x.endswith('.md'), os.listdir()): lines = open(f).readlines(); open(f, 'w').writelines(lines[:2] + ['search:\n', ' exclude: true\n'] + lines[2:]);"
+
+  cd -
+}
+
+# Sets up local worktrees for the documentation and performs operations related to different versions.
+pull_versioned_docs () {
+  echo " --> pull versioned docs"
+
+  # Ensure the remote repository for documentation exists and is up-to-date
+  create_or_update_docs_remote
+
+  rm -r docs/docs
+
+  # Add local worktrees for documentation and javadoc from the remote repository
+  git worktree add docs/docs "${REMOTE}/docs"
+  git worktree add docs/javadoc "${REMOTE}/javadoc"

Review Comment:
   I had still some old cruft on my worktree, so I needed to do this:
   ```suggestion
     git worktree add -f docs/docs "${REMOTE}/docs"
     git worktree add -f docs/javadoc "${REMOTE}/javadoc"
   ```

##########
site/docs/releases.md:
##########
@@ -64,7 +67,140 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
 </dependencies>
 ```
-## 1.3.1 release
+### 1.4.3 Release
+
+Apache Iceberg 1.4.3 was released on December 27, 2023. The main issue it solves is missing files from a transaction retry with conflicting manifests. It is recommended to upgrade if you use transactions.
+
+- Core: Scan only live entries in partitions table (#8969) by @Fokko in [#9197](https://github.com/apache/iceberg/pull/9197)
+- Core: Fix missing files from transaction retries with conflicting manifest merges by [@nastra](https://github.com/nastra) in [#9337]O(https://github.com/apache/iceberg/pull/9337)

Review Comment:
   This was caught by @manuzhang in https://github.com/apache/iceberg-docs/pull/299 🙌
   ```suggestion
   - Core: Fix missing files from transaction retries with conflicting manifest merges by [@nastra](https://github.com/nastra) in [#9337](https://github.com/apache/iceberg/pull/9337)
   ```

##########
.github/workflows/site-ci.yml:
##########
@@ -14,13 +15,24 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
-extra:
-  icebergVersion: 1.4.0
-  social:
-    - icon: fontawesome/brands/github-alt
-      link: https://github.com/apache/iceberg
-    - icon: fontawesome/brands/youtube
-      link: https://www.youtube.com/@ApacheIceberg
-    - icon: fontawesome/brands/slack
-      link: https://join.slack.com/t/apache-iceberg/shared_invite/zt-1znkcg5zm-7_FE~pcox347XwZE3GNfPg
+#
+name: site-ci
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - site/**
+  workflow_dispatch:
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: 3.x
+      - name: Deploy Iceberg documentation
+        run: |
+          make deploy

Review Comment:
   Nit:
   ```suggestion
           run: make deploy
   ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
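
For the stale-worktree comment on `site/dev/common.sh`, one alternative to relying on `-f` alone is to clear the old state before adding. A minimal sketch, assuming a hypothetical helper `readd_worktree` that is not part of the PR: it deletes the old directory, runs `git worktree prune` to drop bookkeeping for worktrees whose directories are gone, then adds the worktree again. `-f` is kept, as in the suggestion, so the call still goes through if the branch is checked out in another worktree.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): recreate a worktree at $1 for
# the commit-ish $2, clearing leftovers from an earlier run first.
readd_worktree () {
  local dir="$1"
  local ref="$2"

  # Remove the old checkout directory, if any
  rm -rf "${dir}"

  # Drop administrative entries for worktrees whose directories no longer exist
  git worktree prune

  # -f mirrors the review suggestion; after the prune it is mostly a safety net
  git worktree add -f "${dir}" "${ref}"
}
```

In `pull_versioned_docs` this could be called as `readd_worktree docs/docs "${REMOTE}/docs"`, which would also make the separate `rm -r docs/docs` unnecessary.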
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
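
The two `sed -i` branches in `update_version` exist because BSD and GNU sed take the in-place flag differently, and the `expr substr` test used to detect Linux is not specified by POSIX. One way to avoid the platform check is to write through a temporary file. A minimal sketch, assuming a hypothetical helper `set_site_version` (not part of the PR) and a version string that contains no `/`, `&`, or backslash:

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): set the version suffix of the
# "site_name: docs/<version>" line in the mkdocs.yml file $1 to $2.
set_site_version () {
  local file="$1"
  local version="$2"
  local tmp

  tmp="$(mktemp)" || return 1

  # -E is accepted by both GNU and BSD sed; no in-place flag is needed.
  # The result is copied back rather than moved, so the file keeps its permissions.
  sed -E "s/^(site_name:[[:space:]]+docs\/).*$/\1${version}/" "${file}" > "${tmp}" &&
    cat "${tmp}" > "${file}"
  local status=$?

  rm -f "${tmp}"
  return "${status}"
}
```

Usage would look like `set_site_version latest/mkdocs.yml latest`. The Javadoc line could be handled the same way with a second expression.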