GitHub user tuhaihe edited a discussion: [Proposal] Apache Cloudberry (Incubating) Roadmap
### Proposers

- Max Yang (https://github.com/my-ship-it)
- Lirong Jian (https://github.com/jianlirong)
- Dianjin Wang (https://github.com/tuhaihe)
- Ed Espino (https://github.com/edespino)
- Leonid Borchuk ([email protected])

**Proposal Contributors**

- Maxim Smyatkin ([email protected])
- Greg Spiegelberg ([email protected])
- 陈淼
- Jun Sheng ([email protected])
- Joshua Drake
- Louis Mugnano
- Tushar Pednekar ([email protected])

### Proposal Status

Completed

### Abstract

The roadmap has already been discussed on the mailing list and in Google Docs. This post copies the details to GitHub Discussions to provide an overview of the roadmap.

- Mailing list thread: https://lists.apache.org/thread/c83jo8lxowvy7pqvs4txprt8zfvj649w
- Google Docs: https://docs.google.com/document/d/1qLHNYKfGRh6Sed9Bbcuik3Yfu2w11PZ89izlQ1Z73CI/edit?usp=sharing

### Motivation

As Apache Cloudberry™️ (Incubating) starts its incubation journey under the Apache umbrella, we would like to submit a proposal for the Apache Cloudberry roadmap. This roadmap outlines the key milestones and directions of the project and aims to illustrate what Apache Cloudberry will be in the short, medium, and long term.

It is hard to cover everything, but we hope community members will review this document and leave comments or feedback. We can also review these items quarterly or half-yearly at the community meetings to track progress and decide whether they need to be updated.

Your feedback is welcome and will help shape the future of Apache Cloudberry together.

### Implementation

## Overview

Before going into details, here is the landscape we want Cloudberry to focus on:

* Start the incubation journey and graduate from the Apache Incubator to become an Apache Top Level Project within a few months to a year or so.
* Cherry-pick the commits from Greenplum to Cloudberry to catch up with the latest open-source Greenplum codebase.
* Upgrade the PostgreSQL kernel yearly so that users can take advantage of the features and enhancements introduced by newer PostgreSQL versions.
* Strengthen stability by introducing more test frameworks, optimize performance and high availability, introduce binary compatibility tests between minor versions, and more.
* Improve usability and the user experience to lower the bar for installation, deployment, operation, management, observability, etc.
* Provide more modern solutions around Apache Cloudberry, including Streaming/Real-time, Lakehouse, and AI/ML.
* Build and grow the ecosystem, including tools and integrations with other ASF projects and with upstream and downstream projects.

## Details

### Community Meetings

We would like to start and help coordinate regular monthly community meetings (biweekly if development becomes more active). The initial meetings will discuss the key features and plans of the project and involve developers and users from the Americas, EMEA, and Asia-Pacific. However, finding an ideal time for all regions may be challenging. It would be great if some people would like to help organize other focus meetings, such as a marketing focus meeting or a CI/CD focus meeting, allowing for deeper and more streamlined discussions. These meetings will be open to all community members, not only PPMC members.

*Time: TBD (Option: last Friday of every month, 8:00 am UTC / 11:00 am UTC+3 / 16:00 UTC+8 / 1:00 am UTC-7.)*

The meeting will take about 45 minutes. Anyone can volunteer to host, helping to run the meeting efficiently and take meeting notes. English is the preferred meeting language. Meeting agendas should be prepared in advance and documented in the cwiki or Google Docs (preferred, since everyone can comment). Additionally, the sessions will be recorded for easy follow-up and reference. We will research meeting tools such as Zoom, Jitsi Meet, Google Meet, and others.
(We can start a new mailing list thread to discuss this further.)

### Apache Incubation and Graduation

Going forward, we will develop Cloudberry and build the community following the Apache Way. It may take a few months to a year or so for Cloudberry to graduate from the Incubator and become a Top Level Project. There is a lot for us to do. We will use the Incubator website (https://incubator.apache.org/), the Apache Project Maturity Model (https://community.apache.org/apache-way/apache-project-maturity-model.html), and other ASF policies and guides as references, and proceed with help from our mentors.

### Cherry-pick from Greenplum to Cloudberry (Highest Priority)

Cloudberry takes Greenplum 7.0.0-beta.3 and a newer PostgreSQL kernel as its codebase. First, we plan to cherry-pick the commits from the archived open-source Greenplum to Cloudberry to catch up with Greenplum's latest code (see [https://github.com/apache/cloudberry/discussions/675](https://github.com/apache/cloudberry/discussions/675)).

- **Updates - 10/17, 2025:**
  - [x] This task is now almost done; you can track its progress here: [https://github.com/apache/cloudberry/discussions/675](https://github.com/apache/cloudberry/discussions/675). We will cherry-pick any further necessary commits in the future.

### PostgreSQL Kernel Upgrade (TBD)

Next, we will upgrade the built-in PostgreSQL kernel annually so that Cloudberry users can take advantage of the features and enhancements introduced by newer stable versions of PostgreSQL. The target version will be two major versions behind the latest PostgreSQL released that year. For example, in 2024 the latest PG version is 17, so we will upgrade the PostgreSQL kernel to PG 15.x (= 17 - 2).

As a long-term strategy, we want to split out features and decouple them from the PostgreSQL kernel to make Cloudberry components more pluggable.
The interconnect is now pluggable; the dispatcher, optimizer, planner, and transaction management (2-phase commit) are still waiting to be done. Contributions on these are welcome.

- **Updates - 10/17, 2025:**
  - [x] PostgreSQL 14 ~> PostgreSQL 16 kernel upgrade work is in progress: https://lists.apache.org/thread/1b5sr96315txsvs1zg65vsd1n01kf0ql

### Performance and Usability

- [x] Support hybrid row-column storage, inspired by Partition Attributes Across (PAX; https://www.vldb.org/conf/2001/P169.pdf), which has the same write performance as AO tables and the same read performance as AOCS tables. We will also integrate the latest compression and encoding algorithms (such as dictionary encoding) into it.
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/tree/main/contrib/pax_storage
* Support a vectorized execution engine to optimize query performance.
* Refactor the dispatch logic for improved efficiency.
- [x] Refactor the materialized view and query for external tables.
- [x] Support parallel execution in ORCA.
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/pull/1398
- [x] Parallel query optimization to support more SQL operators.
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/pull/1261
* Projection support (materialized view for AO tables).
* Support more than 10 tables in ORCA and improve search space exploration.
* ORCA time limits (based on the number of permutations or optimization time).
* …
* ….
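
To make the "ORCA time limits" item above a bit more concrete, here is a minimal Python sketch of a budgeted search in general. It is not ORCA's implementation (ORCA is written in C++ and explores plans differently); the function `search_join_order`, the `TOY_ROWS` table sizes, and the `toy_cost` model are all hypothetical names made up for illustration. It only shows the idea of stopping after a number of permutations or an amount of optimization time and returning the best plan found so far.

```python
import itertools
import time
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical table sizes, used only by the toy cost model below.
TOY_ROWS = {"a": 1000, "b": 10, "c": 100}


def toy_cost(order: Tuple[str, ...]) -> int:
    """Toy left-deep join cost: sum of intermediate result sizes.

    Assumes every join keeps 1% of the cross product (an invented number).
    """
    size = TOY_ROWS[order[0]]
    total = 0
    for table in order[1:]:
        size = size * TOY_ROWS[table] // 100
        total += size
    return total


def search_join_order(
    tables: Sequence[str],
    cost: Callable[[Tuple[str, ...]], float],
    max_permutations: Optional[int] = None,
    max_seconds: Optional[float] = None,
) -> Tuple[Tuple[str, ...], float, int]:
    """Return (best_order, best_cost, permutations_examined).

    The search stops when either budget is exhausted and returns the best
    order seen so far. At least one order is always examined, so a plan
    is always produced.
    """
    if not tables:
        raise ValueError("need at least one table")
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    best_order: Optional[Tuple[str, ...]] = None
    best_cost = float("inf")
    examined = 0
    for order in itertools.permutations(tables):
        if examined > 0:
            # Budget 1: number of permutations examined.
            if max_permutations is not None and examined >= max_permutations:
                break
            # Budget 2: wall-clock optimization time.
            if deadline is not None and time.monotonic() >= deadline:
                break
        examined += 1
        current = cost(order)
        if current < best_cost:
            best_order, best_cost = order, current
    assert best_order is not None
    return best_order, best_cost, examined


if __name__ == "__main__":
    # Exhaustive search versus a search capped at two permutations.
    print(search_join_order(["a", "b", "c"], toy_cost))
    print(search_join_order(["a", "b", "c"], toy_cost, max_permutations=2))
```

The capped search may return a worse plan than the exhaustive one; that trade-off between plan quality and optimization effort is what such a limit would control.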
### Availability improvements

* Support cold standby
- [x] Support hot (read-only) standby
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/pull/1268
* Support ephemeral temporary objects
* Support a cluster write barrier: with this, we can take a consistent snapshot of all disks on all cluster nodes
* Support more than one segment mirror
* Support mirrors in different AZs
* Support quorum replication and Jepsen tests to check that we do not lose data
* Graceful segment shutdown
- [x] Robust resource group isolation \- IO/CPU/Memory/Network
  - **Updates - 10/17, 2025:** Already supported in the kernel

### Functionality improvements

* Support compute/storage decoupling by introducing Yezzey ([https://github.com/open-gpdb/yezzey](https://github.com/open-gpdb/yezzey)).
* Built-in Time Travel: select data from the past
* Support all types of queries in ORCA
- [x] Pg\_hint\_plan for ORCA.
* Stored hints and plan stability

**Quality Assurance**

* Introduce more open-source testing frameworks and methodologies. For example, introduce automatic SQL generation testing, SQLancer, and chaos testing for system robustness.
* Refactor current ICW cases to reduce running time.
* Binary swap tests between minor versions

### Usability

* Disaster Recovery \- provide disaster recovery capabilities for Cloudberry, enabling point-in-time recovery (PITR) to restore the cluster to a specific restore point in the case of a disaster.
* Cloudberry Central Console (like GPCC)
* Support upgrade tools for in-place upgrades.
* K8S deployment support for Cloudberry Database
* ~~Migration tool from Oracle/MSSQL database~~
* Rename the gp\* or greenplum\* related commands and keywords to cb\* or cloudberry\* for better compliance. We need to create aliases that bridge the old and new names to give users a smooth transition.

### Streaming / Real-time

- [x] Implement the kafka\_fdw extension to enable streaming data from Kafka to Cloudberry.
  - **Updates - 10/17, 2025:** https://github.com/cloudberry-contrib/kafka_fdw
- [ ] Integration with Flink CDC / Kafka connector to support near real-time data integration.
  - **Updates - 10/17, 2025**
    - [x] Flink Connector JDBC - https://github.com/apache/flink-connector-jdbc/commit/544275c8c8b03426b71192b0dde39bc51c041bab
- [x] Support Dynamic Tables.
  - [x] **Updates - 10/17, 2025:** https://cloudberry.apache.org/docs/performance/use-dynamic-tables

### Lakehouse

* Integration with various data lakes (including Iceberg, Hudi, Delta Lake, and more) as plugins. For the integration with Apache Iceberg, see the discussion: [https://github.com/apache/cloudberry/discussions/369](https://github.com/apache/cloudberry/discussions/369).

### AI/ML

* Integration with Ray (https://www.ray.io/) to support AI/ML workloads. (High priority)
* Work with the Apache MADlib community to support Cloudberry natively in the MADlib upstream codebase.
* Support graph queries: AI applications are increasingly using graphs to fetch knowledge for better outcomes.

### Utilities and Ecosystem

We aim for Cloudberry to be supported as a first-class citizen in the ecosystem, rather than relying on minor updates to the existing Greenplum support in upstream tools.

- [ ] Cherry-pick the latest commits from the original Greenplum projects to Cloudberry, including cloudberry-pxf, cloudberry-gpbackup, cloudberry-gpbackup-s3-plugin, and cloudberry-go-libs.
  - [x] **Updates - 10/17, 2025**
    - [x] Cloudberry-gpbackup has been renamed to cloudberry-backup, and its codebase has been synced with GP’s archived version: https://github.com/apache/cloudberry-backup
    - [x] Cloudberry-go-libs: synced with GP’s archived version: https://github.com/apache/cloudberry-go-libs
    - [x] Cloudberry-gpbackup-s3-plugin: this repo has been archived and its core files have been merged into cloudberry-backup: https://github.com/apache/cloudberry-backup/tree/main/plugins/s3plugin
    - ⚠️ Cloudberry-pxf: syncing the archived commits to Cloudberry is still in progress.
- [x] Support PGRX to enable writing UDFs in Rust for Cloudberry.
  - **Updates - 10/17, 2025**: https://github.com/cloudberry-contrib/pgrx
- [x] DBeaver for Cloudberry
  - **Updates - 10/17, 2025**: DBeaver has supported Cloudberry since version 25.2.2: https://github.com/dbeaver/dbeaver/releases
- [x] JDBC/ODBC for Cloudberry
  - **Updates - 10/17, 2025**: the PostgreSQL JDBC/ODBC drivers can be used with Cloudberry
* Container Service for Cloudberry Database
* Cloudberry command center for database, query and resource management
* TPC-H / TPC-DS benchmark for Cloudberry
* Integrations with other ASF projects
  - [x] Apache SeaTunnel \- Source(V2) of SeaTunnel \- Greenplum \- [https://seatunnel.apache.org/docs/2.3.6/category/source-v2](https://seatunnel.apache.org/docs/2.3.6/category/source-v2)
* ……

> Note:
> The original GitHub organization
> [github.com/cloudberrydb](http://github.com/cloudberrydb) will be renamed to
> [github.com/cloudberry-contrib](http://github.com/cloudberry-contrib), which
> includes some Cloudberry developers but is not officially maintained by
> Cloudberry PPMC. The org will store some non-Apache License extensions or
> projects for Cloudberry, like pgvector, PL/Java, PL/R and more tools.
> * Pgvector version upgrade from 0.5.x to 0.8.x
> * PostGIS version upgrade from 2.5.x to 3.3.2

### Release Management

We will establish a predictable and sustainable release process to provide stable software to our users while maintaining quality:

#### Release Cadence

* Major releases quarterly (x.y.0)
* Minor releases (x.y.z) as needed for critical fixes
- [x] First Apache release targeted for \[DATE\]
  - **Updates - 10/17, 2025**: the first Apache release can be downloaded here: https://cloudberry.apache.org/releases
* Release candidates to undergo a minimum of 2 weeks of community testing

#### Version Management

* Follow semantic versioning (MAJOR.MINOR.PATCH)
  * Major version: Incompatible API/ABI changes
  * Minor version: New functionality in a backward-compatible manner
  * Patch version: Backward-compatible bug fixes
* Pre-release versions marked with \-alpha, \-beta, \-rc suffixes

#### Release Process

- [x] Documented release procedures following Apache guidelines
  - **Updates - 10/17, 2025**: https://github.com/apache/cloudberry/wiki
- [x] Automated release preparation and verification tools
  - **Updates - 10/17, 2025**: https://github.com/apache/cloudberry/tree/main/devops/release
- [x] Release notes and migration guides for each version
- [x] Security vulnerability handling process
  - **Updates - 10/17, 2025**: see https://github.com/apache/cloudberry/blob/main/SECURITY.md

### Release & Pipelines

Our goal is to make Cloudberry's CI/CD workflow more flexible, robust, and automated, so that community users and developers can also reuse it in their own environments.

- [x] Introduce new build, test, and deployment workflows for Cloudberry based on GitHub Actions and Docker.
  - **Updates - 10/17, 2025**: https://github.com/apache/cloudberry/tree/main/.github/workflows
* Support more OS matrices, artifacts, and Docker images, including Rocky Linux, Debian, and Ubuntu.
* Support more CPU architectures, including x86\_64, ARM, RISC-V, and LoongArch.
- [x] Support skipping the CI/CD workflow for pull requests that only touch specified file types or directories in the main repo, such as \*.txt, \*.md, \*.mdx, \*.png, and the /doc dir, to save test resources.
* Add comment commands for pull request review, such as /build, /rebase, and /ok-to-test, which PR authors and project committers can use to trigger actions and which can also help reduce cost. Reference: http://prow.k8s.io/command-help.
* Add a git pre-commit workflow to help check commit message conventions.
* Ansible playbook on cloud provider.

### Website, Documents and Marketing

This part includes some short-term and mid-term items we want to do for the website, documents, and marketing.

* Website:
  - [x] Clean up the website source code.
  - [x] Update the disclaimer and check it against the ASF brand policy and the Podling website guide.
  - [x] Optimize the website style and redesign some pages like the homepage or blog index page.
* Documents:
  - [x] Restructure the existing documents to make them more organized.
  * Cherry-pick the doc updates from Greenplum and PostgreSQL.
  * Write new documents to align with the project features.
  * Work with the developers to create the development guide.
* Marketing:
  * Adopt ASF social media guidelines for Cloudberry and create the workstream for social media platforms.

### Rollout/Adoption Plan

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

---

### Update log

- Updates - 10/17, 2025

GitHub link: https://github.com/apache/cloudberry/discussions/868

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
