GitHub user tuhaihe edited a discussion: [Proposal] Apache Cloudberry 
(Incubating) Roadmap

### Proposers

- Max Yang (https://github.com/my-ship-it)
- Lirong Jian (https://github.com/jianlirong)
- Dianjin Wang (https://github.com/tuhaihe)
- Ed Espino (https://github.com/edespino)
- Leonid Borchuk ([email protected])

**Proposal Contributors**
- Maxim Smyatkin ([email protected])
- Greg Spiegelberg ([email protected])
- 陈淼
- Jun Sheng ([email protected])
- Joshua Drake
- Louis Mugnano
- Tushar Pednekar ([email protected])

### Proposal Status

Completed

### Abstract

The roadmap has already been discussed on the mailing list and in Google Docs. 
I am copying the details to GitHub Discussions to provide an overview of the 
roadmap.

- Mailing list thread: 
https://lists.apache.org/thread/c83jo8lxowvy7pqvs4txprt8zfvj649w
- Google Docs: 
https://docs.google.com/document/d/1qLHNYKfGRh6Sed9Bbcuik3Yfu2w11PZ89izlQ1Z73CI/edit?usp=sharing

### Motivation

As Apache Cloudberry™️ (Incubating) starts its incubation journey under the 
Apache umbrella, we would like to submit a proposal for the Apache Cloudberry 
roadmap. This roadmap outlines the project's key milestones and directions, and 
aims to illustrate what Apache Cloudberry will look like in the short, medium, 
and long term. It is hard to cover everything, so we hope community members 
will review this document and leave comments or feedback. We can also review 
these items quarterly or half-yearly at the community meetings to track 
progress and decide whether they need to be updated.

Your feedback is welcome and will help shape the future of Apache Cloudberry 
together.

### Implementation

## Overview

Before going into details, here is the landscape we want Cloudberry to focus 
on:

* Start the incubation journey and graduate from the Apache Incubator to become 
an Apache Top Level Project within a few months to a year or more.  
* Cherry-pick the commits from Greenplum to Cloudberry to catch up with the 
latest open-source Greenplum codebase.  
* Upgrade the PostgreSQL kernel yearly so that users can benefit from the 
features and enhancements introduced by newer PostgreSQL versions.  
* Strengthen stability by introducing more test frameworks, optimize 
performance and high availability, introduce binary compatibility tests between 
minor versions, and more.  
* Improve usability and the user experience to lower the bar for installation, 
deployment, operation, management, observability, etc.  
* Provide more modern solutions including Streaming/Real-time, Lakehouse, and 
AI/ML around Apache Cloudberry.  
* Build and grow the ecosystem, including tools and integrations with other ASF 
projects and with upstream or downstream projects.

## Details

### Community Meetings

We would like to start and help coordinate regular monthly community meetings 
(or biweekly if development becomes more active). The initial meetings will 
discuss the key features and plans of the project and involve developers and 
users from the Americas, EMEA, and Asia-Pacific. However, finding an ideal time 
for all regions may be challenging.   
Volunteers are welcome to organize additional focus meetings, such as a 
marketing focus meeting or a CI/CD focus meeting, to allow deeper and more 
streamlined discussions. These meetings will be open to all community members, 
not only PPMC members.

*Time: TBD (Option: the last Friday of every month at 08:00 UTC / 11:00 UTC+3 / 
16:00 UTC+8 / 01:00 UTC-7.)*
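
To make the proposed slot easier to check, here is a minimal Python sketch that finds the last Friday of a given month and converts 08:00 UTC into the offsets listed above. The function names `last_friday` and `meeting_times` are hypothetical and only for illustration; the slot itself is still TBD.

```python
import calendar
from datetime import date, datetime, timedelta, timezone


def last_friday(year, month):
    """Return the date of the last Friday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # Step back from the last day of the month to the nearest Friday.
    days_back = (d.weekday() - calendar.FRIDAY) % 7
    return d - timedelta(days=days_back)


def meeting_times(year, month, utc_offsets=(0, 3, 8, -7)):
    """Map each UTC offset (in hours) to the local start time of the meeting.

    Assumes the proposed option of 08:00 UTC on the last Friday of the month.
    """
    day = last_friday(year, month)
    start = datetime(day.year, day.month, day.day, 8, 0, tzinfo=timezone.utc)
    return {
        offset: start.astimezone(timezone(timedelta(hours=offset))).strftime(
            "%Y-%m-%d %H:%M"
        )
        for offset in utc_offsets
    }


if __name__ == "__main__":
    for offset, local in meeting_times(2025, 10).items():
        print(f"UTC{offset:+d}: {local}")
```

Fixed UTC offsets are used here because the proposal lists offsets rather than named time zones; regions with daylight saving time would need their current offset.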

The meeting will take about 45 minutes. Anyone can volunteer to host, helping 
run the meeting efficiently and take meeting notes. English is the preferred 
meeting language.

Meeting agendas should be prepared in advance and documented in the cwiki or 
Google Docs (preferred, because everyone can comment). Additionally, the 
sessions will be recorded for easy follow-up and reference.

We will evaluate meeting tools such as Zoom, Jitsi Meet, Google Meet, and 
others.

(We can start one new mailing list thread on this to talk more.)

### Apache Incubation and Graduation

Going forward, we will develop Cloudberry and build the community following the 
Apache Way. It may take a few months to a year or more for Cloudberry to 
graduate from the Incubator and become a Top Level Project. There is a lot for 
us to do.

We will use the Incubator website (https://incubator.apache.org/), the Apache 
Project Maturity Model 
(https://community.apache.org/apache-way/apache-project-maturity-model.html), 
and other ASF policies and guides as references, and proceed with the help of 
our mentors.

### Cherry-pick from Greenplum to Cloudberry (Highest Priority)

As you know, Cloudberry is based on Greenplum 7.0.0-beta.3 and a newer 
PostgreSQL kernel. First, we plan to cherry-pick the commits from the archived 
open-source Greenplum into Cloudberry to catch up with Greenplum's latest code 
(see 
[https://github.com/apache/cloudberry/discussions/675](https://github.com/apache/cloudberry/discussions/675)).

- **Updates - 10/17, 2025:**
  - [x] This task is now almost done; you can track its progress here: 
[https://github.com/apache/cloudberry/discussions/675](https://github.com/apache/cloudberry/discussions/675).
 We will cherry-pick any further necessary commits in the future.

### PostgreSQL Kernel Upgrade (TBD)

Next, we will upgrade the built-in PostgreSQL kernel annually so that 
Cloudberry users can benefit from the features and enhancements introduced by 
newer stable versions of PostgreSQL. The target version will be two major 
versions behind the latest PostgreSQL release of that year. For example, in 
2024 the latest PostgreSQL version is 17, so the upgrade target is PostgreSQL 
15.x (= 17 - 2).
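
The "two versions behind" rule is simple arithmetic; a minimal Python sketch is shown here for clarity. The function name `target_kernel_major` and the `lag` parameter are hypothetical, not part of any Cloudberry tooling.

```python
def target_kernel_major(latest_major, lag=2):
    """Return the PostgreSQL major version to target for the kernel upgrade.

    latest_major: latest PostgreSQL major version released that year.
    lag: how many major versions to stay behind (2 in the roadmap).
    """
    if lag < 0:
        raise ValueError("lag must not be negative")
    target = latest_major - lag
    if target < 1:
        raise ValueError("lag is larger than the latest major version")
    return target


if __name__ == "__main__":
    # 2024 example from the roadmap: latest is 17, so the target is 15.
    print(target_kernel_major(17))
```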

As a long-term strategy, we want to split features out and decouple them from 
the PostgreSQL kernel to make Cloudberry components more pluggable. The 
interconnect is already pluggable; the dispatcher, optimizer, planner, and 
transaction management (two-phase commit) are still to be done. Contributions 
in these areas are welcome.

- **Updates - 10/17, 2025:**
  - [x] PostgreSQL 14 ~> PostgreSQL 16 kernel upgrade work is in progress: 
https://lists.apache.org/thread/1b5sr96315txsvs1zg65vsd1n01kf0ql

### Performance and Usability

- [x] Support hybrid Row-Column storage, inspired by Partition Attributes 
Across (https://www.vldb.org/conf/2001/P169.pdf), which has the same write 
performance as AO tables and the same read performance as AOCS tables. We will 
also integrate the latest compression algorithms and encoding algorithms (such 
as dictionary encoding) into it.  
  - **Updates - 10/17, 2025:** 
https://github.com/apache/cloudberry/tree/main/contrib/pax_storage
* Support a vectorized execution engine to optimize query performance.  
* Refactor the dispatch logic for improved efficiency.  
- [x] Refactor the Materialized view and query for external tables.  
- [x] Support parallel execution in ORCA.
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/pull/1398
- [x] Parallel query optimization to support more SQL operators.
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/pull/1261
* Projection support (materialized view for AO tables).  
* Support more than 10 tables in ORCA, improve search space exploration  
* ORCA time limits (based on the number of permutations or optimization time)  
* …  
* ….

### Availability improvements

* Support cold standby
- [x] Support hot (read-only) standby
  - **Updates - 10/17, 2025:** https://github.com/apache/cloudberry/pull/1268
* Support ephemeral temporary objects  
* Support a cluster write barrier: with this, we can take a consistent snapshot 
of all disks on all cluster nodes  
* Support more than one segment mirror  
* Support mirrors in different availability zones (AZs)  
* Support quorum replication, with Jepsen tests to check that we do not lose 
data  
* Graceful segment shutdown
- [x] Robust resource groups isolation \- IO/CPU/Memory/Network
  - **Updates - 10/17, 2025:** Already supported in Kernel

### Functionality improvements

* Support compute/storage decoupling by introducing Yezzey 
([https://github.com/open-gpdb/yezzey](https://github.com/open-gpdb/yezzey)).  
* Built-in Time Travel, select data from the past  
* Support all types of queries in ORCA
- [x] Pg\_hint\_plan for ORCA.
* Stored hints and plan stability

### Quality Assurance

* Introduce more open-source testing frameworks and methodologies, for example 
automatic SQL generation testing, SQLancer, and chaos testing for system 
robustness.  
* Refactor the current ICW cases to reduce running time.  
* Binary swap tests between minor versions

### Usability

* Disaster Recovery \- providing disaster recovery capabilities for Cloudberry 
to enable point-in-time recovery (PITR) to recover the Cloudberry cluster to a 
certain restore point in the case of a disaster.  
* Cloudberry Central Console (like GPCC)  
* Provide upgrade tools for in-place upgrades.  
* K8S deployment support for Cloudberry Database  
* ~~Migration tool from Oracle/MSSQL database~~  
* Rename the gp\* or greenplum\* related commands or keywords to cb\* or 
cloudberry\* for better compliance. We need to provide aliases that bridge the 
old and new names so that users have a smooth transition.


### Streaming / Real-time

- [x] Implement the kafka\_fdw extension to enable streaming data from Kafka to 
Cloudberry.
  - **Updates - 10/17, 2025:** https://github.com/cloudberry-contrib/kafka_fdw
- [ ] Integration with Flink CDC / Kafka connector to support near real-time 
data integration. 
  - **Updates - 10/17, 2025**
    - [x] Flink Connector JDBC - 
https://github.com/apache/flink-connector-jdbc/commit/544275c8c8b03426b71192b0dde39bc51c041bab
- [x] Support Dynamic Tables.
  - [x] **Updates - 10/17, 2025:** 
https://cloudberry.apache.org/docs/performance/use-dynamic-tables 

### Lakehouse

* Integration with various data lakes (including Iceberg, Hudi, Delta Lake, and 
more) as plugins. For the integration with Apache Iceberg, see the discussion: 
[https://github.com/apache/cloudberry/discussions/369](https://github.com/apache/cloudberry/discussions/369).

### AI/ML

* Integration with Ray (https://www.ray.io/) to support AI/ML workloads. (High 
priority)  
* Work with the Apache MADlib community to support Cloudberry natively in the 
MADlib upstream codebase.  
* Support graph queries: AI applications are increasingly using graphs to 
retrieve knowledge for better outcomes

### Utilities and Ecosystem

We aim for Cloudberry to be supported as a first-class citizen in the 
ecosystem, rather than relying on minor updates to the existing Greenplum 
support in upstream tools.

- [ ] Cherry-pick the latest commits from the original Greenplum projects to 
Cloudberry, including cloudberry-pxf, cloudberry-gpbackup, 
cloudberry-gpbackup-s3-plugin, and cloudberry-go-libs.
  - [x] **Updates - 10/17, 2025**
    - [x] Cloudberry-gpbackup: renamed to cloudberry-backup; its codebase has 
been synced with the archived Greenplum version: 
https://github.com/apache/cloudberry-backup 
    - [x] Cloudberry-go-libs: synced with the archived Greenplum version: 
https://github.com/apache/cloudberry-go-libs 
    - [x] Cloudberry-gpbackup-s3-plugin: this repo has been archived and its 
core files have been merged into cloudberry-backup: 
https://github.com/apache/cloudberry-backup/tree/main/plugins/s3plugin
    - ⚠️ Cloudberry-pxf: syncing the archived commits to Cloudberry is still in 
progress.
- [x] Support PGRX to enable writing UDFs in Rust for Cloudberry.
  - **Updates - 10/17, 2025**: https://github.com/cloudberry-contrib/pgrx
- [x] DBeaver for Cloudberry
  - **Updates - 10/17, 2025**: DBeaver has supported Cloudberry since version 
25.2.2: https://github.com/dbeaver/dbeaver/releases 
- [x] JDBC/ODBC for Cloudberry
  - **Updates - 10/17, 2025**: the PostgreSQL JDBC/ODBC drivers can be used 
with Cloudberry
* Container Service for Cloudberry Database  
* Cloudberry command center for database, query and resource management  
* TPC-H / TPC-DS benchmark for Cloudberry  
* Integrations with other ASF projects
  - [x] Apache SeaTunnel \- Source(V2) of SeaTunnel \- Greenplum \- 
[https://seatunnel.apache.org/docs/2.3.6/category/source-v2](https://seatunnel.apache.org/docs/2.3.6/category/source-v2)
  
  * ……

> Note: 
> The original GitHub organization 
> [github.com/cloudberrydb](http://github.com/cloudberrydb) will be renamed to 
> [github.com/cloudberry-contrib](http://github.com/cloudberry-contrib), which 
> includes some Cloudberry developers but is not officially maintained by 
> Cloudberry PPMC. The org will store some non-Apache License extensions or 
> projects for Cloudberry, like pgvector, PL/Java, PL/R and more tools.
> * Pgvector version upgrade from 0.5.x to 0.8.x  
> * PostGIS version upgrade from 2.5.x to 3.3.2

### Release Management

We will establish a predictable and sustainable release process to provide 
stable software to our users while maintaining quality:

#### Release Cadence

* Major releases quarterly (x.y.0)  
* Minor releases (x.y.z) as needed for critical fixes  
- [x] First Apache release targeted for \[DATE\]
  - **Updates - 10/17, 2025**: the first Apache release can be downloaded here: 
https://cloudberry.apache.org/releases 
* Release candidates to undergo minimum 2-week community testing

#### Version Management

* Follow semantic versioning (MAJOR.MINOR.PATCH)  
* Major version: Incompatible API/ABI changes  
* Minor version: New functionality in a backward-compatible manner  
* Patch version: Backward-compatible bug fixes  
* Pre-release versions marked with \-alpha, \-beta, \-rc suffixes
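
As an illustration of the versioning rules above, here is a minimal Python sketch that parses a version string and computes the next version for each kind of change. The helper names `parse` and `bump`, the change labels, and the optional numeric part after a suffix (as in `-rc.1`) are assumptions made for this example; they are not taken from the Cloudberry release tooling.

```python
import re

# MAJOR.MINOR.PATCH with an optional -alpha, -beta, or -rc suffix.
# The trailing ".N" after the suffix is an assumption for illustration.
_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(-(?:alpha|beta|rc)(?:\.\d+)?)?$")


def parse(version):
    """Split a version string into (major, minor, patch, pre_release_suffix)."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a valid version: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    return major, minor, patch, match.group(4) or ""


def bump(version, change):
    """Return the next final version for a given kind of change.

    change is one of:
      "incompatible" -> MAJOR + 1 (incompatible API/ABI changes)
      "feature"      -> MINOR + 1 (backward-compatible functionality)
      "fix"          -> PATCH + 1 (backward-compatible bug fixes)
    """
    major, minor, patch, pre = parse(version)
    if pre:
        raise ValueError("bump expects a final release, not a pre-release")
    if change == "incompatible":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump("2.0.3", "feature"))
    print(parse("2.0.0-rc.1"))
```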

#### Release Process

- [x] Documented release procedures following Apache guidelines
  - **Updates - 10/17, 2025**: https://github.com/apache/cloudberry/wiki 
- [x] Automated release preparation and verification tools
  - **Updates - 10/17, 2025**: 
https://github.com/apache/cloudberry/tree/main/devops/release
- [x] Release notes and migration guides for each version
- [x] Security vulnerability handling process
  - **Updates - 10/17, 2025**: see 
https://github.com/apache/cloudberry/blob/main/SECURITY.md 

### Release & Pipelines

Our goal is to make Cloudberry's CI/CD workflow more flexible, robust, and 
automated, so that community users and developers can also reuse it in their 
own environments.

- [x] Introduce the new build, test, and deployment workflows for Cloudberry 
based on GitHub Actions and Docker.
  - **Updates - 10/17, 2025**: 
https://github.com/apache/cloudberry/tree/main/.github/workflows
* Support more OS matrices, artifacts, and Docker images, including Rocky 
Linux, Debian, and Ubuntu.  
* Support more CPU architectures, including x86\_64, ARM, RISC-V, and 
LoongArch.  
- [x] Support skipping the CI/CD workflow for pull requests that only touch 
specified file formats or directories in the main repo, like \*.txt, \*.md, 
\*.mdx, \*.png, and the /doc dir, to save test resources.
* Add comment commands for pull request review, such as /build, /rebase, and 
/ok-to-test, that PR authors and project committers can use to trigger actions; 
this can also help reduce cost. Reference: http://prow.k8s.io/command-help.  
* Add a Git pre-commit workflow to help check the commit message conventions.  
* Ansible playbook on cloud provider.
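
The skip rule for documentation-only pull requests in the list above could be sketched as follows. This is a minimal Python illustration, not the project's actual workflow logic: the function names `is_skippable` and `should_skip_ci` are hypothetical, and the patterns are the examples named in the list.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# File patterns and top-level directories taken from the roadmap examples.
SKIP_FILE_PATTERNS = ("*.txt", "*.md", "*.mdx", "*.png")
SKIP_DIRECTORIES = ("doc",)


def is_skippable(path):
    """Return True if a changed file does not need the full CI/CD workflow."""
    p = PurePosixPath(path)
    # Anything under a skipped top-level directory is skippable.
    if p.parts and p.parts[0] in SKIP_DIRECTORIES:
        return True
    # Otherwise decide by the file name pattern.
    return any(fnmatch(p.name, pattern) for pattern in SKIP_FILE_PATTERNS)


def should_skip_ci(changed_files):
    """Skip CI only when every changed file is skippable.

    An empty change list does not skip, as a conservative default.
    """
    files = list(changed_files)
    return bool(files) and all(is_skippable(f) for f in files)


if __name__ == "__main__":
    print(should_skip_ci(["README.md", "doc/guide.html"]))
    print(should_skip_ci(["README.md", "src/backend/main.c"]))
```

The key design choice is that a single non-skippable file is enough to run the full workflow, so code changes are never left untested because they were bundled with documentation edits.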

### Website, Documents and Marketing

This part will include some short-term and mid-term items we want to do for the 
website, documents, and marketing.

* Website:  
  - [x] Clean up the website source code.
  - [x] Update the disclaimer and check as per the ASF brand policy and the 
Podling website guide.
  - [x] Optimize the website style and redesign some pages like the homepage or 
blog index page.
* Documents:  
  - [x] Restructure the existing documents to make them more organized.
  * Cherry-pick the doc updates from Greenplum and PostgreSQL.  
  * Generate more new documents to align with the project features.  
  * Work with the developers to create the development guide.  
* Marketing:  
  * Adopt ASF social media guidelines for Cloudberry and create the workstream 
for social media platforms.

### Rollout/Adoption Plan

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

---

### Update log

- Updates - 10/17, 2025

GitHub link: https://github.com/apache/cloudberry/discussions/868
