Re: [DISCUSS] Volunteering to Serve as Release Manager for Apache Cloudberry (Incubating) 2.0.0

Dianjin Wang Tue, 22 Apr 2025 01:25:24 -0700

Excellent! Thanks Alwin, Tong, and Max for your work — this will be a
big help for the upcoming release.


BTW, the CBDR-related description can be ignored for now, as it hasn’t
been contributed back to the Cloudberry main branch yet. Just FYI.

Best,
Dianjin Wang

On Tue, Apr 22, 2025 at 10:24 AM Alwin Tang <[email protected]> wrote:
>
> Thanks Ed for volunteering to serve as the release manager for Cloudberry
> 2.0.0. I support this initiative.
>
> To assist this release effort,  myself, Tong (@TomShawn) and Max
> (@my-ship-it) have put together a preliminary list of feature enhancements
> and bug fixes for 2.0.0, included below. This list may not be complete or
> 100% correct for this major upgrade, but we hope it is a starting point for
> the success of this release.
>
> Warmly,
> + Alwin
>
> --------------------
>
> <feature list>
>
> Cloudberry v2.0.0 is a major release that brings significant enhancements
> to the database kernel. This release represents a substantial leap forward
> in performance, reliability, and manageability. It also includes hundreds
> of bug fixes and stability improvements.
>
> ## New features
>
> ### Query processing and optimization
>
> #### Index and scan
>
> -  **Enhanced index-only scan capabilities**
>
>    -  Supports index-only scans on a broader range of index types when
> using the GPORCA optimizer, including those with covering indexes using
> `INCLUDE` columns. This helps improve query performance.
>
>    -  Supports dynamic index-only scan when using the GPORCA optimizer to
> accelerate queries on partitioned tables. This feature combines partition
> pruning with index-only access to avoid heap lookups, significantly
> reducing I/O and improving performance. It is ideal for wide tables with
> narrow covering indexes and can be enabled using `SET
> optimizer_enable_dynamicindexonlyscan = on`.
>
>    -  Supports index-only scans when using the GPORCA optimizer on
> append-only (AO) tables and PAX tables, enabling faster query execution by
> avoiding block access when possible. This improves performance in scenarios
> where traditional index scans on AO and PAX tables were previously
> inefficient.
>
> -  **Improved index scan performance and flexibility**
>
>    -  Supports backward index scans when using the GPORCA optimizer for
> queries with `ORDER BY ... DESC`, eliminating the need for explicit sorting
> when a B-tree index exists in the opposite order. This optimization reduces
> resource usage and improves performance, especially for top-N and
> pagination queries.
>
>    -  The GPORCA optimizer supports triggering Bitmap Index Scans using
> array comparison predicates like `col IN (...)` or `col = ANY(array)`,
> including for hash indexes. This improves query performance on large
> datasets by enabling more efficient multi-value matching. The optimizer
> automatically chooses the bitmap scan path based on cost estimation.
>
>    -  The GPORCA optimizer now considers the width of `INCLUDE` columns
> when costing index-only scans, favoring narrower indexes that return fewer
> unused columns. This improves plan selection for queries where multiple
> covering indexes are available. The cost model also more accurately
> estimates I/O by refining how `relallvisible` is used in index-only scan
> costing.
>
> -  **BRIN index enhancements**
>
>    -  Redesigns BRIN index internals for AO/CO tables to replace the
> `UPPER` page structure with a more efficient chaining model. This
> significantly reduces disk space usage for empty indexes and improves
> performance by avoiding unnecessary page access. The new design better
> handles the unique layout of AO/CO tables while maintaining correctness and
> compatibility.
>
>    -  BRIN indexes on AO/CO tables now support summarizing specific logical
> heap block ranges using `brin_summarize_range()`, enabling more precise
> control during index maintenance and testing. This enhancement also adds
> improved coverage for scenarios involving aborted rows, increasing
> robustness and correctness in edge cases.
>
>    -  Supports generating `IndexScan` plans when using the GPORCA optimizer
> with `ScalarArrayOp` qualifiers (for example, `col = ANY(array)`) for
> B-tree indexes. This enhancement aligns ORCA with the planner's behavior
> and allows more efficient execution of array comparison queries, as long as
> the predicate column is the first key in a multicolumn index.
>
> #### View and materialized view
>
> -  Improves performance of `REFRESH MATERIALIZED VIEW WITH NO DATA` by
> avoiding full query execution. The command now behaves like a `TRUNCATE`,
> significantly reducing execution time while preserving proper dispatch to
> segments.
>
> #### Join
>
> -  Supports left join pruning when using the GPORCA optimizer, allowing
> unnecessary left joins to be eliminated during query optimization. This
> applies when the query only uses columns from the outer table and the join
> condition fully covers the inner table's unique or primary keys. This can
> lead to more efficient query plans.
>
> -  Supports `FULL JOIN` using the `Hash Full Join` strategy when using the
> GPORCA optimizer. This approach avoids sorting join keys and reduces data
> redistribution, making it suitable for large datasets or joins on
> non-aligned distribution keys. All `FULL JOIN` queries now use `Hash Full
> Join`.
>
> -  The GPORCA optimizer now avoids unnecessary data redistribution for
> multi-way self joins using left or right outer joins when the join keys are
> symmetric. This optimization improves performance by recognizing that such
> joins preserve data colocation, eliminating redundant motion operations.
>
> -  The GPORCA optimizer no longer penalizes broadcast plans for `NOT IN`
> queries (Left Anti Semi Join), regardless of the
> `optimizer_penalize_broadcast_threshold` setting. This change improves
> performance and avoids potential OOM issues by enabling parallel execution
> instead of concentrating large tables on the coordinator node.
>
> #### Function & aggregate
>
> -  Supports intermediate aggregates when using the GPORCA optimizer,
> enabling more efficient execution of queries that include both `DISTINCT`
> aggregates and regular aggregates. This ensures correct handling of
> aggregation stages using `AGGSPLIT`. In addition, ORCA introduces an
> optimization for `MIN()` and `MAX()` functions by using index scans with a
> limit, instead of full table scans with regular aggregation. This
> optimization also supports `IS NULL` and `IS NOT NULL` conditions on
> indexed columns, significantly improving performance for applicable queries.
>
> -  Enables more `HashAggregate` plan alternatives for queries that include
> `DISTINCT` aggregates when using the GPORCA optimizer. By generating a
> two-stage aggregation plan that avoids placing `DISTINCT` functions in
> hash-based nodes, ORCA ensures compatibility with the executor and expands
> the range of supported query plans. This improvement enhances optimization
> choices for group-by queries.
>
> -  Supports queries using `GROUP BY CUBE`, enabling multi-dimensional
> grouping sets in query plans. This expands analytic query capabilities.
> Note that optimization time for `CUBE` queries may be high due to the large
> number of generated plan alternatives.
>
> #### Preprocessing
>
> -  Inlines Common Table Expressions (CTEs) that contain outer references,
> allowing such queries to be planned and explained successfully. Previously,
> these queries would fall back to the legacy planner due to limitations in
> handling shared scans with outer references. This change improves
> compatibility and enables ORCA to optimize a broader range of CTE-based
> queries.
>
> -  No longer rewrites `IN` queries to `EXISTS` when the inner subquery
> contains a set-returning function. This prevents invalid query
> transformations that could previously result in execution errors. The
> change ensures correct handling of queries like `a IN (SELECT
> generate_series(1, a))`.
>
> #### Optimization and performance enhancements
>
> -  **Plan hint**
>
>    -  Supports plan hints for scan types and join row estimates when using
> the GPORCA optimizer, enabling users to guide query planning using
> `pg_hint_plan`-style comments. Supports scan hints include `SeqScan`,
> `IndexScan`, `BitmapScan`, and their negations, while row hints allow users
> to specify expected join cardinalities.
>
>    -  The `plan hint` field is now required in the ORCA optimizer
> configuration. This change simplifies internal parsing logic and ensures
> consistent handling of optimizer configuration files.
>
>    -  Supports join order hints for left and right outer joins when using
> the GPORCA optimizer, extending the existing hint framework beyond inner
> joins. This enhancement allows users to guide the optimizer's join order
> more precisely in complex queries involving outer joins, improving plan
> control and potentially execution performance.
>
> -  **Enhancements to ORCA**
>
>    -  Supports table aliases in query plans when using the GPORCA
> optimizer, making `EXPLAIN` outputs more descriptive and aligned with
> user-defined query syntax. In addition, ORCA adds support for query
> parameters, including those used in functions and prepared statements,
> enabling better compatibility with parameterized workloads and dynamic SQL
> execution.
>
>    -  When using the GPORCA optimizer, supports generating plans for
> queries on tables with row-level security (RLS) enabled. Security policies
> are enforced during plan generation, ensuring only permitted rows are
> visible to each user. ORCA still falls back to the planner for RLS queries
> with sublinks, foreign tables, or for `INSERT` and `UPDATE` statements.
>
>    -  The GPORCA optimizer now gracefully falls back to the Postgres
> planner when a function in the `FROM` clause uses `WITH ORDINALITY`, which
> is not currently supported. The fallback includes a clear error message
> indicating the unsupported feature.
>
>    -  When using the GPORCA optimizer, supports pushing down filters with
> `BETWEEN` predicates when combined with constant filters, enabling more
> effective predicate propagation. This enhancement can reduce the number of
> rows processed during joins, improving query performance in applicable
> cases.
>
>    -  When using the GPORCA optimizer, supports hashed subplans when the
> subquery expression is hashable and contains no outer references. This
> enhancement can significantly improve query performance by reducing
> execution time in applicable cases.
>
>    -  ORCA now supports executing foreign tables with `mpp_execute='ANY'`
> on either the coordinator or segments, depending on cost. This allows more
> flexible and efficient execution plans for foreign data sources. A new
> "Universal" distribution type is introduced to support this behavior,
> similar to how `generate_series()` is handled.
>
>    -  ORCA now supports direct dispatch for randomly distributed tables
> when the query includes a filter on `gp_segment_id`. This enhancement
> improves query performance by routing execution directly to the relevant
> segment, reducing unnecessary data processing across the cluster.
>
>    -  ORCA now supports generating plans with the `ProjectSet` node,
> enabling correct execution of queries that include set-returning functions
> (SRFs) in the target list. This enhancement prevents fallback to the legacy
> planner and ensures compatibility with PostgreSQL 11+ behavior.
>
>    -  ORCA now supports the `FIELDSELECT` node, which allows it to optimize
> a broader range of queries involving composite data types. Previously, such
> queries would fall back to the legacy planner. This enhancement improves
> compatibility and reduces unnecessary planner fallbacks.
>
>    -  ORCA now derives statistics only for the columns used in `UNION ALL`
> queries, instead of all output columns from the input tables. This
> optimization reduces unnecessary computation and can improve planning
> performance for large queries.
>
>    -  Updates naming in logs and `EXPLAIN` output to refer to the
> optimizers as "GPORCA" and "Postgres based planner" for improved clarity
> and consistency.
>
>    -  Optimizes ORCA's `Union All` performance by deriving statistics only
> for columns used in the query output. This reduces unnecessary computation
> and improves planning efficiency for queries with unused columns.
>
> ### Transaction management
>
> #### Lock management
>
> -  Updates logic to ignore invalidated slots while computing the oldest
> catalog Xmin, reducing the risk of deadlocks and improving transaction
> concurrency.
>
> -  Performs serializable isolation checks early for AO/CO tables, ensuring
> stricter consistency guarantees and reducing the likelihood of isolation
> conflicts.
>
> -  Enhances the index creation process to prevent deadlocks by ensuring the
> coordinator acquires an `AccessShareLock` on `pg_index` before dispatching
> a synchronization query to segments, thus aligning `indcheckxmin` and
> avoiding conflicts that GDD cannot resolve.
>
> #### Transaction performance and reliability
>
> -  Avoids replaying DTX information in checkpoints for newly expanded
> segments, preventing potential inconsistencies during recovery.
>
> -  Adds `gp_stat_progress_dtx_recovery` for better observability of
> distributed transaction recovery progress.
>
> -  Improves error reporting for DTX protocol command dispatch errors,
> making it easier to diagnose and resolve issues.
>
> -  Allows utility mode on the coordinator to skip upgrading locks for
> `SELECT` locking clauses, improving efficiency for maintenance operations.
>
> ### Storage
>
> #### AO/CO table enhancements
>
> -  Optimizes `CREATE INDEX` operations on AO tables with scan progress
> reporting, enhancing the efficiency of index creation.
>
> -  Declares the connected variable as "volatile" to ensure proper handling
> across `PG_TRY` and `PG_CATCH` blocks, mirroring PostgreSQL's best
> practices for exception-safe variable usage in transaction control.
>
> #### Partitioning
>
> -  Extends Orca's planning capabilities to include support for foreign
> partitions, enabling optimized query execution for tables with a mix of
> foreign and non-foreign partitions. The implementation introduces new
> logical and physical operators for foreign partitions, supports static and
> dynamic partition elimination, and integrates with any foreign data wrapper
> compatible, enhancing performance and flexibility for external data queries.
>
> -  Optimizes the analysis of leaf partitions in multi-level partition
> tables to avoid unnecessary resampling of intermediate partitions.
>
> -  Supports dynamic partition elimination (DPE) when using the GPORCA
> optimizer for plans involving duplicate-sensitive random motions. This
> allows partition selectors to pass through segment filters, enabling more
> efficient query plans and reducing the number of scanned partitions.
>
> -  Adds Dynamic Partition Elimination for Hash Right Joins, which enhances
> the efficiency of join operations on partitioned tables.
>
> -  Supports boolean static partition pruning in ORCA, enhancing the
> efficiency of partition pruning during query optimization.
>
> -  Enhances ORCA's query planning by incorporating partition key opfamily
> checks during partition pruning to optimize data distribution and partition
> scanning, ensuring correct motion triggering and partition scanning by
> aligning predicate operators with the distribution or partition key's
> opfamily, addressing issues with missing motion, incorrect direct dispatch,
> and ineffective partition pruning.
>
> -  Caches the last found partition in `ExecFindPartition` to improve
> performance for repeated partition lookups.
>
> -  Enables ORCA to derive dynamic table scan cardinality from leaf
> partitions, addressing limitations in handling date and time-related data
> types by changing their internal representation to doubles.
>
> -  Enhances the DPv2 algorithm to include distribution spec information
> with partition selectors, improving the efficiency of distributed query
> execution.
>
> -  Introduces a new Non-Replicated distribution specification to optimize
> join operations in database processing. By relaxing the enforcement of
> singleton distribution for outer tables when the inner table is universally
> distributed, it aims to reduce unnecessary data gathering and
> duplicate-sensitive motions, thereby generating more efficient execution
> plans.
>
> #### Memory management
>
> -  Implements a custom allocator to enable ORCA to use standard C++
> containers, addressing heap allocation management.
>
> -  Refactors ORCA's memory pool by making several methods static and adds
> assertions to ensure pointer safety.
>
> -  Optimizes serialization of IMDId objects in ORCA to be lazy, improving
> performance by deferring serialization until necessary. Improves
> optimization time when loading objects into the relcache and when involving
> large and wide partition tables.
>
> -  Ensures that strings returned by `GetDatabasePath` are always freed
> using `pfree`, preventing memory leaks.
>
> -  Enables MPP (Massively Parallel Processing) support for `pg_buffercache`
> and builds it by default, making buffer cache management more scalable and
> efficient in distributed environments.
>
> -  Introduces `pg_buffercache_summary()` to offer a high-level overview of
> buffer cache activity.
>
> #### Metadata and access methods
>
> -  Allows the definition of lock modes for custom reloptions, providing
> more control over table and index access.
>
> -  Supports specification of reloptions when switching storage models,
> allowing seamless transitions between different storage formats.
>
> -  Introduces a new struct member in `CreateStmt` to indicate the origin of
> the statement, specifying if it was generated from GP style classic
> partitioning syntax.
>
> -  Adds syscache lookup for `pg_attribute_encoding` and `pg_appendonly`,
> improving performance and efficiency in metadata access.
>
> -  Introduces a new catalog entry in `pg_aggregate` to store replication
> safety information for aggregates, allowing users to mark specific
> aggregates as safe for execution on replicated slices via an optional
> repsafe parameter during the `CREATE AGGREGATE` command. This helps
> optimize performance by avoiding unnecessary broadcasts on large replicated
> datasets.
>
> -  Enhances the dispatch of `ALTER DATABASE` commands by allowing options
> like `ALLOW_CONNECTIONS` and `IS_TEMPLATE` to be dispatched to segments,
> ensuring catalog changes are reflected everywhere.
>
> ### Data loading and external tables
>
> #### External table enhancements
>
> -  Adds clearer restrictions and warnings when exchanging or attaching
> external tables. Writable external tables can no longer be used as
> partitions, and attaching readable external tables without validation now
> triggers a warning instead of requiring a no-op clause.
>
> -  Disables `SET DISTRIBUTED REPLICATED` for `ALTER EXTERNAL TABLE` to
> prevent misuse and ensure consistency.
>
> #### Foreign data wrapper
>
> -  Improves performance and stability for `gpfdist` external tables. Adds
> TCP keepalive support for more reliable reads, and increases the default
> buffer size to enhance write throughput for writable external tables.
>
> -  ORCA now falls back to the planner for queries involving foreign
> partitions using `greenplum_fdw`, preventing crashes caused by incompatible
> execution behavior. Queries on non-partitioned foreign tables using
> `greenplum_fdw` remain supported by ORCA.
>
> ### High availability and high reliability
>
> #### Backup and disaster recovery
>
> -  CBDR
>
>    Introduces CBDR, a backup and recovery tool designed for Apache
> Cloudberry databases. Built on WAL-G, CBDR offers a user-friendly
> command-line interface that simplifies disaster recovery and ensures data
> safety.
>
>    CBDR supports both full and incremental backups, making it efficient for
> large-scale clusters. Users can list available backups, restore from any
> selected point, and store backups either locally or in S3-compatible object
> storage. Compared to existing tools like `gpbackup` and `gprestore`, CBDR
> provides enhanced flexibility with features such as support for multiple
> compression formats (lz4, lzma, zstd, brotli) and backup encryption, making
> it a comprehensive solution for enterprise-grade backup strategies.
>
> -  DB recovery and synchronization
>
>    -  Improves archiver performance when handling many `.ready` files by
> reducing redundant directory scans. This change speeds up WAL archiving,
> especially when `archive_command` has been failing and many files have
> accumulated.
>
>    -  `gp_create_restore_point()` can only be executed on the Coordinator
> node. Calling this function on a segment node will result in an error. The
> function returns a structured record value, including the restore point
> name and LSN, which you can view directly by running `SELECT * FROM
> gp_create_restore_point()`.
>
> #### WAL
>
> -  Improves WAL replication management by restricting a
> coordinator-specific tracking mechanism to the coordinator only. This
> change simplifies primary segment behavior and aligns replication practices
> more closely across segments. No functional change for users, but helps
> reduce unnecessary complexity in WAL retention logic.
>
> -  Enhances WAL retention logic to improve reliability of incremental
> recovery using `pg_rewind`. Physical replication slots now retain WAL files
> up to the last common checkpoint, reducing risk of missing WAL during
> recovery. This change also simplifies the underlying logic and adds test
> coverage for WAL recycling.
>
> -  Switches WAL replication connections to use the standard libpq protocol
> instead of a legacy internal one. This improves compatibility and
> reliability of replication behavior. Also fixed test failures and improved
> error handling for replication connections.
>
> ### Security
>
> #### DB Operations
>
> -  `REFRESH MATERIALIZED VIEW CONCURRENTLY` runs all internal operations in
> the correct security context to prevent potential privilege escalation.
> This change ensures safer execution by restricting operations to the
> appropriate permission level.
>
> -  Improves internal handling of new `aoseg` and `aocsseg` tuples by
> aligning tuple freezing behavior with other catalog operations. This change
> enhances consistency with upstream PostgreSQL practices and removes the
> need for `CatalogTupleFrozenInsert`.
>
> #### System processes
>
> -  Orphaned file checks now exclude idle sessions during safety validation.
> This prevents unnecessary errors when persistent connections from services
> are active, allowing the detection process to complete successfully.
>
> -  Adds a safety check in backend signal handlers to ensure signals are
> handled by the correct process. This prevents unintended shared memory
> access by child processes and improves overall process isolation and
> stability.
>
> -  Improves process safety by preventing child processes spawned via
> `system()` from calling `proc_exit()`. This avoids potential corruption of
> shared memory structures and ensures only the parent process performs
> cleanup operations.
>
> -  Removes the permission check for `cpu.pressure` when using
> `gp_resource_manager='group-v2'`. This prevents startup failures on systems
> where PSI is disabled, without affecting resource management functionality.
>
> #### Replication/Mirrorless clusters
>
> -  Improves replication error reporting by setting persistent `WalSndError`
> when a replication slot is invalidated. This ensures accurate error
> visibility in `gp_stat_replication`.
>
> #### Permission management
>
> -  Strengthens security by rejecting extension schema or owner
> substitutions containing potentially unsafe characters like `$`, `'`, or
> `\`. This prevents SQL injection in extension scripts and protects against
> privilege escalation in certain non-bundled extensions.
>
> -  Creating or assigning roles to the `system_group` resource group now
> results in an error, as this group is reserved for internal system
> processes only.
>
> -  Reverts the restriction requiring superuser privileges to set the
> `gp_resource_group_bypass` GUC. This allows applications like GPCC to
> function more easily while still limiting resource impact.
>
> -  Altering the `mpp_execute` option of a foreign server or wrapper is now
> disallowed to prevent inconsistencies in foreign table distribution
> policies. Changing these options previously could result in outdated cached
> plans and incorrect query execution. This update ensures plan correctness
> by enforcing cache invalidation only when appropriate.
>
> #### pgcrypto
>
> -  Adds support for FIPS mode in `pgcrypto`, controlled by a GUC. This
> allows Cloudberry to operate in FIPS-compliant environments when linked
> with a supported FIPS-enabled OpenSSL version. Certain ciphers are disabled
> in this mode to comply with FIPS requirements.
>
> -  `pgcrypto` now allows enabling FIPS mode even on systems where FIPS is
> not pre-enabled by the OS or environment. This change removes the
> dependency on `FIPS_mode()` checks, offering more flexibility in managing
> FIPS compliance through the database.
>
> ### Resource management
>
> #### Resource group management
>
> -  Renames the `memory_limit` parameter to `memory_quota` in `CREATE/ALTER
> RESOURCE GROUP` to clarify its meaning and unit.
>
> -  Adds a new system view `gp_toolkit.gp_resgroup_status_per_segment` to
> monitor memory usage per resource group on each segment. This view helps
> database administrators track real-time vmem consumption (in MB) when
> resource group-based resource management is enabled.
>
> -  Improves logging behavior when memory usage reaches Vmem or resource
> group limits. The system now prints log messages directly to stderr to
> avoid stack overflow errors during allocation failures.
>
> -  Removes unnecessary permission check for `cpu.pressure` when using the
> `group-v2` resource manager. This prevents startup failures on systems
> where PSI is not enabled, improving compatibility across Linux
> distributions.
>
> #### Logging and monitoring
>
> -  Adds additional log messages for GDD backends to help investigate
> memory-related issues. These logs provide better visibility into backend
> behavior during high memory usage scenarios.
>
> -  Adds a log ignore rule for "terminating connection" messages to reduce
> noise in test outputs. This helps avoid unnecessary diffs in CI for tests
> that involve connection termination.
>
> -  Adds more verbose logging to `ResCheckSelfDeadlock()`.
>
> -  Logs queue IDs and portal IDs in resource queue logs.
>
> -  Dumps more information when releasing resource queue locks to aid in
> troubleshooting and monitoring.
>
> -  Uses `ERROR` for dispatcher liveness checks.
>
> -  Enhances logging for dispatch connection liveness checks to improve
> clarity during connection failures. Logs now include more accurate error
> messages based on socket state and system errors.
>
> #### Platform compatibility and build
>
> -  Improves `gp_sparse_vector` compatibility with ARM platforms by fixing
> type handling in serialization logic. This ensures consistent behavior
> across different architectures.
>
> -  Adds support for `sigaction()` on Windows to align signal handling
> behavior with other platforms. This reduces platform-specific differences
> and improves code consistency.
>
> -  Updates ACL mode type in ORCA to match the parser's definition, ensuring
> consistent type usage.
>
> #### System views and statistics
>
> -  Improves join cardinality estimation for projected columns that preserve
> the number of distinct values (NDVs), such as additions or subtractions
> with constants. This allows the optimizer to use underlying column
> histograms for more accurate estimates, improving plan quality for queries
> with scalar projections in join conditions.
>
> -  Increases precision for frequency and NDV values in ORCA when processing
> metadata population scripts (MDPs). This change ensures consistent behavior
> between MDPs and live database queries, reducing discrepancies caused by
> rounding small values.
>
> -  ORCA now considers null value skew when costing redistribute motions,
> improving plan accuracy for queries involving columns with many nulls. This
> helps avoid performance issues caused by data being unevenly distributed
> across segments.
>
> -  ORCA now supports extended statistics to improve cardinality estimation
> for queries with correlated columns. This allows the optimizer to use real
> data-driven correlation factors instead of relying on arbitrary GUC
> settings, leading to more accurate query plans.
>
> -  Introduces `gp_log_backend_memory_contexts` to log memory contexts
> across segments, with optional targeting by content ID. This enhances
> observability and helps diagnose memory issues in distributed queries.
>
> -  ORCA now supports statistics derivation for predicates involving
> different time-related data types, such as date and timestamp. This
> improves plan accuracy and performance for queries comparing mixed temporal
> types.
>
> -  Autostats now uses `SKIP LOCKED` for `ANALYZE` operations to avoid
> blocking on locks, reducing the risk of deadlocks and improving
> predictability. This behavior is enabled by default and can be controlled
> using the `gp_autostats_lock_wait` GUC.
>
> -  ORCA now supports `STATS_EXT_NDISTINCT` extended statistics for
> estimating cardinality on correlated columns. This improves accuracy for
> queries using `GROUP BY` or `DISTINCT` on such columns.
>
> #### Network connections
>
> -  Marks `gp_reject_internal_tcp_connection` as defunct to improve
> reliability of internal QD-to-entry DB connections. These connections over
> TCP/IP are now treated as authenticated by default, preventing
> authentication errors caused by `pg_hba.conf` settings.
>
> ### Tools and utilities
>
> -  **analyzedb**
>
>    -  `analyzedb` now includes materialized views in its list of tables to
> analyze. This improves the performance immediately after analysis.
>
> -  **gpexpand**
>
>    -  `gpexpand` now includes a cluster health check to ensure all segments
> are up and in their preferred roles before proceeding. This prevents
> incorrect port assignments and avoids potential issues during expansion
> when nodes are not in a stable state.
>
> -  **gp_toolkit**
>
>    -  Added an update path for the `gp_toolkit` extension to version 1.6.
> This update renames the column `memory_limit` to `memory_quota` in the
> `gp_resgroup_config` view for improved clarity. Users can apply the update
> using `ALTER EXTENSION gp_toolkit UPDATE TO '1.6'`.
>
> ## Bug fixes
>
> - Fixed data loss caused by incorrect shared snapshot handling.
> - Fixed memory corruption during AOCO ADD COLUMN abort.
> - Fixed checkpoint WAL replay failure.
> - Fixed incorrect results when using UNION for RECURSIVE_CTE.
> - Fixed incorrect results from hash joins on char columns.
> - Fixed incorrect results produced by WITH RECURSIVE queries.
> - Fixed incorrect results when a REPLICATED table is unioned with a
> DISTRIBUTED table.
> - Fixed incorrect results when the outer query had ORDER BY after a LATERAL
> subquery.
> - Fixed incorrect behavior of DELETE with split update.
> - Fixed incorrect results when using direct dispatch.
> - Fixed memory leaks in ORCA and various components.
> - Fixed long-running execution with bitmap indexes.
> - Fixed redundant SORT enforcement on group aggregates.
> - Fixed incorrect index position in target list in ExecTupleSplit.
> - Fixed incorrect value in the cpu_usage column returned by
> pg_resgroup_get_status().
> - Fixed incorrect behavior of gp_toolkit.gp_move_orphaned_files.
> - Fixed incorrect results in multi-stage aggregate queries.
> - Fixed incorrect plan and output in multi-stage aggregate queries.
> - Fixed incorrect reltuples value after VACUUM.
> - Fixed incorrect index->reltuples value after VACUUM.
> - Fixed a vulnerability where LDAP leaked user information.
> - Fixed incorrect permissions warning on the pgpass file.
> - Fixed incorrect handling of ONLY keyword for multiple tables in
> GRANT/REVOKE statements.
> - Fixed incorrect permissions in resource management DDL.
> - Fixed incorrect security context in REFRESH MATERIALIZED VIEW
> CONCURRENTLY.
> - Fixed deadlock between coordinator and segments.
> - Fixed race condition in CTE reader-writer communication.
> - Fixed race condition when invalidating obsolete replication slots.
> - Fixed deadlock by allowing concurrent creation of non-first indexes on AO
> tables.
> - Fixed locking issue when opening range tables inside
> ExecInitModifyTable().
> - Fixed incorrect unlock mode in DefineRelation.
> - Fixed incorrect locking in partition distribution policies.
> - Fixed issues with rle_type when converting a table from AO to AOCO.
> - Fixed incorrect handling of empty ranges and NULL values in BRIN indexes.
> - Fixed incorrect handling of NULL values when merging BRIN summaries.
> - Fixed incorrect TIDs order when building bitmap indexes.
> - Fixed possible inconsistency between bitmap LOV table and its index.
> - Fixed incorrect behavior of VACUUM in AO tables with indexes.
> - Fixed incorrect handling of TOAST values for invisible AppendOptimized
> tuples during VACUUM.
> - Fixed ORCA's invalid processing of nested SubLinks under aggregates.
> - Fixed ORCA's invalid processing of nested SubLinks referenced in GROUP BY
> clauses.
> - Fixed ORCA's invalid processing of nested SubLinks with GROUP BY
> attributes.
> - Fixed incorrect predicate pushdown when using casted columns.
> - Fixed incorrect join condition loss after pulling up sublinks to join
> nodes.
> - Fixed incorrect hash-key generation for Redistribute Motion in multi-DQA
> expressions.
> - Fixed incorrect plan generation for SEMI JOIN with RANDOM distributed
> tables.
> - Fixed incorrect behavior of gp_stat_bgwriter.
> - Fixed incorrect monitoring in pg_stat_slru.
> - Fixed incorrect monitoring in gp_stat_progress_dtx_recovery.
> - Fixed incorrect monitoring in pg_resgroup_get_status().
> - Fixed incorrect monitoring in gp_toolkit.gp_resgroup_config.
> - Fixed compilation issues on various platforms.
> - Fixed documentation and comment typos.
> - Fixed build system and Makefile issues.
> - Fixed various memory leaks and resource management issues.
> - Fixed various error handling and logging improvements.
> - Fixed mismatched types.
> - Fixed the ORCA preprocess step for queries with the
> Select-Project-NaryJoin pattern.
> - Fixed the missing discard_output variable in shared scan node functions.
> - Fixed the crash caused by running VACUUM AO_AUX_ONLY on an AO-partitioned
> table.
> - Fixed an obvious memory leak in _bitmap_xlog_insert_bitmapwords().
> - Fixed a memory leak in the merge join implementation.
> - Fixed the issue where the token for user ID xxx did not exist.
> - Fixed the issue where plan hints could not derive table descriptors.
> - Fixed the issue where inject_fault suspend could not be canceled.
> - Fixed fallback in debug builds due to scalars with invalid return types.
> - Fixed relptr encoding of the base address.
> - Fixed visimap consults for unique checks during UPDATE operations.
> - Fixed the issue where external table location URIs containing | caused
> errors.
> - Fixed handling of the time command output containing commas.
> - Fixed a small overestimation of the output length of base64 encoding.
> - Fixed gp_toolkit.__gp_aocsseg_history crash on non-AO columnar tables.
> - Fixed a race condition between termination and resqueue wakeup.
> - Fixed a statement leak involving self-deadlocks.
> - Fixed the detection of child output columns when the parent is a UNION
> during join pruning.
> - Fixed a query crash when using a negative memory_limit value in resource
> groups.
> - Fixed issues in pgarch new directory-scanning logic.
> - Fixed a memory leak in the FTS PROBE process.
> - Fixed check_multi_column_list_partition_keys.
> - Fixed a memory leak caught via ICW with memory check enabled.
> - Fixed query hang and fallback issues involving CTEs on replicated tables.
> - Fixed the unrecognized join type error with LASJ Not-In and network types.
> - Fixed issues in upgrade_adapt.sql related to queries using WITH OIDS.
> - Fixed the double declaration of check_ok() in pg_upgrade.h.
> - Fixed logic error with subdirectories generated by pg_upgrade for
> internal files.
> - Fixed a typo in the pg_upgrade file header.
> - Fixed the bug where PL/Python functions caused the master process to
> reset.
> - Fixed the Shared Scan hang issue involving initplans.
> - Fixed motion toast error.
> - Fixed a memory leak related to fsync in AO tables.
> - Fixed CDatumSortedSet handling of empty arrays that caused errors in ORCA.
> - Fixed ORCA returning incorrect column type modifier information.
> - Fixed DbgStr output when printing DP structs in ORCA.
> - Fixed the comment on performDtxProtocolPrepare.
> - Fixed a memory leak in Dynamic Index, IndexOnly, and BitmapIndex scans
> during execution.
> - Fixed the memory accounting bug when moving MemoryContext under another
> accounting node.
> - Fixed the ALTER TABLE ALTER COLUMN TYPE issue that reuses an incorrect
> index.
> - Fixed query fallback when a subquery is present within LEAST() or
> GREATEST().
> - Fixed the typo in timestamp.
> - Fixed unexpected warnings related to pg_stat_statements node types.
> - Fixed the crash involving initplan in MPP.
> - Fixed LeftJoinPruning pruning essential LEFT JOINs.
> - Fixed the SET command that incorrectly sends DTX protocol commands.
> - Fixed the segmentation fault in addOneOption().
> - Fixed parallel_retrieve_cursor diffs.
> - Fixed gpdiff.pl to ignore information when EXPLAIN ignores costs.
> - Fixed the uninitialized-use warning in CTranslatorDXLToPlStmt.cpp.
> - Fixed the bug where the LOCALE flag cannot be used with a string pattern.
> - Fixed a typo in cdbmutate.c.
> - Fixed CColRefSet debug printing.
> - Fixed ORCA producing incorrect plans when handling SEMI JOIN with RANDOM
> distributed tables.
> - Fixed orphaned temp tables on the coordinator.
> - Fixed the segmentation fault caused by concurrent INSERT ON CONFLICT and
> DROP TABLE.
> - Fixed redundant columns in a multi-stage aggregate plan.
> - Fixed the import of ICU collations in pg_import_system_collations().
> - Fixed the error: "Cannot add cell to table content: total cell count of
> XXX exceeded."
> - Fixed orphaned temporary namespace catalog entries left on the
> coordinator.
> - Fixed REFRESH MATERIALIZED VIEW on AO tables with indexes.
> - Fixed the use of PORTNAME in the gp_toolkit Makefile.
> - Fixed pg_stat_activity display for bypassed and unassigned queries.
> - Fixed the recursive CTE MergeJoin that involved a motion on WTS.
> - Fixed the column width display for partitioned tables.
> - Fixed the LDAP crash when ldaptls=1 and ldapscheme is not set.
> - Fixed the gpstop pipeline flakiness after the referenced change.
> - Fixed the ANALYZE bug in expand_vacuum_rels.
> - Fixed the compilation error.
> - Fixed the ORCA crash due to improper colref mapping with CTEs.
> - Fixed the bug where gpload insert mode was not included in a transaction.
> - Fixed the bug where resgroup total wait time was always zero.
> - Fixed the gpcheckcat error against pg_description.
> - Fixed flakiness caused by waiting for a different number of fault
> triggers.
> - Fixed the bug involving RelabelType in the GROUP BY clause.
> - Fixed the planner error with multiple copies of an AlternativeSubPlan.
> - Fixed the issue with bitmap indexes.
> - Fixed the bug in HashAgg related to selective-column-spilling logic.
> - Fixed the bug in disk-based hash aggregation.
> - Fixed the pipeline stall issue in LookupTupleHashEntryHash().
> - Fixed the use of version in ArgumentParser, which is deprecated.
> - Fixed the use of BaseException.message, which has been deprecated since
> Python 2.6.
> - Fixed the case pg_rewind_fail_missing_xlog.
> - Fixed the compiler warning for gcc-12.
> - Fixed support for the DEFERRABLE keyword on primary and unique keys.
> - Fixed the unlocking of pruned partitions in partitioned tables.
> - Fixed the crash in ORCA involving skip-level correlated queries.
> - Fixed the removal of Assert statements in release builds.
> - Fixed the typo in comments: JOIN_SEMI_DEDUP/JOIN_SEMI_DEDUP_REVERSE.
> - Fixed the issue where REORGANIZE=TRUE did not redistribute
> randomly-distributed tables.
> - Fixed the core dump caused by concurrent updates on partition tables in
> DynamicScan.
> - Fixed the typo: ANALZE to ANALYZE.
> - Fixed the issue where cgroup v1 cpu_quota_us cannot be larger than its
> parent's value.
> - Fixed indentation and trailing whitespace in UDFs in
> resgroup/resgroup_auxiliary_tools_v1.
> - Fixed the name of cpu_hard_quota_limit in resgroup_syntax.sql.
> - Fixed multi-row DEFAULT handling in INSERT ... SELECT rules.
> - Fixed invalid function references in several comments.
> - Fixed the bug where COPY FORM does not throw ERROR: extra data after last
> expected column.
> - Fixed the issue where file .204800 was not being checked in
> ao_foreach_extent_file.
> - Fixed the issue of incorrectly incrementing the command counter.
> - Fixed the coordinator crash in MPPnoticeReceiver.
> - Fixed the dangling pointer in ExecDynamicIndexScan().
> - Fixed the ORCA bug that incorrectly removed required redistribution
> motion when using GROUP BY over gp_segment_id.
> - Fixed header handling in url_curl.c.
> - Fixed ao_filehandler to support new attnum to filenum mapping changes.
> - Fixed pg_aocsseg to work with attnum to filenum mapping.
> - Fixed a comment in pg_dump.
> - Fixed the ORCA build break.
> - Fixed the gpconfig SSH retry undefined parameter issue.
> - Fixed the stale gp_default_storage_options comment.
> - Fixed the bug: unrecognized node type: 147.
> - Fixed spelling errors identified by lintian.
> - Fixed the bypass catalog unit test.
> - Fixed erroneous Valgrind markings in AllocSetRealloc.
> - Fixed the legacy bug in the DatabaseFrozenIds lock.
> - Fixed the mirror checkpointer error on the ALTER DATABASE query.
> - Fixed the bug: get_ao_compression_ratio() failed on root partitioned
> tables with AO children.
> - Fixed the issue where InterruptHoldoffCount was not being reset.
> - Fixed gpexpand failure caused by an event trigger.
> - Fixed missing redistribute for CTAS or INSERT INTO on randomly
> distributed tables when using ORCA.
> - Fixed the double free of remapper->typmodmap in
> TeardownUDPIFCInterconnect().
> - Fixed the bug in the upstream-merged COMMIT AND CHAIN feature.
> - Fixed inconsistency between gp_fastsequence row and index after a crash.
> - Fixed the typo allocatd to allocated.
> - Fixed the error: unrecognized node type: 145 in transformExpr.
> - Fixed build error caused by unused variable.
> - Fixed the issue where the distribution key was missing when creating a
> stage table.
> - Fixed the regex for etc/environment.d.
> - Fixed the string comparison warning.
> - Fixed obsolete references to SnapshotNow in comments.
> - Fixed pull-up error when the target list contains a RelabelType node.
> - Fixed the issue where index DDL operations were recorded in QEs'
> pg_last_stat_operation.
> - Fixed two compiler warnings.
> - Fixed the wrong value of maxAttrNum in TupleSplitState.
> - Fixed the bug of incorrect index position in target list in
> ExecTupleSplit.
> - Fixed the format error of the library name on Mac M1.
> - Fixed the pg_resgroup_get_status_kv() function.
> - Fixed interconnect bugs in ic_proxy_ibuf_push().
> - Fixed memory leaks in auto_explain.
> - Fixed ic_proxy compilation when HOST_NAME_MAX is unavailable.
> - Fixed duplicate filters caused by reversed operator argument order.
> - Fixed pg_rewind when the log file is a symbolic link.
> - Fixed and enabled 64-bit bitmapset and updated visimap.
> - Fixed the hang caused by multi-DQA with filters in the planner.
> - Fixed the bogus ORCA plan that incorrectly joins a CTE and a REPLICATED
> table.
> - Fixed the error in ATSETAM when applied to ao_column with a dropped
> column.
> - Fixed the LWLockHeldByMe assert failure in SharedSnapshotDump.
> - Fixed the KeepLogSeg() unit test.
> - Fixed the race condition when invalidating obsolete replication slots.
> - Fixed the uninitialized value in segno calculation.
> - Fixed issues in the invalidation logic for obsolete replication slots.
> - Fixed checkpoint signalling.
> - Fixed memory overrun when querying pg_stat_slru.
> - Fixed the bug where ORCA fails to decorrelate subqueries ordered by outer
> references.
> - Fixed unused variable compile warnings.
> - Fixed the bug where NestLoop join fails to materialize the inner child in
> some cases.
> - Fixed COPY execution via FDW on coordinator as executor.
> - Fixed inFunction usage for auto_stats in CTAS.
> - Fixed a compiler warning.
> - Fixed the syntax error with CREATE MATERIALIZED VIEW.
> - Fixed the issue preventing temporary table creation LIKE existing tables
> with comments.
> - Fixed and rewrote IndexOpProperties API.
> - Removed redundant Get/SetStaticPruneResult usage.
> - Fixed EPQ handling for DML operations.
> - Fixed gpcheckperf failure when using -V with -f option.
> - Fixed possible mirror startup failure triggered by FTS promotion.
> - Fixed the parallel retrieve cursor issue when selecting transient record
> types.
> - Fixed the resource management DDL warning: unrecognized node type when
> log_statement='ddl'
> - Fixed the resgroup init error when many cores are present in cpuset.cpus.
> - Fixed resqueue malfunction when using JDBC extended protocol.
> - Fixed the missing LOCKING CLAUSE on foreign tables when ORCA is enabled.
> - Fixed the test_consume_xids behavior where it consumes one more
> transaction ID than expected.
> - Fixed the ONLY keyword handling for multiple tables in GRANT/REVOKE
> statements.
> - Fixed the regression test to ignore memory usage values in JSON format
> EXPLAIN output.
> - Fixed relcache lookup in ORCA when selecting from sequences.
> - Fixed missing WAL files required by pg_rewind.
> - Fixed the gp_dqa test to explicitly ANALYZE tables.
> - Fixed the crash of AggNode in the executor caused by an ORCA plan.
> - Fixed the resource group cpuset test case.
> - Fixed the compiler warning caused by gpfdist with compressed external
> tables.
> - Fixed link issues on macOS and Windows.
> - Fixed failure when DynamicSeqScan contains a SubPlan.
> - Fixed the error: cache lookup failed for type 0.
> - Fixed the multi-level correlated subquery bug.
> - Fixed checkpoint WAL replay failure.
> - Fixed the check for BufFileRead() in ExecHashJoinGetSavedTuple().
> - Fixed the test extension to allow executing SQL code inside a Portal.
> - Fixed resgroup view test cases.
> - Fixed incorrect DISTKEY assignment when copying partitions on segments.
> - Fixed ic-proxy mis-disconnecting addresses after reloading the config
> file.
> - Fixed the gpcheckcat check on partition distribution policies.
> - Fixed colid remapping in disjunctive constraints.
> - Fixed the Makefile by removing the tablespace-step target from all.
> - Fixed CBitSet intersection logic in ORCA.
> - Fixed the query preprocessor for nested Select-Project-NaryJoin patterns.
> - Fixed incorrect unlock mode in DefineRelation.
> - Fixed the upgrade process for external tables with dropped columns.
> - Fixed the formatting issue in SECURITY.md.
> - Fixed gp_gettmid to return the correct startup timestamp.
> - Fixed the gpload regression test failure when the OS user is not gpadmin.
> - Fixed the compiler warning in appendonlyblockdirectory.c.
> - Fixed missing reloptions in partition roots created using Cloudberry
> syntax.
> - Fixed the crash when calling get_ao_compression_ratio on HEAP tables.
> - Fixed incorrect sortOp and eqOp values generated by
> IsCorrelatedEqualityOpExpr.
> - Fixed the dependency bug involving minirepro and materialized views.
> - Fixed recursion handling in ALTER TABLE ... ENABLE/DISABLE TRIGGER.
> - Fixed SPE plans to display Partitions selected: 1 (out of 5).
> - Fixed incorrect hash-key generation for Redistribute Motion when creating
> paths for multi-DQA expressions.
> - Removed gp_enable_sort_distinct and noduplicates optimizations.
> - Fixed gpinitsystem Behave tests that use environment variables.
> - Fixed false alarms in gpcheckcat for pg_default_acl.
> - Fixed gpinitsystem failure with custom locale settings.
> - Fixed a panic in the greenplum_fdw test.
> - Fixed the failure in bitmap index null-array condition.
> - Fixed the compilation warning in gram.y.
> - Fixed multiple issues related to DistributedTransaction handling.
> - Fixed compile-time warnings in pg_basebackup code.
> - Fixed gplogfilter to correctly generate CSV output.
> - Fixed the assert in the OpExecutor node.
> - Fixed improper copying of group statistics in ORCA.
> - Fixed error reporting after ioctl() call in pg_upgrade --clone mode.
> - Fixed replay of CREATE DATABASE records on standby.
> - Fixed a minor memory leak in pg_dump.
> - Fixed parallel restore of foreign keys to partitioned tables.
> - Fixed the issue where the pg_appendonly entry was not removed during
> AO-to-HEAP table conversion.
> - Fixed assertion failure and segmentation fault in the backup code.
> - Fixed fallback behavior for non-default collations.
> - Fixed the subtransaction test for Python 3.10.
> - Fixed Windows client compilation of libpgcommon.
> - Fixed compiler warnings introduced by the Dynamic Scan commit.
> - Fixed the issue where CREATE OR REPLACE TRANSFORM failed.
> - Fixed compiler warnings for non-assert builds.
> - Fixed lock assertions in dshash.c.
> - Fixed \watch interaction with libedit on C.
>
> </feature list>
>
>
> Lirong Jian <[email protected]> 于2025年4月20日周日 23:31写道：
>
> > Thanks, Ed.
> >
> > Best,
> > Lirong
> >
> >
> > Ed Espino <[email protected]> 于2025年4月20日周日 16:37写道：
> >
> > > Hi all,
> > >
> > > As we prepare for the next release of Apache Cloudberry (Incubating), I’d
> > > like to volunteer to serve as the Release Manager (RM) for the upcoming
> > > 2.0.0 release.
> > >
> > > For those who may be new to the Apache process: the Release Manager is a
> > > community volunteer who coordinates the technical and procedural aspects
> > of
> > > a release. This includes tagging the release, assembling the source
> > > artifacts, verifying licensing and compliance (e.g., via Apache RAT),
> > > initiating votes on the dev@ and general@ mailing lists, and completing
> > > post-vote publication steps. While the RM helps move the release forward,
> > > all decisions are made openly and collaboratively.
> > >
> > > I’ve previously served in this role on other Apache projects, including
> > > Apache MADlib and Apache HAWQ (now in the Attic), so I’m familiar with
> > the
> > > overall release process. That said, I’ll be refreshing my understanding
> > of
> > > the specific requirements for incubating projects as we go, and will
> > ensure
> > > we follow ASF Incubator policy throughout.
> > >
> > > For anyone interested in reviewing the process or following along, the
> > > project’s Wiki includes our release procedures and checklists:
> > > https://github.com/apache/cloudberry/wiki
> > >
> > > I'll be documenting each step, so others can get familiar with the
> > process
> > > or jump in to assist. If there are any questions or concerns, please feel
> > > free to raise them here.
> > >
> > > Looking forward to working with everyone on this.
> > >
> > > Best,
> > > -=e
> > >
> > > --
> > > Ed Espino
> > > Apache Cloudberry (Incubating) & MADlib
> > >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [DISCUSS] Volunteering to Serve as Release Manager for Apache Cloudberry (Incubating) 2.0.0

Reply via email to