GitHub user fanfuxiaoran added a comment to the discussion: Introducing the [perfmon] Extension for Cloudberry Database Monitoring
Hi @edespino, thanks for your suggestions and questions!

> @fanfuxiaoran - Thanks for the detailed proposal—this is exciting
> functionality. That said, the architecture and component naming (`gpmon`,
> `gpmmon`, `gpsmon`) seem very reminiscent of the original **Greenplum
> `gpperfmon`** system from GPDB 5 and 6.
>
> Could you clarify a few points?
>
> ### 🔄 Is this a revival or a full rewrite?
>
> * Is `perfmon` intended to **revive, rebrand, or reimplement** the legacy
>   `gpperfmon` monitoring stack from earlier Greenplum releases?
> * How much of the original design and codebase is reused or adapted?
> * Is this based on a fork of the old implementation, or built from scratch?

The perfmon extension is derived from GPDB 6's gpperfmon, with several significant modifications:

1. perfmon is built as a shared library rather than a standalone binary as in GPDB 6, and it is started as a background worker by postgres when the GUC `perfmon.enable` is on. It is packaged as an extension: after `create extension perfmon`, all the tables used to store the data are created.
2. The original gpperfmon does not store query plan information in its queries_history table, whereas our enhanced perfmon extension captures this critical data. Each execution plan node includes key metrics such as rows processed and actual time cost, which are essential for users to analyze and troubleshoot slow queries effectively.
3. Running-query statistics are collected in a completely different way. In gpperfmon, the postgres process sends statistics to gpsmon while a query is running, which generates a lot of traffic. For performance reasons we discarded that mechanism and introduced `pg_query_state` instead: a user who wants to track the performance of an active query can call `select pg_query_state(pid)` (see the sketch after the Installation & Deployment answer below).

> ### ⚙️ Build & Runtime Dependencies
>
> * What are the **current build and runtime dependencies**?
> * The original `gpsmon` used **`libsigar`**, which is an old and likely
>   abandoned project. Is it still required?
> * If `libsigar` is still in use, are there plans to replace it or mitigate
>   its maintenance risks?

Yes, `libsigar` is still in use. We don't have any plans to replace it currently.

> ### 🛠️ `configure` Integration
>
> * Previously, `gpperfmon` was enabled using a `configure` option like:
>   ```shell
>   --enable-gpperfmon build with gpperfmon
>   ```
> * Will `perfmon` provide a similar `configure` option?

Yes, `--enable-perfmon` is used to build perfmon.

> ### 📦 Installation & Deployment
>
> * How is the extension intended to be **installed and initialized**?
> * Will there be tooling or guidance to **create the `gpperfmon` database**,
>   or is manual setup expected?
> * A step-by-step outline of how users go from source to a fully monitored
>   cluster would be very helpful.

Similar to Greenplum, we provide a Python script called gpperfmon_install to assist users in setting up the gpperfmon database and performing additional preparatory tasks.

```
gpperfmon_install --port 5432 --enable --password 123456
gpstop -ari
```

Then perfmon will be enabled and the gpmmon background worker will be started to monitor the database.
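To tie the pieces above together, here is a rough end-to-end sketch from source to a monitored cluster. The `--enable-perfmon` flag, the `gpperfmon_install` invocation, and the `perfmon.enable` GUC are as described above; the explicit `create extension` step at the end is my assumption for the case where the install script does not already create the extension for you.

```shell
# Rough sketch: build Cloudberry with perfmon support.
./configure --enable-perfmon
make && make install

# Create the gpperfmon database and turn collection on (perfmon.enable=on),
# then restart so postgres starts the gpmmon background worker.
gpperfmon_install --port 5432 --enable --password 123456
gpstop -ari

# Assumption: if the install script does not do this for you, creating the
# extension materializes the monitoring tables in the gpperfmon database.
psql -d gpperfmon -c "create extension perfmon;"
```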
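And the `pg_query_state` flow from point 3, as a minimal usage sketch: the pid is looked up in `pg_stat_activity` first, and 4711 below is a hypothetical placeholder.

```shell
# From a second session, find the backend pid of an active query.
psql -c "select pid, query from pg_stat_activity where state = 'active';"

# Fetch in-flight plan statistics for that backend (4711 is a placeholder pid).
psql -c "select pg_query_state(4711);"
```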
> ### 🔤 Naming Consistency
>
> One suggestion: rather than retaining the `gpperfmon` naming convention,
> consider **renaming the database and components together** using a `cb`
> prefix (e.g., `cbperfmon`, `cbmon`, `cbmmon`, etc.).
>
> Using a Cloudberry-specific naming convention would:
>
> * Better reflect the project's identity and direction
> * Help distinguish new functionality from legacy Greenplum components
> * Reduce confusion around tool origin and long-term maintenance expectations
>
> Even if the implementation shares technical roots with Greenplum, a
> consistent and forward-looking naming strategy would reinforce the project's
> independence and clarity for new adopters.

Currently, we don't have any plans to rename them. Renaming would take a lot of work and could cause a lot of conflicts; for example, if we renamed the gpperfmon database to cbperfmon, other components that expect the old name would stop working.

> ### 🧭 User Interface & Vision
>
> I believe this also raises a key question: what kind of **frontend or
> user-accessible tooling** will be provided?
>
> Previously, Greenplum included a **commercial, non-open source product**
> known as **Greenplum Command Center (GPCC)**—developed and maintained by
> Broadcom—which exposed real-time and historical metrics from the `gpperfmon`
> database in a web-based interface. GPCC allowed users to:
>
> * Monitor live queries and system usage
> * Analyze historical performance trends
> * Cancel runaway queries
> * Apply workload management policies
>
> If `perfmon` is collecting similar telemetry, will users be expected to query
> it directly via SQL, or will there be:
>
> * Prebuilt dashboards (e.g., Grafana, Metabase)?
> * Planned integration with existing admin tools?
> * Custom visualizations or a future Cloudberry-native UI?
>
> This feels like it should be part of a **larger vision for observability in
> Cloudberry**. Is that the intent?

Perfmon stores the data in the gpperfmon database using several tables and one function:

- system usage tables: system_*, diskspace_*, network_interface_*
- the queries_now table for monitoring live queries
- the queries_history table for historical queries
- the function `pg_query_state`

Users can fetch all of these directly with SQL (see the sketch below). The other features of GPCC are outside perfmon's scope.
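To make the "query it directly via SQL" point concrete, a few illustrative queries follow. The table names are the ones listed above; the `ctime` ordering column and the `system_history` table are assumptions carried over from the legacy gpperfmon schema.

```shell
# Live queries currently being monitored.
psql -d gpperfmon -c "select * from queries_now;"

# Most recent historical queries (ctime is assumed from the legacy schema).
psql -d gpperfmon -c "select * from queries_history order by ctime desc limit 10;"

# Recent system usage samples (system_history per the legacy schema).
psql -d gpperfmon -c "select * from system_history limit 10;"
```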
> ### 📚 Background for Contributors New to `gpperfmon`
>
> Not everyone in the Apache Cloudberry community may be familiar with
> Greenplum's `gpperfmon`, so including some context would help the proposal
> reach a broader audience.
>
> Historically, `gpperfmon` provided:
>
> * **System-level metrics** (`system_now`, `system_history`): CPU, memory,
>   disk, and network usage across segments
> * **Query-level stats** (`queries_now`, `queries_history`): execution time,
>   rows processed, query text, spill usage, error codes
> * **Segment insights** (`segment_history`): resource usage by host and segment
> * **Agent infrastructure**: `gpsmon` ran per-segment, with collected metrics
>   stored in the `gpperfmon` database
> * **Integration with GPCC**: GPCC consumed these metrics via SQL and exposed
>   them graphically to end users
>
> Installation required:
>
> * Creating the `gpperfmon` database via `gpperfmon_install`
> * Enabling metrics in `postgresql.conf` (`gp_enable_gpperfmon=on`,
>   `gpperfmon_port=8888`)
> * Creating a `gpmon` user and `.pgpass` entry for internal connectivity
> * Configuring `gpperfmon.conf` to tune thresholds like `min_query_time`
>
> In summary, `gpperfmon` offered a structured, extensible monitoring backend
> that was tied into a broader tooling ecosystem. If Cloudberry's `perfmon` is
> modeled after this, outlining that full vision—including backend schema,
> operational flow, and potential visualization plans—will help the community
> evaluate, adopt, and contribute effectively.

GitHub link: https://github.com/apache/cloudberry/discussions/1087#discussioncomment-13060572