On Sat, Nov 30, 2024 at 11:07 PM Ed Espino <esp...@apache.org> wrote:
>
> Hi everyone,
>
> I'd like to start a discussion about implementing public metrics tracking
> for our GitHub Actions workflows. Having previously worked with the
> Greenplum team, I saw firsthand how valuable build and test metrics can be
> for project health monitoring. When Greenplum moved from their original CI
> system to Concourse CI, we lost this capability. I believe now is a good
> time to reintroduce this kind of metrics tracking for Cloudberry, with an
> emphasis on making it publicly accessible to benefit our entire community.
>
> As context, being part of the Apache Software Foundation means we need to
> be thoughtful about resource usage, leveraging free resources where
> possible. We currently use GitHub-hosted runners for our container
> execution, which comes with certain resource constraints. Understanding our
> resource utilization patterns could help us:
>
> - Identify environment-related issues
> - Optimize test execution within resource limits
> - Detect product performance regressions
> - Highlight test inefficiencies
> - Make informed decisions about infrastructure needs
>
> Proposed Benefits:
>
> - Transparent view of project health for users and contributors
> - Track test stability over time
> - Identify problematic or flaky tests
> - Monitor build performance trends
> - Support data-driven decisions about test infrastructure
> - Enable community members to investigate test failures
> - Generate metrics for project health reporting
> - Optimize resource usage within GitHub-hosted runner constraints
>
> Data Collection Overview:
>
> We propose tracking the following categories of information:
>
> System & Environment Data:
>
> - OS environments and versions
> - Container images and versions
> - Build configurations
> - Resource metrics (memory, disk usage, execution time limits)
> - GitHub runner resource constraints and utilization
>
> Workflow-Level Metrics:
>
> - Build timestamps and duration
> - Overall workflow status
> - Branch type (main vs feature branches)
> - Type of trigger (merge, PR, manual)
> - Resource consumption patterns
>
> Build Metrics:
>
> - Build status and duration
> - Artifact generation success
> - Configuration details
> - Resource utilization
> - Memory and disk space usage
> - Build timeouts or resource-related failures
>
> Test Suite Metrics:
>
> - Suite name and configuration
> - Total/passed/failed/ignored test counts
> - Test duration
> - Categories of test failures
> - System resource metrics during test runs
> - Resource constraint impacts
>
> What We Explicitly Won't Track:
>
> - Individual committer names or IDs
> - PR author information
> - Blame/attribution data
> - Individual developer metrics
>
> The goal is to focus on systemic patterns and project health:
>
> - Identify unstable test patterns
> - Track performance trends
> - Monitor resource utilization
> - Detect infrastructure issues
> - Support release quality metrics
> - Optimize resource usage
>
> This data would allow us to answer questions like:
>
> - Which test suites have become less stable over time?
> - Do certain configurations consistently show problems?
> - Are there patterns in test failures across different environments?
> - How do infrastructure changes impact build performance?
> - What are our most resource-intensive tests?
> - Where are we hitting GitHub-hosted runner limits?
> - Which tests are most affected by resource constraints?
>
> Technical Implementation:
>
> - Store metrics in PostgreSQL database (via ASF Infra)
> - Public read access through a web dashboard
> - Metrics collection from GitHub-hosted runner workflows
> - Estimated storage needs: ~250MB initially, ~100MB annual growth
> - Data retention: Full history preserved
> - Access: Public read access, write access limited to GitHub Actions
>
> Given our project's expertise with PostgreSQL, we're well-positioned to
> implement and maintain this system. We could also share our experience with
> other ASF projects interested in similar public metrics collection,
> particularly those also operating under resource constraints.
>
> Questions for Discussion:
>
> 1. Would this kind of public metrics tracking be valuable to you and our
> user community?
> 2. What specific metrics would be most useful for users and contributors?
> 3. How would you envision the community using this data?
> 4. Any concerns about implementation, maintenance, or data visibility?
> 5. Ideas for making the metrics more accessible and useful to the
> community?
> 6. Suggestions for dashboard features that would benefit users and
> contributors?
> 7. What resource utilization metrics would be most helpful to track?
>
> If there's support for this initiative, I'll submit an ASF Infra request
> for the required PostgreSQL database.
>
> For reference, here's our current GitHub Actions workflow:
> https://github.com/apache/cloudberry/blob/main/.github/workflows/build-cloudberry.yml
>
> Looking forward to your thoughts and suggestions on making our project
> metrics more transparent and accessible to everyone.
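For concreteness, the collection step described under "Technical Implementation" could look roughly like the Python sketch below. Everything here is an illustrative assumption rather than part of the proposal: the function names (`workflow_row`, `suite_row`), the field names, and the choice to emit JSON. A real step would insert these rows into the ASF-hosted PostgreSQL database (e.g. via psycopg2) instead of printing them.

```python
import json


def workflow_row(env):
    """Assemble one workflow-level metrics row from the default
    environment variables GitHub Actions exposes to every job."""
    ref = env.get("GITHUB_REF_NAME", "")
    return {
        "run_id": env.get("GITHUB_RUN_ID"),
        "workflow": env.get("GITHUB_WORKFLOW"),
        "trigger": env.get("GITHUB_EVENT_NAME"),  # push, pull_request, ...
        "branch_type": "main" if ref == "main" else "feature",
        # Deliberately no actor/author fields: the proposal explicitly
        # excludes committer and PR-author attribution data.
    }


def suite_row(run_id, suite, total, passed, failed, ignored, seconds):
    """Per-suite counts, matching the 'Test Suite Metrics' category."""
    return {
        "run_id": run_id,
        "suite": suite,
        "total": total,
        "passed": passed,
        "failed": failed,
        "ignored": ignored,
        "duration_s": seconds,
    }


if __name__ == "__main__":
    row = workflow_row({
        "GITHUB_RUN_ID": "1234",
        "GITHUB_WORKFLOW": "build-cloudberry",
        "GITHUB_EVENT_NAME": "pull_request",
        "GITHUB_REF_NAME": "my-feature",
    })
    # Printing JSON keeps the sketch runnable without a database.
    print(json.dumps(row))
```

Keeping the row-assembly logic in plain functions like this (separate from the database write) would also make it easy to unit-test the "no attribution data" guarantee in CI.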
This is a fantastic idea!

That said, are there any hosted tools (like any GH features, etc.) that
would allow us to skip maintaining a custom PostgreSQL DB?

Thanks,
Roman.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cloudberry.apache.org
For additional commands, e-mail: dev-h...@cloudberry.apache.org