RAT audit complete and PR #1066 open for review — preparing for release
Hi all,

I’ve completed a full Apache RAT license audit pass on the Cloudberry (Incubating) codebase. The audit now passes cleanly with zero unknown licenses.

Apache RAT (Release Audit Tool) helps ensure that all source files in a project have appropriate license headers and that the codebase complies with ASF release policies. It’s a required step in preparing for an official Apache source release.

A corresponding pull request has been opened here:

*Add RAT license audit config and compliance metadata for ASF release*
PR #1066 – https://github.com/apache/cloudberry/pull/1066

This PR introduces custom RAT matchers for all permissive licenses in use, handles legacy Greenplum-originated headers (EMC, VMware, Pivotal, Broadcom), and excludes the GPL-licensed pylint-0.21.0.tar.gz, which is pending removal.

This completes a key requirement toward preparing for an ASF-compliant 2.0.0 release.

*Next step:* If you have any outstanding development work, doc updates, or unresolved release blockers, please reply to the relevant dev threads so we can triage and close them out as needed.

Thanks all,
-=e

--
Ed Espino
Apache Cloudberry (Incubating) & MADlib
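To illustrate the kind of check a license-header audit performs, here is a minimal Python sketch. It is not how Apache RAT is implemented or configured, and it is not part of PR #1066. The function name `find_unlicensed`, the suffix list, and the 30-line window are assumptions made for illustration. The sketch only looks for the opening phrase of the standard ASF source header, whereas RAT applies a family of license matchers and exclusion rules.

```python
from pathlib import Path

# Opening phrase of the standard ASF source header.
# Assumption: a plain substring test stands in for RAT's real matchers.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def find_unlicensed(root, suffixes=(".c", ".h", ".py", ".sh"), head_lines=30):
    """Return sorted relative paths of files whose first lines lack the marker.

    `suffixes` and `head_lines` are illustrative defaults, not RAT settings.
    """
    root = Path(root)
    missing = []
    for path in root.rglob("*"):
        # Only inspect regular files with one of the chosen extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            # Read at most `head_lines` lines from the top of the file.
            head = "".join(line for _, line in zip(range(head_lines), fh))
        if ASF_MARKER not in head:
            missing.append(path.relative_to(root).as_posix())
    return sorted(missing)
```

Calling `find_unlicensed(".")` from a checkout would list the files that a stricter tool would likely flag. An empty list corresponds loosely to the "zero unknown licenses" result described above, although files under other permissive licenses would need their own markers.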
Re: [DISCUSS] Questions About deploy/ Directory and Vendor-Specific Artifacts
Hi Ed,

Thanks for bringing this up. I’d like to contribute some context and suggestions as follows.

Best,
Dianjin Wang

On Fri, Apr 25, 2025 at 5:32 AM Ed Espino wrote:
>
> - Is the `deploy/` directory still in active use or maintained?

Files in `deploy/build/` were originally scattered under the top-level dir in Greenplum to guide users in building the project. We moved them here for better organization. These scripts and files are used to help users build Cloudberry on Linux and macOS. However, they do need updates, including bash compatibility, dependency management, and OS support. We might cherry-pick updates from Greenplum or revise them ourselves.

The `vagrant/` directory, previously under `src/tools/vagrant/` in Greenplum, was also moved here. However, these files have not been maintained upstream for over 5–6 years and use VirtualBox as the provider, which has poor support for Apple silicon in my previous testing (I'm not sure how it goes now).

> - Was it contributed primarily to support internal workflows, or is it
> intended to serve as a general deployment guide for the community?

Files under `k8s/`, `docker/`, `script/`, as well as `cbdb_deploy.sh` and `cbdb_env.sh`, were initially added to support internal workflows. As you can see, some internal URLs referenced in the files are not usable by the general public.

To my knowledge, these are also not used in the public Cloudberry workflows.

> - If it isn’t currently usable without access to internal resources, should
> we clarify that in the README or revisit how it’s presented?

For legacy and unused files, I propose we remove them entirely:
- `docker/`
- `k8s/`
- `script/`
- `vagrant/`
- `cbdb_deploy.sh`
- `cbdb_env.sh`

> - Would it be valuable to provide a community-accessible deployment path
> using standard open tools (e.g., Docker Compose, Helm charts, Terraform)?

+1. Starting with Docker Compose would be a great first step. It could help both new users and contributors get started more easily.

> - Does the current setup align with ASF expectations around transparency
> and community-driven development?

As suggested above, once we clean up the internal and deprecated files, it should be more aligned with ASF’s expectations.

> While it’s entirely understandable that early artifacts might reflect their
> original environment, continued reliance on internal infrastructure could
> make onboarding harder for external users, and may not reflect the Apache
> spirit of openness and independence.

Agreed. I confirmed with the vendor engineers that these files are no longer used in their internal workflows, so we can clean them up for better openness and public access.

~~~

BTW, I started a GitHub Discussion on cleanup of the `deploy` dir [1] back in Feb, but we can continue the in-depth discussion on this mailing thread to gather more feedback and suggestions.

[1] https://github.com/apache/cloudberry/discussions/948

-
To unsubscribe, e-mail: dev-unsubscr...@cloudberry.apache.org
For additional commands, e-mail: dev-h...@cloudberry.apache.org
Re: [DISCUSS] Questions About deploy/ Directory and Vendor-Specific Artifacts
Hi Dianjin,

Thanks for the detailed context and thoughtful suggestions.

I agree with your assessment that much of the content under deploy/ (particularly docker/, k8s/, script/, vagrant/, cbdb_deploy.sh, and cbdb_env.sh) is outdated and not aligned with Cloudberry's current or future direction as a community-led Apache project.

Given that these artifacts rely on internal or deprecated setups and are not part of any active public workflow, I propose we proceed with deleting the entire deploy/ directory. This would remove ambiguity for new contributors and make the codebase cleaner and more compliant with ASF’s principles of openness and transparency.

We can leave a placeholder issue or GitHub Discussion open to encourage community contributions toward a modern, accessible deployment workflow. Docker Compose seems like a natural starting point, and it would be great to see contributors take the lead in shaping tools that better reflect real-world usage scenarios across platforms.

Let’s continue to gather input from others, but unless there are objections, we could aim to move forward with the cleanup in the next couple of weeks.

Best,
-=e

--
Ed Espino
Apache Cloudberry (Incubating) & MADlib

On Fri, Apr 25, 2025 at 1:28 AM Dianjin Wang wrote:

> Hi Ed,
>
> Thanks for bringing this up. I’d like to contribute some context and
> suggestions as follows.
>
> Best,
> Dianjin Wang
>
>
> On Fri, Apr 25, 2025 at 5:32 AM Ed Espino wrote:
> >
> > - Is the `deploy/` directory still in active use or maintained?
>
> Files in `deploy/build/` were originally scattered under the top-level
> dir in Greenplum to guide users in building the project. We moved them
> here for better organization. These scripts and files are used to help
> users build Cloudberry on Linux and macOS. However, they do need
> updates, including bash compatibility, dependency management, and OS
> support. We might cherry-pick updates from Greenplum or revise them
> ourselves.
Re: Discussion: Change PAX Plugin to Be Disabled by Default?
Hi Dianjin,

+1. I agree with disabling PAX by default for the reasons you outlined. Keeping it consistent with how other contrib modules are handled makes a lot of sense, and avoiding potential build issues is always a plus.

On a related note, now that PAX is merged, how do you envision it being incorporated into the user documentation? Will there be a dedicated section under optional extensions, or something integrated into the standard build/setup guide with notes on --enable-pax and submodule fetching? It would be great to make sure users have a clear path if they do want to opt in.

Thanks,
-=e

--
Ed Espino
Apache Cloudberry (Incubating) & MADlib

On Thu, Apr 24, 2025 at 11:23 PM Dianjin Wang wrote:

> Hi all,
>
> I’d like to bring up a point regarding PAX, which was recently merged
> into the main branch.
>
> Currently, PAX is enabled by default in configure, and users need
> to explicitly disable it via the `--disable-pax` option. However, this
> behavior is inconsistent with most of the other extensions under the
> contrib/ or gpcontrib/ dir, which are typically disabled by default
> unless explicitly enabled.
>
> I believe it would be more user-friendly and consistent to change the
> default behavior of PAX to disabled, requiring users to opt in via
> `--enable-pax`.
>
> Here’s why:
>
> 1. Build Consistency with Other contrib Modules: Most contrib plugins
> are not enabled by default. Changing PAX to follow this pattern aligns
> with user expectations, especially for long-time Greenplum users.
>
> 2. Reduce Build Failures: PAX currently requires downloading several
> submodules during the build. For users who install Cloudberry from
> source without prior knowledge of this requirement, this will lead to
> build failures, which can be confusing and frustrating.
>
> 3. Optional Nature of the Feature: PAX, while valuable, is not a core
> feature that every user will need. Letting users opt in to building it
> makes the installation process simpler for the general case.
>
> We can also update the build instructions and documentation to clearly
> indicate how to enable PAX if needed (--enable-pax) and how to fetch
> its required submodules.
>
> I’d love to hear your thoughts on this. If there’s a consensus, it
> would be better to make this change before our release 2.0.
>
> Best,
> Dianjin Wang