I just managed to get CI to pass. All I had to do was to move two suites to the top of the greenplum schedule: https://github.com/edespino/cloudberry/actions/runs/18274708119

-=e
Ed Espino
925.389.4640

On Mon, Oct 6, 2025 at 1:02 AM Leonid Borchuk <[email protected]> wrote:
> Hi, folks!
>
> I tried to investigate the issue too. Anyway, I can't finish my PR without
> passing tests.
>
> What I see:
>
> 1. There is no "evil" file or commit that broke the tests. What we stumbled
> upon is running out of disk space. I saw some core dump files generated
> during the tests, but they also seem to be a consequence of space exhaustion.
>
> df at the end of the tests (step "Show disk usage info",
> https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548#step:14:1015):
>
> Filesystem  Type     Size  Used  Avail  Use%  Mounted on
> overlay     overlay  73G   73G   99M    100%  /
> tmpfs       tmpfs    64M   0     64M    0%    /dev
> shm         tmpfs    2.0G  68K   2.0G   1%    /dev/shm
> /dev/root   ext4     73G   73G   99M    100%  /__w
> tmpfs       tmpfs    3.2G  9.2M  3.2G   1%    /run/docker.sock
>
> 2. The top space consumer is cloudberry (debug output can be found in my repo:
> https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548).
> It looks like usage grew a little with each additional test until it
> finally reached the space limit.
>
> sudo du -c / | sort -n | tail -150 shows (sizes in 1K blocks;
> https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548#step:14:2459):
>
>   621320  /usr/lib64
>   655388  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/pg_wal
>   655388  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/pg_wal
>   655392  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/pg_wal
>   655396  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/pg_wal
>   663736  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby/base
>   664192  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/base
>   720932  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_wal
>   720936  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_wal
>   856204  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base/17018
>   856996  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base/17018
>   904508  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_distributedlog
>   910092  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_distributedlog
>   984940  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/17018
>   985876  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base/17018
>  1049388  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base/17018
>  1049444  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base/17018
>  1225744  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base
>  1226736  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base
>  1277720  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby
>  1354452  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base
>  1355584  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base
>  1420728  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base
>  1420796  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base
>  1639176  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/pg_subtrans
>  1888592  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
>  1888596  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3
>  1893504  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
>  1893508  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3
>  1904984  /usr
>  2017300  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
>  2017304  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2
>  2023864  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
>  2023868  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2
>  2549268  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_subtrans
>  3049412  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
>  3049416  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir
>  3133184  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
>  3133188  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1
>  5735508  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
>  5735512  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1
> 21019248  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs
> 21019320  /__w/cloudberry/cloudberry/gpAux/gpdemo
> 21020984  /__w/cloudberry/cloudberry/gpAux
> 22435396  /__w/cloudberry/cloudberry
> 22435400  /__w/cloudberry
> 22454024  /__w
> 25110408  /
> 25110408  total
>
> Some observations from here:
>
> A. The total space used by one test is ~25 GB.
> B. pg_subtrans on segments is ~2.5 GB, more than the data size.
> C. pg_distributedlog on segments is ~1 GB.
> D. pg_wal on segments is ~700 MB.
>
> So the back-of-the-envelope math here is:
>
> I. We have 3 heavy tests: ic-good-opt-on, ic-good-opt-off, ic-cbdb-parallel.
> II. All tests are executed in parallel.
> III. Each test has 3 segments and 3 mirrors.
> IV. The total space needed for the tests is 3 (the number of parallel tests) x
>     (pg_wal_size * 6 + pg_subtrans_size * 4 + pg_distributedlog_size * 2),
>     which is 3 * (0.7 * 6 + 2.5 * 4 + 1 * 2) = 48.6 GB.
>
> My thoughts and questions from here:
>
> 1. Could we set max-parallel in the strategy to 2? (This will lengthen the
>    tests.)
> 2. Could we set archive_command to /bin/true and not store WAL files?
> 3. How can we understand why pg_subtrans is so big? There would have to be a
>    long transaction plus a lot of subtransactions (savepoints?), but 2.5 GB ...
>
> On Sat, Oct 4, 2025 at 9:54 PM Ed Espino <[email protected]> wrote:
>
> > I have an updated mechanism to free unused space in the test environments.
> > Unfortunately, this is not resolving the testing issues.
> > I will be attempting to isolate the issue to any recent code and CI changes.
> > Additionally, after a conversation with Tushar, I will be reaching out to
> > the Apache Infrastructure team to identify the steps necessary to use
> > larger CI resources (if possible).
> >
> > Stay tuned,
> > -=e
> >
> > On Fri, Oct 3, 2025 at 2:50 AM Dianjin Wang <[email protected]> wrote:
> >
> > > Cool, thanks Ed!
> > >
> > > Best,
> > > Dianjin Wang
> > >
> > > On Fri, Oct 3, 2025 at 17:08, Ed Espino <[email protected]> wrote:
> > >
> > > > I have determined that the test container is running out of disk space,
> > > > and this is leading to the testing issues. I am trying to determine
> > > > whether it is possible to clean up unused artifacts in the test
> > > > container prior to test execution.
> > > >
> > > > -=e
> > > >
> > > > On Thu, Oct 2, 2025 at 9:51 PM Ed Espino <[email protected]> wrote:
> > > >
> > > > > I'll take a look.
> > > > >
> > > > > -=e
> > > > >
> > > > > --
> > > > > Ed Espino
> > > > > Apache Cloudberry (Incubating) & MADlib
> > > > >
> > > > > On Thu, Oct 2, 2025 at 8:50 PM Dianjin Wang <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm wondering if the CI might be having issues. On my PR #1358, the
> > > > > > jobs `ic-good-opt-off`, `ic-good-opt-on`, and `ic-cbdb-parallel` have
> > > > > > been failing consistently, even after multiple reruns.
> > > > > >
> > > > > > I also noticed similar failures happening on other PRs. Could someone
> > > > > > help check if the CI is currently down or unstable?
> > > > > >
> > > > > > Best,
> > > > > > Dianjin Wang
> >
> > --
> > Ed Espino
> > Apache Cloudberry (Incubating) & MADlib
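
The figures in Leonid's Oct 6 analysis can be re-checked with a few lines of Python. This is only a sketch, not part of the CI tooling. It reuses the rounded estimates from observations B-D (in MB, with 1 GB = 1000 MB) and the 6/4/2 multipliers from item IV exactly as written, and it adds the max-parallel = 2 case from question 1. The pg_subtrans line is an assumption: it takes upstream PostgreSQL's layout, where pg_subtrans stores one 4-byte parent TransactionId per xid, and assumes Cloudberry has not changed it. The function names are hypothetical.

```python
"""Back-of-the-envelope check of the CI disk-usage figures in this thread.

Assumptions:
- per-directory sizes are Leonid's rounded estimates, in MB (1 GB = 1000 MB);
- the multipliers (6 x pg_wal, 4 x pg_subtrans, 2 x pg_distributedlog)
  are taken from item IV as written;
- pg_subtrans stores one 4-byte parent TransactionId per xid, as in
  upstream PostgreSQL (assumed unchanged in Cloudberry).
"""

PG_WAL_MB = 700
PG_SUBTRANS_MB = 2500
PG_DISTRIBUTEDLOG_MB = 1000


def per_test_mb() -> int:
    """Footprint of one heavy test job, using item IV's multipliers."""
    return PG_WAL_MB * 6 + PG_SUBTRANS_MB * 4 + PG_DISTRIBUTEDLOG_MB * 2


def total_mb(parallel_jobs: int) -> int:
    """Space needed when `parallel_jobs` heavy tests run at the same time."""
    return parallel_jobs * per_test_mb()


def subtrans_xids(du_kib: int, bytes_per_xid: int = 4) -> int:
    """Approximate xid range covered by a pg_subtrans directory of du_kib KiB."""
    return du_kib * 1024 // bytes_per_xid


if __name__ == "__main__":
    print(f"one test:       {per_test_mb() / 1000:.1f} GB")
    print(f"3 in parallel:  {total_mb(3) / 1000:.1f} GB")
    print(f"max-parallel 2: {total_mb(2) / 1000:.1f} GB")
    # 2549268 is the dbfast1 pg_subtrans size (1K blocks) from the du listing.
    print(f"dbfast1 pg_subtrans spans ~{subtrans_xids(2549268):,} xids")
```

Integer MB arithmetic avoids floating-point noise, so the script reproduces the 48.6 GB figure exactly and gives 32.4 GB for two parallel jobs. If the 4-bytes-per-xid assumption holds, the ~2.5 GB pg_subtrans on dbfast1 spans roughly 650 million xids, which is a concrete data point for question 3.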
