Great! Is it reproducible? I mean, could you restart it two more times to be sure it's not an accident?
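
As a side note on the envelope math in Leonid's message quoted below: here is a minimal sketch that redoes the arithmetic, so the estimate is easy to rerun with other numbers. The sizes are the rounded figures from his du output and the directory counts (6, 4, 2) are taken from his formula as written. The function names and the max-parallel = 2 case are only illustrative assumptions, not anything measured in CI.

```python
# Rounded per-directory sizes in GB, taken from the quoted du output.
PG_WAL_GB = 0.7
PG_SUBTRANS_GB = 2.5
PG_DISTRIBUTEDLOG_GB = 1.0


def per_job_gb(wal_dirs=6, subtrans_dirs=4, distributedlog_dirs=2):
    """Estimated space for one test job, using the multipliers from the email."""
    return (PG_WAL_GB * wal_dirs
            + PG_SUBTRANS_GB * subtrans_dirs
            + PG_DISTRIBUTEDLOG_GB * distributedlog_dirs)


def total_gb(parallel_jobs):
    """Estimated space when `parallel_jobs` heavy jobs run at the same time."""
    return parallel_jobs * per_job_gb()


if __name__ == "__main__":
    # 3 parallel jobs is the case from the email; 2 is the hypothetical
    # max-parallel setting raised in question 1.
    for jobs in (3, 2):
        print(f"{jobs} parallel jobs: ~{total_gb(jobs):.1f} GB")
```

With three parallel jobs this gives the same ~48.6 GB as the estimate in the email. It only counts pg_wal, pg_subtrans and pg_distributedlog, so it is a lower bound rather than the full footprint of a run.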
On Mon, Oct 6, 2025 at 12:44 PM Ed Espino <[email protected]> wrote:

> I just managed to get CI to pass. All I had to do was move two suites to
> the top of the greenplum schedule:
> https://github.com/edespino/cloudberry/actions/runs/18274708119
>
> -=e
>
> Ed Espino
> 925.389.4640
>
>
> On Mon, Oct 6, 2025 at 1:02 AM Leonid Borchuk <[email protected]> wrote:
>
> > Hi, folks!
> >
> > I tried to investigate the issue too. Anyway, I can't finish my PR
> > without tests.
> >
> > What I see:
> >
> > 1. There is no "evil" file or commit that broke the tests. What we
> > stumbled upon is running out of disk space. I saw some generated
> > coredump files in the tests, but they also seemed to be a consequence
> > of space exhaustion.
> >
> > df at the end of tests:
> >
> > Show disk usage info
> > Filesystem  Type     Size  Used  Avail  Use%  Mounted on
> > overlay     overlay  73G   73G   99M    100%  /
> > tmpfs       tmpfs    64M   0     64M    0%    /dev
> > shm         tmpfs    2.0G  68K   2.0G   1%    /dev/shm
> > /dev/root   ext4     73G   73G   99M    100%  /__w
> > tmpfs       tmpfs    3.2G  9.2M  3.2G   1%    /run/docker.sock
> >
> > 2. The top space consumer is cloudberry (one can find the debug output
> > in my repo:
> > https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548
> > ).
> > It looks like usage grew a little with each additional test until it
> > finally hit the space limit.
> >
> > sudo du -c / | sort -n | tail -150 shows:
> >
> > 621320    /usr/lib64
> > 655388    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/pg_wal
> > 655388    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/pg_wal
> > 655392    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/pg_wal
> > 655396    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/pg_wal
> > 663736    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby/base
> > 664192    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/base
> > 720932    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_wal
> > 720936    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_wal
> > 856204    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base/17018
> > 856996    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base/17018
> > 904508    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_distributedlog
> > 910092    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_distributedlog
> > 984940    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/17018
> > 985876    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base/17018
> > 1049388   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base/17018
> > 1049444   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base/17018
> > 1225744   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base
> > 1226736   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base
> > 1277720   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby
> > 1354452   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base
> > 1355584   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base
> > 1420728   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base
> > 1420796   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base
> > 1639176   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/pg_subtrans
> > 1888592   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
> > 1888596   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3
> > 1893504   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
> > 1893508   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3
> > 1904984   /usr
> > 2017300   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
> > 2017304   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2
> > 2023864   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
> > 2023868   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2
> > 2549268   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_subtrans
> > 3049412   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
> > 3049416   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir
> > 3133184   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
> > 3133188   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1
> > 5735508   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
> > 5735512   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1
> > 21019248  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs
> > 21019320  /__w/cloudberry/cloudberry/gpAux/gpdemo
> > 21020984  /__w/cloudberry/cloudberry/gpAux
> > 22435396  /__w/cloudberry/cloudberry
> > 22435400  /__w/cloudberry
> > 22454024  /__w
> > 25110408  /
> > 25110408  total
> >
> > Some observations from here:
> >
> > A. The total space used by one test is ~25 GB
> > B. pg_subtrans on segments is ~2.5 GB, which is more than the data size
> > C. pg_distributedlog on segments is ~1 GB
> > D. pg_wal on segments is ~700 MB
> >
> > So the envelope math here is:
> >
> > I. We have 3 heavy tests: ic-good-opt-on, ic-good-opt-off,
> > ic-cbdb-parallel
> > II. All tests are executed in parallel
> > III. Each test has 3 segments and 3 mirrors
> > IV. Total space needed for the tests is 3 (the number of parallel
> > tests) x (pg_wal_size * 6 + pg_subtrans_size * 4 + pg_distributedlog * 2),
> > which is 3 * (0.7 * 6 + 2.5 * 4 + 1 * 2) = 48.6 GB
> >
> > My thoughts and questions from here:
> >
> > 1. Could we set max-parallel in strategy to 2 (this will lengthen the
> > tests)?
> > 2. Could we set archive_command to /bin/true and not store WAL files?
> > 3. How can we find out why pg_subtrans is so big? That would take a
> > long transaction plus a lot of subtransactions (savepoints?), but
> > 2.5 GB ...
> >
> > On Sat, Oct 4, 2025 at 9:54 PM Ed Espino <[email protected]> wrote:
> >
> > > I have an updated mechanism to free unused space in the test
> > > environments. Unfortunately, this is not resolving the testing
> > > issues. I will be attempting to isolate the issue to any recent code
> > > and CI changes. Additionally, after a conversation with Tushar, I
> > > will be reaching out to the Apache Infrastructure team to identify
> > > the steps necessary to use larger CI resources (if possible).
> > >
> > > Stay tuned,
> > > -=e
> > >
> > > On Fri, Oct 3, 2025 at 2:50 AM Dianjin Wang <[email protected]> wrote:
> > >
> > > > Cool, thanks Ed!
> > > >
> > > > Best,
> > > > Dianjin Wang
> > > >
> > > >
> > > > On Fri, Oct 3, 2025 at 5:08 PM Ed Espino <[email protected]> wrote:
> > > >
> > > > > I have determined that the test container is running out of
> > > > > disk space and this is leading to the testing issues. I am
> > > > > trying to determine if it is possible to clean up unused
> > > > > artifacts in the test container prior to test execution.
> > > > >
> > > > > -=e
> > > > >
> > > > > On Thu, Oct 2, 2025 at 9:51 PM Ed Espino <[email protected]> wrote:
> > > > >
> > > > > > I'll take a look.
> > > > > >
> > > > > > -=e
> > > > > >
> > > > > > --
> > > > > > Ed Espino
> > > > > > Apache Cloudberry (Incubating) & MADlib
> > > > > >
> > > > > > On Thu, Oct 2, 2025 at 8:50 PM Dianjin Wang <[email protected]> wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >>
> > > > > >> I'm wondering if the CI might be having issues. On my PR #1358,
> > > > > >> the jobs `ic-good-opt-off`, `ic-good-opt-on`, and
> > > > > >> `ic-cbdb-parallel` have been failing consistently, even after
> > > > > >> multiple reruns.
> > > > > >>
> > > > > >> I also noticed similar failures happening on other PRs. Could
> > > > > >> someone help check if the CI is currently down or unstable?
> > > > > >>
> > > > > >>
> > > > > >> Best,
> > > > > >> Dianjin Wang
> > > > > >>
> > > > > >> ---------------------------------------------------------------------
> > > > > >> To unsubscribe, e-mail: [email protected]
> > > > > >> For additional commands, e-mail: [email protected]
> > > > > >>
> > >
> > > --
> > > Ed Espino
> > > Apache Cloudberry (Incubating) & MADlib
