I just managed to get CI to pass. All I had to do was move two suites to
the top of the greenplum schedule:
https://github.com/edespino/cloudberry/actions/runs/18274708119


-=e

Ed Espino
925.389.4640


On Mon, Oct 6, 2025 at 1:02 AM Leonid Borchuk <[email protected]> wrote:

> Hi, folks!
>
> I tried to investigate the issue too; I can't finish my PR without the
> tests passing anyway.
>
> What I see:
>
> 1. There is no "evil" file or commit that broke the tests. What we ran into
> is the disk running out of space. I saw some coredump files generated during
> the tests, but they also seem to be a consequence of the space exhaustion.
>
> df at the end of the tests (step "Show disk usage info",
> https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548#step:14:1015):
>
>   Filesystem  Type     Size  Used  Avail  Use%  Mounted on
>   overlay     overlay  73G   73G   99M    100%  /
>   tmpfs       tmpfs    64M   0     64M    0%    /dev
>   shm         tmpfs    2.0G  68K   2.0G   1%    /dev/shm
>   /dev/root   ext4     73G   73G   99M    100%  /__w
>   tmpfs       tmpfs    3.2G  9.2M  3.2G   1%    /run/docker.sock
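To catch this condition automatically, a job could parse the same df output and flag nearly full filesystems. A minimal Python sketch, assuming the seven-column `df -hT` layout shown above; the `full_filesystems` helper and its 95% default threshold are hypothetical, not part of the existing CI:

```python
# Sketch: flag filesystems at or above a usage threshold in `df -hT` style
# output. The 95% default is a hypothetical choice, not a CI setting.
DF_OUTPUT = """\
Filesystem Type Size Used Avail Use% Mounted on
overlay overlay 73G 73G 99M 100% /
tmpfs tmpfs 64M 0 64M 0% /dev
shm tmpfs 2.0G 68K 2.0G 1% /dev/shm
/dev/root ext4 73G 73G 99M 100% /__w
tmpfs tmpfs 3.2G 9.2M 3.2G 1% /run/docker.sock
"""


def full_filesystems(df_text: str, threshold_pct: int = 95) -> list:
    """Return (mount point, Use%) pairs for rows at or above threshold_pct."""
    flagged = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7 or not fields[5].endswith("%"):
            continue  # not a data row in the expected 7-column layout
        use_pct = int(fields[5].rstrip("%"))
        mount = " ".join(fields[6:])  # last column is the mount point
        if use_pct >= threshold_pct:
            flagged.append((mount, use_pct))
    return flagged


if __name__ == "__main__":
    for mount, pct in full_filesystems(DF_OUTPUT):
        print(f"{mount}: {pct}% used")
```

On the sample above, only `/` and `/__w` are flagged, matching the two 100% rows.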
>
> 2. The top space consumer is the cloudberry checkout, mostly the demo
> cluster data directories (the debug output is in my repo:
> https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548).
> It looks like usage grew a little with each additional test until it
> finally hit the space limit.
>
> sudo du -c / | sort -n | tail -150 shows (sizes in 1K blocks; step 14,
> log lines 2459-2506 of the job above):
>
>     621320  /usr/lib64
>     655388  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/pg_wal
>     655388  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/pg_wal
>     655392  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/pg_wal
>     655396  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/pg_wal
>     663736  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby/base
>     664192  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/base
>     720932  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_wal
>     720936  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_wal
>     856204  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base/17018
>     856996  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base/17018
>     904508  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_distributedlog
>     910092  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_distributedlog
>     984940  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/17018
>     985876  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base/17018
>    1049388  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base/17018
>    1049444  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base/17018
>    1225744  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base
>    1226736  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base
>    1277720  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby
>    1354452  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base
>    1355584  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base
>    1420728  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base
>    1420796  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base
>    1639176  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/pg_subtrans
>    1888592  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
>    1888596  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3
>    1893504  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
>    1893508  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3
>    1904984  /usr
>    2017300  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
>    2017304  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2
>    2023864  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
>    2023868  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2
>    2549268  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_subtrans
>    3049412  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
>    3049416  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir
>    3133184  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
>    3133188  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1
>    5735508  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
>    5735512  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1
>   21019248  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs
>   21019320  /__w/cloudberry/cloudberry/gpAux/gpdemo
>   21020984  /__w/cloudberry/cloudberry/gpAux
>   22435396  /__w/cloudberry/cloudberry
>   22435400  /__w/cloudberry
>   22454024  /__w
>   25110408  /
>   25110408  total
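Per-directory figures like the ones in the observations that follow can be pulled out of such a listing mechanically. A minimal Python sketch, assuming GNU du's default 1K block size and the `datadirs/<instance>/<datadir>/<kind>` path layout above; `totals_by_kind`, `kib_to_gib` and the trimmed sample are illustrative only:

```python
# Sketch: group `du` sizes (GNU du default: 1K blocks) by directory kind and
# by data directory. The helper and sample are illustrative, not CI code.
DU_SAMPLE = """\
720932 /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_wal
720936 /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_wal
910092 /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_distributedlog
2549268 /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_subtrans
1639176 /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/pg_subtrans
25110408 total
"""

KINDS = ("pg_wal", "pg_subtrans", "pg_distributedlog")


def totals_by_kind(du_text: str, kinds=KINDS) -> dict:
    """Return {kind: {"<instance>/<datadir>": size in KiB}}."""
    result = {kind: {} for kind in kinds}
    for line in du_text.splitlines():
        # du separates size and path with a tab; split() also accepts spaces.
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        size_kib, path = int(parts[0]), parts[1].rstrip("/")
        parent, _, leaf = path.rpartition("/")
        if leaf in result:
            datadir = "/".join(parent.split("/")[-2:])  # e.g. dbfast1/demoDataDir0
            result[leaf][datadir] = size_kib
    return result


def kib_to_gib(kib: int) -> float:
    return kib / 1024**2


if __name__ == "__main__":
    for kind, dirs in totals_by_kind(DU_SAMPLE).items():
        for datadir, kib in sorted(dirs.items()):
            print(f"{kind:18} {datadir:22} {kib_to_gib(kib):5.2f} GiB")
```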
>
> Some observations from here:
>
> A. The total disk usage of one test run is ~25 GB.
> B. pg_subtrans on segments is ~2.5 GB, more than the data size.
> C. pg_distributedlog on segments is ~1 GB.
> D. pg_wal on segments is ~700 MB.
>
> So the back-of-the-envelope math is:
>
> I. We have 3 heavy tests: ic-good-opt-on, ic-good-opt-off, ic-cbdb-parallel.
> II. All tests are executed in parallel.
> III. Each test has 3 segments and 3 mirrors.
> IV. The total space needed for the tests is 3 (the number of parallel
> tests) x (pg_wal_size * 6 + pg_subtrans_size * 4 + pg_distributedlog_size * 2),
> which is 3 * (0.7 * 6 + 2.5 * 4 + 1 * 2) = 48.6 GB.
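The same estimate as a small Python function, so the effect of running fewer jobs at once can be checked. The sizes and multipliers are the ones quoted above; `estimated_peak_gb` and its `parallel_jobs` parameter are hypothetical, and the model only counts these three directory kinds on a disk assumed to be shared by all parallel jobs, as the estimate does:

```python
# Back-of-the-envelope disk estimate. Sizes are in GB and the copy counts are
# the multipliers used in the message; parallel_jobs is an illustrative knob
# standing in for how many heavy test jobs run at once on one disk.
def estimated_peak_gb(parallel_jobs: int = 3,
                      pg_wal_gb: float = 0.7, pg_wal_copies: int = 6,
                      pg_subtrans_gb: float = 2.5, pg_subtrans_copies: int = 4,
                      pg_distlog_gb: float = 1.0, pg_distlog_copies: int = 2) -> float:
    """Estimate the space needed if parallel_jobs test runs share one disk."""
    per_job = (pg_wal_gb * pg_wal_copies
               + pg_subtrans_gb * pg_subtrans_copies
               + pg_distlog_gb * pg_distlog_copies)
    return parallel_jobs * per_job


if __name__ == "__main__":
    for jobs in (3, 2, 1):
        print(f"{jobs} parallel job(s): ~{estimated_peak_gb(jobs):.1f} GB")
```

With these inputs the estimate is about 48.6 GB for 3 jobs, 32.4 GB for 2 and 16.2 GB for 1, up to floating-point rounding.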
>
> My thoughts and questions from here:
>
> 1. Could we set max-parallel in the job strategy to 2? (This will make the
> test run take longer.)
> 2. Could we set archive_command to /bin/true and not keep WAL files?
> 3. How can we find out why pg_subtrans is so big? There must be a long
> transaction plus a lot of subtransactions (savepoints?), but 2.5 GB ...
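To put the pg_subtrans size in perspective: in upstream PostgreSQL, pg_subtrans is extended for every assigned transaction ID and truncated at checkpoints up to the oldest transaction still considered running, so its size reflects how wide that xid range is. The Python sketch below converts a size into that range, assuming Cloudberry keeps upstream's behavior here. The entry width is an explicit assumption (upstream stores one 4-byte parent TransactionId per xid; a fork may store more), and the helper names are hypothetical:

```python
# Rough conversion from pg_subtrans size to the span of transaction IDs it
# still covers. Assumption: 4 bytes per xid, as in upstream PostgreSQL where
# each entry is the parent TransactionId. A fork may store a larger entry, so
# bytes_per_xid is a parameter rather than a known Cloudberry constant.
XID_SPACE = 2**32  # 32-bit TransactionId space


def xids_spanned(size_kib: int, bytes_per_xid: int = 4) -> int:
    """Number of xids whose pg_subtrans entries fit in size_kib KiB."""
    return size_kib * 1024 // bytes_per_xid


def fraction_of_xid_space(size_kib: int, bytes_per_xid: int = 4) -> float:
    return xids_spanned(size_kib, bytes_per_xid) / XID_SPACE


if __name__ == "__main__":
    seg0_kib = 2549268  # dbfast1/demoDataDir0/pg_subtrans from the du listing
    for width in (4, 8):
        n = xids_spanned(seg0_kib, width)
        print(f"{width} bytes/xid: ~{n:,} xids "
              f"({fraction_of_xid_space(seg0_kib, width):.1%} of the xid space)")
```

At 4 bytes per xid, the 2549268 KiB on dbfast1 corresponds to 652,612,608 xids, which would point at a very wide gap between the oldest running transaction and the newest one rather than at savepoints alone.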
>
> On Sat, Oct 4, 2025 at 9:54 PM Ed Espino <[email protected]> wrote:
>
> > I have an updated mechanism to free unused space in the test
> > environments. Unfortunately, it does not resolve the testing issues. I
> > will try to isolate the issue to recent code or CI changes.
> > Additionally, after a conversation with Tushar, I will reach out to the
> > Apache Infrastructure team to identify the steps needed to use larger CI
> > resources (if possible).
> >
> > Stay tuned,
> > -=e
> >
> > On Fri, Oct 3, 2025 at 2:50 AM Dianjin Wang <[email protected]> wrote:
> >
> > > Cool, thanks Ed!
> > >
> > >
> > >
> > > Best,
> > > Dianjin Wang
> > >
> > >
> > > On Fri, Oct 3, 2025 at 17:08 Ed Espino <[email protected]> wrote:
> > >
> > > > I have determined that the test container is running out of disk
> > > > space and that this is causing the testing issues. I am trying to
> > > > determine whether it is possible to clean up unused artifacts in the
> > > > test container before test execution.
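Alongside cleanup, a pre-test guard could make an out-of-space condition fail fast with a clear message instead of surfacing later as failed suites or core dumps. A minimal Python sketch using the standard library's `shutil.disk_usage`; the function names and any threshold passed in are hypothetical, not an existing CI step:

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Free space available on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def require_free_space(path: str, min_free_gib: float) -> float:
    """Raise if less than min_free_gib GiB is free at path; else return free GiB."""
    available = free_gib(path)
    if available < min_free_gib:
        raise RuntimeError(
            f"only {available:.1f} GiB free on {path}, need {min_free_gib:.1f} GiB")
    return available
```

For example, a step could call `require_free_space("/__w", 30)` before the suites start; the path comes from the df output in this thread, and the 30 GiB figure is a placeholder that would need to come from measured per-run usage.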
> > > >
> > > > -=e
> > > >
> > > > On Thu, Oct 2, 2025 at 9:51 PM Ed Espino <[email protected]> wrote:
> > > >
> > > > > I'll take a look.
> > > > >
> > > > > -=e
> > > > >
> > > > > --
> > > > > Ed Espino
> > > > > Apache Cloudberry (Incubating) & MADlib
> > > > >
> > > > > On Thu, Oct 2, 2025 at 8:50 PM Dianjin Wang <[email protected]> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I’m wondering if the CI might be having issues. On my PR #1358, the
> > > > >> jobs `ic-good-opt-off`, `ic-good-opt-on`, and `ic-cbdb-parallel` have
> > > > >> been failing consistently, even after multiple reruns.
> > > > >>
> > > > >> I also noticed similar failures happening on other PRs. Could
> > > > >> someone help check if the CI is currently down or unstable?
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Dianjin Wang
> > > > >>
> > > >
> > >
> >
> >
> > --
> > Ed Espino
> > Apache Cloudberry (Incubating) & MADlib
> >
>
