Great, is it reproducible? I mean, restart it two more times to be sure it's
not an accident.

On Mon, Oct 6, 2025 at 12:44 PM Ed Espino <[email protected]> wrote:

> I just managed to get CI to pass. All I had to do was to move two suites to
> the top of the greenplum schedule:
> https://github.com/edespino/cloudberry/actions/runs/18274708119
>
>
> -=e
>
> Ed Espino
> 925.389.4640
>
>
> On Mon, Oct 6, 2025 at 1:02 AM Leonid Borchuk <[email protected]>
> wrote:
>
> > Hi, folks!
> >
> > I tried to investigate the issue too. In any case, I can't finish my PR
> > without the tests.
> >
> > What I see:
> >
> > 1. There is no "evil" file or commit that broke the tests. What we
> > stumbled upon is running out of disk space. I saw some generated core dump
> > files in the tests, but they also seem to be a consequence of space
> > exhaustion.
> >
> > df at the end of tests:
> >
> > Filesystem  Type     Size  Used  Avail  Use%  Mounted on
> > overlay     overlay  73G   73G   99M    100%  /
> > tmpfs       tmpfs    64M   0     64M    0%    /dev
> > shm         tmpfs    2.0G  68K   2.0G   1%    /dev/shm
> > /dev/root   ext4     73G   73G   99M    100%  /__w
> > tmpfs       tmpfs    3.2G  9.2M  3.2G   1%    /run/docker.sock
> >
> > 2. The top space consumer is cloudberry (the debug output is available in
> > my repo:
> > https://github.com/open-gpdb/cloudberry/actions/runs/18248898786/job/51961002548
> > ).
> > It looks like disk usage grew a little with each additional test until it
> > finally reached the space limit.
> >
> > sudo du -c / | sort -n | tail -150 shows:
> >
> > 621320    /usr/lib64
> > 655388    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/pg_wal
> > 655388    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/pg_wal
> > 655392    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/pg_wal
> > 655396    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/pg_wal
> > 663736    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby/base
> > 664192    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/base
> > 720932    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_wal
> > 720936    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_wal
> > 856204    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base/17018
> > 856996    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base/17018
> > 904508    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/pg_distributedlog
> > 910092    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_distributedlog
> > 984940    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/17018
> > 985876    /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base/17018
> > 1049388   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base/17018
> > 1049444   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base/17018
> > 1225744   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2/base
> > 1226736   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2/base
> > 1277720   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/standby
> > 1354452   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base
> > 1355584   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1/base
> > 1420728   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base
> > 1420796   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/base
> > 1639176   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/pg_subtrans
> > 1888592   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
> > 1888596   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror3
> > 1893504   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
> > 1893508   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast3
> > 1904984   /usr
> > 2017300   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
> > 2017304   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror2
> > 2023864   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
> > 2023868   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast2
> > 2549268   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0/pg_subtrans
> > 3049412   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
> > 3049416   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/qddir
> > 3133184   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
> > 3133188   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast_mirror1
> > 5735508   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
> > 5735512   /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs/dbfast1
> > 21019248  /__w/cloudberry/cloudberry/gpAux/gpdemo/datadirs
> > 21019320  /__w/cloudberry/cloudberry/gpAux/gpdemo
> > 21020984  /__w/cloudberry/cloudberry/gpAux
> > 22435396  /__w/cloudberry/cloudberry
> > 22435400  /__w/cloudberry
> > 22454024  /__w
> > 25110408  /
> > 25110408  total
> >
> > Some observations from here:
> >
> > A. The total disk usage of one test is ~25 GB
> > B. pg_subtrans on segments is ~2.5 GB, which is more than the data size
> > C. pg_distributedlog on segments is ~1 GB
> > D. pg_wal on segments is ~700 MB
> >
> > So the back-of-the-envelope math here is:
> >
> > I. We have 3 heavy tests: ic-good-opt-on, ic-good-opt-off,
> > ic-cbdb-parallel
> > II. All tests are executed in parallel
> > III. Each test has 3 segments and 3 mirrors
> > IV. Total space needed for the tests is 3 (the number of parallel tests) x
> > (pg_wal_size * 6 + pg_subtrans_size * 4 + pg_distributedlog_size * 2),
> > which is 3 * (0.7 * 6 + 2.5 * 4 + 1 * 2) = 48.6 GB
> >
> > My thoughts and questions from here:
> >
> > 1. Could we set max-parallel in strategy to 2 (this will lengthen the
> > tests)?
> > 2. Could we set archive_command to /bin/true and not store WAL files?
> > 3. How can we find out why pg_subtrans is so big? There would have to be a
> > long transaction plus a lot of subtransactions (savepoints?), but 2.5 GB ...
> >
> > On Sat, Oct 4, 2025 at 9:54 PM Ed Espino <[email protected]> wrote:
> >
> > > I have an updated mechanism to free unused space in the test
> > > environments. Unfortunately, this is not resolving the testing issues.
> > > I will be attempting to isolate the issue to any recent code and CI
> > > changes. Additionally, after a conversation with Tushar, I will be
> > > reaching out to the Apache Infrastructure team to identify the necessary
> > > steps to use larger CI resources (if possible).
> > >
> > > Stay tuned,
> > > -=e
> > >
> > > On Fri, Oct 3, 2025 at 2:50 AM Dianjin Wang <[email protected]> wrote:
> > >
> > > > Cool, thanks Ed!
> > > >
> > > >
> > > >
> > > > Best,
> > > > Dianjin Wang
> > > >
> > > >
> > > > On Fri, Oct 3, 2025 at 17:08, Ed Espino <[email protected]> wrote:
> > > >
> > > > > I have determined that the test container is running out of disk
> > > > > space and this is leading to the testing issues. I am trying to
> > > > > determine if it is possible to clean up unused artifacts in the test
> > > > > container prior to test execution.
> > > > >
> > > > > -=e
> > > > >
> > > > > On Thu, Oct 2, 2025 at 9:51 PM Ed Espino <[email protected]> wrote:
> > > > >
> > > > > > I'll take a look.
> > > > > >
> > > > > > -=e
> > > > > >
> > > > > > --
> > > > > > Ed Espino
> > > > > > Apache Cloudberry (Incubating) & MADlib
> > > > > >
> > > > > > On Thu, Oct 2, 2025 at 8:50 PM Dianjin Wang <[email protected]> wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >>
> > > > > >> I’m wondering if the CI might be having issues. On my PR #1358, the
> > > > > >> jobs `ic-good-opt-off`, `ic-good-opt-on`, and `ic-cbdb-parallel` have
> > > > > >> been failing consistently, even after multiple reruns.
> > > > > >>
> > > > > >> I also noticed similar failures happening on other PRs. Could someone
> > > > > >> help check if the CI is currently down or unstable?
> > > > > >>
> > > > > >>
> > > > > >> Best,
> > > > > >> Dianjin Wang
> > > > > >>
> > > > > >>
> > > > > >> ---------------------------------------------------------------------
> > > > > >> To unsubscribe, e-mail: [email protected]
> > > > > >> For additional commands, e-mail: [email protected]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Ed Espino
> > > Apache Cloudberry (Incubating) & MADlib
> > >
> >
>
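
For reference, the back-of-the-envelope estimate in Leonid's message (item IV) can be reproduced with a short script. This is only a sketch: the constant and function names are made up for illustration, the sizes are the rounded figures from his observations B, C and D, and the two-job case assumes that setting max-parallel to 2 would simply mean two jobs share the disk at a time, which the thread does not confirm.

```python
# Approximate per-directory sizes in GB, rounded from the observations in the
# quoted message (rough readings of the du output, not exact measurements).
PG_WAL_GB = 0.7
PG_SUBTRANS_GB = 2.5
PG_DISTRIBUTEDLOG_GB = 1.0


def estimated_space_gb(parallel_jobs: int) -> float:
    """Apply the formula from the thread:
    jobs x (pg_wal * 6 + pg_subtrans * 4 + pg_distributedlog * 2)."""
    per_job = PG_WAL_GB * 6 + PG_SUBTRANS_GB * 4 + PG_DISTRIBUTEDLOG_GB * 2
    # Round to one decimal place; the inputs are only rough estimates anyway.
    return round(parallel_jobs * per_job, 1)


if __name__ == "__main__":
    # 3 jobs is the current setup; 2 is the hypothetical max-parallel=2 case.
    for jobs in (3, 2):
        print(f"{jobs} parallel jobs: ~{estimated_space_gb(jobs)} GB")
```

With three parallel jobs this gives the 48.6 GB from the message. With two it gives about 32.4 GB under the same assumptions.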
