On Tue, Apr 22, 2025 at 9:53 AM Maciej Jesionowski <yav...@gmail.com> wrote:
> Hi,
>

Hi Maciej,

> Are these servers running multiple builds at a time, or is a Windows build
> given the full host resources, i.e. 8c/16t and 64GB of RAM? If you can
> monitor the resources in real time, it would be interesting to confirm if
> indeed the CPU utilization is significantly lower than 100%, meaning
> something else is bottlenecking it.
>

There is a reasonable probability that the systems are processing multiple
builds at any given moment. During normal working hours in Europe that is
almost guaranteed to be the case.

I am almost 100% certain that the Krita build times on Windows are being
adversely affected by Docker on Windows inefficiencies, with speedups
likely to be significant for Krita if what I saw with Craft builds was
anything to go by. (Craft builds would essentially be unworkable without
the punch through we currently provide)

>
> I'm not sure what's the expected build time of a native (i.e. no
> docker/VM/etc.) build on these servers, but for reference, I'm seeing a bit
> less than 13 minutes on a stock Ryzen 9 9950X (a developer build, including
> test apps). I can see this number go way up with fewer cores, but still,
> over 100 minutes is very long.
>

Is that a clean build or an incremental build?
These servers are https://www.hetzner.com/dedicated-rootserver/ax52/ for
the record.

> Thanks,
> Maciej.
>

Cheers,
Ben

>
> On Mon, Apr 21, 2025 at 9:15 PM Ben Cooksley <bcooks...@kde.org> wrote:
>
>> On Tue, Apr 22, 2025 at 5:57 AM Dmitry Kazakov <dimul...@gmail.com>
>> wrote:
>>
>>> Hi, Ben!
>>>
>>
>> Hey Dmitry,
>>
>>
>>>
>>> As for Krita, most of the CI time is spent on the Windows pipeline, which
>>> builds extremely slowly due to some obscure filesystem issues (searching
>>> includes is extremely slow). I personally don't know how to fix it. I
>>> tried: 1) PCH builds, 2) relative includes, 3) split debug info (dwo). The
>>> only solution left is to rewrite a huge portion of Krita to reduce the
>>> number of includes.
>>> Which is, obviously, not an option atm.
>>>
>>
>> This is probably at least in part due to Windows on Docker having
>> extremely poor file system performance even vs. straight NTFS (which isn't
>> great to begin with).
>> That will be fixed by VM based CI (progress update - I have most of the
>> tool that will manage the underlying base images written now, just need to
>> finish the VM provisioning part and give it some serious testing)
>>
>>
>>>
>>> Another point that requires extra build time for Krita is an
>>> inappropriate timeout of 100 minutes. A lot of our Windows builds are
>>> terminated at around 95% completion because of this timeout, so we have to
>>> rerun them and, effectively, consume more and more CI time.
>>>
>>
>> Have you got a list of these so I can have a look to see if the timeout
>> is set too low?
>> Increasing the timeout is only a temporary fix though - we will need to
>> work out why the builds are taking so long.
>>
>>
>>>
>>> ---
>>> Dmitry Kazakov
>>>
>>
>> Cheers,
>> Ben
>>
>>
>>>
>>> On Fri, Apr 18, 2025 at 21:27 Ben Cooksley <bcooks...@kde.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Over the past week or two there have been a number of complaints
>>>> regarding CI builder availability, which I've done some investigating
>>>> into this morning.
>>>>
>>>> Part of this is related to the Windows CI builders falling offline due
>>>> to OOM events, however the rest is simply due to a lack of builder time
>>>> availability (which is what this email is focused on).
>>>>
>>>> Given we have 6 Hetzner AX52 servers connected to Gitlab (each equipped
>>>> with a Ryzen 7 7700 CPU, 64GB RAM and NVMe storage) the issue is not
>>>> available build power - it is the number of builds and the length of those
>>>> builds that is at issue.
>>>>
>>>> This morning I ran a basic query to ascertain the top 20 projects for
>>>> CI time utilisation on invent.kde.org which revealed the following:
>>>>
>>>>           full_path           |    time_used     | job_count
>>>> ------------------------------+------------------+-----------
>>>>  plasma/kwin                  | 320:47:04.966412 |      2387
>>>>  graphics/krita               | 178:03:19.080763 |       423
>>>>  multimedia/kdenlive          | 174:08:09.876842 |       697
>>>>  network/ruqola               | 173:17:47.311305 |       555
>>>>  plasma/plasma-workspace      | 155:10:03.618929 |       660
>>>>  network/neochat              | 138:03:23.926652 |      1546
>>>>  education/kstars             | 129:49:17.74229  |       329
>>>>  sysadmin/ci-management       | 111:21:09.739792 |       154
>>>>  plasma/plasma-desktop        | 108:56:52.849433 |       776
>>>>  kde-linux/kde-linux-packages | 81:00:10.001937  |        33
>>>>  kdevelop/kdevelop            | 59:40:51.54474   |       217
>>>>  office/kmymoney              | 54:32:00.24623   |       271
>>>>  frameworks/kio               | 53:54:19.046685  |       690
>>>>  education/labplot            | 52:36:30.343671  |       245
>>>>  murveit/kstars               | 52:32:56.882728  |       128
>>>>  frameworks/kirigami          | 47:07:19.172935  |      1627
>>>>  system/dolphin               | 46:09:58.02836   |       705
>>>>  kde-linux/kde-linux          | 39:25:54.052469  |        46
>>>>  utilities/kate               | 36:09:22.18958   |       356
>>>>  wreissenberger/kstars        | 35:58:14.120515  |       105
>>>>
>>>> If we look closely, KStars has three spots on this list (totalling
>>>> roughly 218 hours of time used, making it the biggest app user of CI
>>>> time).
>>>>
>>>> Projects on the above list are asked to please review their jobs and
>>>> how they are conducting development to ensure CI time is used efficiently
>>>> and appropriately.
>>>>
>>>> Other projects should also please review their usage and optimise
>>>> accordingly even if they're not on this list, as there are efficiencies
>>>> to be found in all projects.
>>>>
>>>> When reviewing the list of CI builds projects have enabled, it is
>>>> important to consider to what degree your project benefits from having
>>>> various builds enabled.
>>>> One common pattern I've seen is having Alpine, SUSE Qt 6.9 and SUSE
>>>> Qt 6.10 all enabled.
>>>>
>>>> If you need to verify building on Alpine / MUSL type systems and wish
>>>> to monitor for Qt Next regressions then you probably shouldn't have a
>>>> conventional Linux Qt stable build as those two jobs between them already
>>>> cover that list of permutations.
>>>>
>>>> I've taken a quick look at some of these and can suggest the following:
>>>>
>>>> KWin: it has two conventional Linux jobs (suse_qt69 and suse_qt610)
>>>> plus a custom reduced feature set job. It seems like one of these
>>>> conventional Linux jobs should be dropped.
>>>>
>>>> KStars: Appears to have a custom Linux job in addition to a
>>>> conventional Linux job. Choose one please.
>>>>
>>>> Ruqola: Appears to be conducting a development process whereby changes
>>>> are made in stable then immediately merged to master in an ever-continuing
>>>> loop. Please discontinue this behaviour and only periodically merge stable
>>>> to master.
>>>>
>>>> Also needs to drop one of its Linux jobs as they're duplicating
>>>> functionality as noted above.
>>>>
>>>> Plasma Workspace/Desktop: At least in part this seems to be driven by
>>>> Appium tests. Please reduce the number of these and/or streamline the
>>>> process for running an Appium test. Consideration should be given to
>>>> enabling the CI option use-ccache as well.
>>>>
>>>> KDevelop: Please enable the CI option use-ccache.
>>>>
>>>> Labplot: Appears to have a strange customisation in place to the
>>>> standard jobs which shouldn't be necessary as flags in .kde-ci.yml should
>>>> permit that to be done.
>>>>
>>>> Thanks,
>>>> Ben
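
For anyone wanting to cross-check the combined KStars figure from the table in the original message, the three KStars rows can be summed with a few lines of Python. This is only a rough sketch: the durations are copied from the table as posted, while the names `KSTARS_ROWS`, `parse_interval` and `total_hours` are made up for illustration and are not part of any KDE or GitLab tooling.

```python
from datetime import timedelta

# Durations copied from the three KStars rows of the table in the thread.
# The dictionary name and the helpers below are hypothetical, for illustration.
KSTARS_ROWS = {
    "education/kstars": "129:49:17.74229",
    "murveit/kstars": "52:32:56.882728",
    "wreissenberger/kstars": "35:58:14.120515",
}


def parse_interval(text: str) -> timedelta:
    """Convert an 'HHH:MM:SS.ffffff' string (hours may exceed 24) to a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def total_hours(rows: dict) -> float:
    """Sum all durations in `rows` and return the total in hours."""
    total = sum((parse_interval(value) for value in rows.values()), timedelta())
    return total.total_seconds() / 3600


if __name__ == "__main__":
    print(f"KStars total: {total_hours(KSTARS_ROWS):.1f} hours")
```

Summing the full durations gives a little over 218 hours; adding only the whole-hour parts of each row gives 216. The same two helpers should work for any other group of rows in the table, for example the two Plasma repositories.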