Problem solved! Running git gc inside all of the git repos under /var/lib/go-server/pipelines/flyweight/ (as the go user) totally fixed this. Server CPU usage is back down to the expected single-digit range, and the git material updates are fast.
Now I know that running git gc in this case is safe (it's not safe for the config repo, per https://docs.gocd.org/current/advanced_usage/config_repo.html), and now I know what these flyweight directories are (it seems they are the git repos used for polling each distinct git material, though I haven't yet found documentation to confirm). Next I'll probably look at adding git gc on these repos as a daily or weekly cron. Thanks for your help and work on GoCD!

On Wednesday, January 11, 2023 at 7:08:37 PM UTC-8 Chad Wilson wrote:

> Interesting - those are large repos, and I do agree with your assessment about it seeming CPU bound.
>
> GoCD does git gc repos, but that's possibly not the case for the "flyweight" ones on the server, which are used to monitor for changes and to list out the changes between revisions inside GoCD. I believe you should consider it safe to manually git gc all those repos (as long as the file permissions aren't affected - do so as the correct user).
>
> On the other hand, with a fresh server, I'm surprised those clones would have a large amount of garbage. I'm not a git internals expert, but my understanding is that such garbage accumulates over time rather than starting that way. Did your old server perhaps have some old workaround cron/supervisor job that git gced all of these manually, I wonder, to avoid getting into this state?
>
> You could also entirely remove those flyweights and let GoCD recreate them, to see if they get recreated in an "efficient" state without needing a git gc. Perhaps test on a couple first. Obviously, if you remove *all* of them it will put a bit of load on your repository manager as GoCD/git re-clones everything for a while.
>
> -Chad
>
> On Thu, Jan 12, 2023 at 2:50 AM Brandon V <[email protected]> wrote:
>
>> Ashwanth: looking at `iostat -k 2` I don't see any particularly high volume of disk io. The disk is 99% idle.
>> The disk is an SSD, and it's the same kind that was used on our old GoCD server. The disk is an AWS EBS io2 volume to be specific, and so I've used the AWS metrics to check for any signs of disk latency; the disk seems to be performing smoothly.
>>
>> Chad: That's correct, we don't run agents on the GoCD server. For extra info about the rebuild methodology, we used the GoCD backup instructions you linked to back up the old server, then restored from that backup on a new machine. Config files such as wrapper-properties.conf were restored, but it's theoretically possible that there were edits made in a way that wouldn't be captured in this backup/restore (e.g. a global git config file). Thanks for the extra clues about git material polling - I'll look into what config options are available.
>>
>> A bit more debugging:
>>
>> I would have to guess this git command is cpu-bound. Using `lsof -u go | grep git` to figure out what git is doing, it looks like the git commands are busy reading files like these:
>>
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-47a1b4f9c1abd793f7783d61b3719520e8ee174e.pack
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-caef4ebaebbc1fccca5a2d637996f72dce2babb4.pack
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-bf49db07ef11d2f696b8b933b3443c7f185fadd4.pack
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-7755613d3efe8fdfcd63bfc16a62a8194b508c74.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-3fcca0d6c0dd729a6a650b87583ec5424ec7c103.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-0c24d29c94e20367351971b219ded2455aa496d3.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-bf49db07ef11d2f696b8b933b3443c7f185fadd4.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-caef4ebaebbc1fccca5a2d637996f72dce2babb4.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/packed-refs
>>
>> These look like git repos under /var/lib/go-server/pipelines/flyweight/ where GoCD is running these `git branch -r --contains <commit>` commands. I was able to copy one of these git repos and inspect it. Its size is 333M on disk, and it has about 130000 commits. I confirmed that running `git branch -r --contains <commit>` with a recent commit id took about 30 seconds. Then I tried `git gc`. After that completed, the same `git branch -r --contains <commit>` took less than 1 second.
>>
>> On Wednesday, January 11, 2023 at 8:29:04 AM UTC-8 Chad Wilson wrote:
>>
>>> Yeah, I would start by investigating what is different about raw git performance, as Ashwanth alludes to. Just to clarify, you don't run agents on the same machine as the server, do you?
>>>
>>> These are forked processes from GoCD, and the speed at which a given git operation runs isn't really affected by GoCD in any relevant way I can think of. If there is a difference in the performance of the location where GoCD server is storing its temporary artifacts <https://docs.gocd.org/current/installation/install/server/linux.html> between new and old (the /var/lib/go-server mount, normally), that might be something to look at.
>>>
>>> The thread dumps you posted are generally just indicative of the material subsystem being very slow and not being able to get through the number of required material checks/polls in the allotted time.
>>> There are many posts and such about slow performance around the place, e.g. here <https://github.com/gocd/gocd/issues/10480>, here <https://github.com/gocd/gocd/issues/9588> and elsewhere on GitHub going back a while - perhaps you can mine the existing resources for ideas.
>>>
>>> As to why things appear to have degraded since your previous server version, it's a bit difficult to tell. What was your rebuild methodology? Did you follow https://docs.gocd.org/current/advanced_usage/one_click_backup.html#restoring-gocd-using-backup?
>>>
>>> Are you sure you restored the config folder (wrapper-properties.conf and similar) and any overrides to GoCD memory allocation, system properties etc.? It's conceptually possible that your previous configuration had tuned the material subsystem to either do more in parallel or to poll less frequently and such, or that you had some global git config which somehow affected git performance - but without more detail, it's probably just idle speculation.
>>>
>>> -Chad
>>>
>>> On Wed, Jan 11, 2023 at 7:27 PM 'Ashwanth Kumar' via go-cd <[email protected]> wrote:
>>>
>>>> Can you check the avg. disk io on the machine? Something like:
>>>>
>>>> $ iostat -k 2
>>>>
>>>> Also, are you using SSDs or HDDs on the GoCD Server? If you're using HDDs, see if you can run the same setup on SSDs to see if there is a difference.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, 11 Jan 2023 at 12:28, Brandon V <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Our GoCD server is acting slow. We have >1000 pipelines that trigger automatically on updates to various branches of a Git repository. For each branch that needs to be picked up by GoCD, we have a distinct git material with shallow clone set to true. The time to trigger from Git updates is quite slow. The UI is also a bit sluggish in general.
>>>>> Environment info:
>>>>> Linux
>>>>> GoCD server 21.1.0
>>>>> PostgreSQL 12.8 database
>>>>>
>>>>> The server was recently recreated, updating the OS from Ubuntu 16.04 to Ubuntu 20.04. The most important pieces of configuration were restored from the previous server. However, it's possible some configuration was lost, or some package version was changed, or something like that. I'd appreciate any pointers.
>>>>>
>>>>> While trying to debug this, I noticed a few things:
>>>>>
>>>>> CPU utilization on the machine is hovering around 60% when there are no jobs running.
>>>>>
>>>>> The top consumers of CPU on the machine are git commands run by the GoCD server. At any given time there are about 8 git processes like "git branch -r --contains <commit-sha>". Each of these git commands can be using a whole CPU.
>>>>>
>>>>> Looking at things related to git, I noticed:
>>>>>
>>>>> The GoCD server logs in /var/log/go-server/go-server.log have a lot of these messages (referencing the different deployment branches):
>>>>>
>>>>> 2023-01-11 06:44:04,168 WARN [ThreadPoolTaskScheduler-7] MaterialUpdateService:204 - [Material Update] Skipping update of material GitMaterial{[email protected]:repo/app.git, branch='abranch', shallowClone=true, submoduleFolder='null'} which has been in-progress since Wed Jan 11 06:43:04 UTC 2023
>>>>>
>>>>> Using jstack to get thread dumps from the server, this seems to be the java stack trace where those git processes are launched:
>>>>>
>>>>> "130@MessageListener for MaterialUpdateListener" #130 daemon prio=5 os_prio=0 cpu=14618.25ms elapsed=100474.11s tid=0x00007fa2ded9dca0 nid=0xea6f in Object.wait() [0x00007fa20c5ad000]
>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>    at java.lang.Object.wait([email protected]/Native Method)
>>>>>    - waiting on <no object reference available>
>>>>>    at java.lang.Object.wait([email protected]/Unknown Source)
>>>>>    at java.lang.ProcessImpl.waitFor([email protected]/Unknown Source)
>>>>>    - locked <0x00000005370e58e0> (a java.lang.ProcessImpl)
>>>>>    at com.thoughtworks.go.util.ProcessWrapper.waitForExit(ProcessWrapper.java:54)
>>>>>    at com.thoughtworks.go.util.command.CommandLine.runOrBomb(CommandLine.java:354)
>>>>>    at com.thoughtworks.go.util.command.CommandLine.runOrBomb(CommandLine.java:378)
>>>>>    at com.thoughtworks.go.domain.materials.SCMCommand.runOrBomb(SCMCommand.java:38)
>>>>>    at com.thoughtworks.go.domain.materials.git.GitCommand.containsRevisionInBranch(GitCommand.java:364)
>>>>>    at com.thoughtworks.go.config.materials.git.GitMaterial.modificationsSince(GitMaterial.java:132)
>>>>>    at com.thoughtworks.go.server.service.materials.GitPoller.modificationsSince(GitPoller.java:35)
>>>>>    at com.thoughtworks.go.server.service.materials.GitPoller.modificationsSince(GitPoller.java:26)
>>>>>    at com.thoughtworks.go.server.service.MaterialService.modificationsSince(MaterialService.java:134)
>>>>>    at com.thoughtworks.go.server.materials.ScmMaterialUpdater.insertLatestOrNewModifications(ScmMaterialUpdater.java:56)
>>>>>    at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.insertLatestOrNewModifications(MaterialDatabaseUpdater.java:157)
>>>>>    at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterialWithNewRevisions(MaterialDatabaseUpdater.java:149)
>>>>>    at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater$2.doInTransaction(MaterialDatabaseUpdater.java:108)
>>>>>    at com.thoughtworks.go.server.transaction.TransactionCallback.doWithExceptionHandling(TransactionCallback.java:23)
>>>>>    at com.thoughtworks.go.server.transaction.TransactionTemplate.lambda$executeWithExceptionHandling$2(TransactionTemplate.java:43)
>>>>>    at com.thoughtworks.go.server.transaction.TransactionTemplate$$Lambda$1842/0x00000008045df9c8.doInTransaction(Unknown Source)
>>>>>    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
>>>>>    at com.thoughtworks.go.server.transaction.TransactionTemplate.executeWithExceptionHandling(TransactionTemplate.java:40)
>>>>>    at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterial(MaterialDatabaseUpdater.java:105)
>>>>>    - locked <0x0000000537f34488> (a java.lang.String)
>>>>>    at com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:64)
>>>>>    at com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:32)
>>>>>    at com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83)
>>>>>    at com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63)
>>>>>    at java.lang.Thread.run([email protected]/Unknown Source)
>>>>>
>>>>> Any help is appreciated, thanks!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups "go-cd" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/3c3fe87e-5639-4316-9dfa-bc3dcf68d901n%40googlegroups.com.
>>>>
>>>> --
>>>> Ashwanth Kumar / ashwanthkumar.in
