Problem solved! Running git gc inside all of the git repos under 
/var/lib/go-server/pipelines/flyweight/ (as the go user) totally fixed 
this. Server CPU usage is back down to the expected single-digit range, and 
the git material updates are fast.
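For anyone who finds this thread later, the sweep I ran can be scripted. A minimal sketch, assuming the flyweight path above (the `gc_flyweights` helper name is my own, not anything from GoCD):

```shell
# Minimal sketch: run `git gc` in every clone under a flyweight-style
# directory. Run it as the "go" user so file ownership is preserved.
# The path used in the example call matches our setup; adjust as needed.
gc_flyweights() {
  for gitdir in "$1"/*/.git; do
    [ -d "$gitdir" ] || continue        # skip if the glob matched nothing
    git -C "${gitdir%/.git}" gc --quiet # repack objects, prune garbage
  done
}

# e.g. (as the go user): gc_flyweights /var/lib/go-server/pipelines/flyweight
```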

Now I know that running git gc in this case is safe (it's not safe for the 
config repo, per 
https://docs.gocd.org/current/advanced_usage/config_repo.html), and now I 
know what these flyweight directories are (it seems they are the git repos 
used for polling each distinct git material, though I haven't yet found 
documentation to confirm).

Next I'll probably look at adding git gc on these repos as a daily or 
weekly cron.
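As a sketch of what that cron might look like (system cron.d format with a user field; the schedule, file name, and path are just assumptions to adapt):

```shell
# /etc/cron.d/gocd-flyweight-gc -- weekly git gc of the flyweight clones.
# Field order: minute hour day month weekday user command.
# The path and the "go" user match our setup; verify both locally.
0 3 * * 0  go  for d in /var/lib/go-server/pipelines/flyweight/*/; do git -C "$d" gc --quiet; done
```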

Thanks for your help and work on GoCD!

On Wednesday, January 11, 2023 at 7:08:37 PM UTC-8 Chad Wilson wrote:

> Interesting - those are large repos, and I do agree with your assessment 
> that it seems CPU bound.
>
> GoCD does git gc repos, but that's possibly not the case for the 
> "flyweight" ones on the server which are used to monitor for changes, and 
> list out the changes between revisions inside GoCD. I believe you should 
> consider it safe to manually git gc all those repos (as long as the file 
> permissions aren't affected - do so as the correct user).
>
> On the other hand, with a fresh server, I'm surprised those clones would 
> have a large amount of garbage. I'm not a git internals expert, but my 
> understanding is that such garbage accumulates over time rather than being 
> there from the start. I wonder whether your old server perhaps had some 
> workaround cron/supervisor job that ran git gc on all of these manually, to 
> avoid getting into this state?
>
> You could also entirely remove those flyweights and let GoCD recreate them 
> to see if they get recreated in an "efficient" state without needing a git 
> gc. Perhaps test on a couple first. Obviously if you remove *all* of them 
> it will put a bit of load on your repository manager as GoCD/git re-clones 
> everything for a while.
>
> -Chad
>
> On Thu, Jan 12, 2023 at 2:50 AM Brandon V <[email protected]> wrote:
>
>> Ashwanth: looking at `iostat -k 2` I don't see any particularly high 
>> volume of disk io. The disk is 99% idle. The disk is an SSD, and it's the 
>> same kind that was used on our old GoCD server. The disk is an AWS EBS io2 
>> volume to be specific, and so I've used the AWS metrics to check for any 
>> signs of disk latency and the disk seems to be performing smoothly. 
>>
>> Chad: That's correct, we don't run agents on the GoCD server. For extra 
>> info about the rebuild methodology, we used the GoCD backup instructions as 
>> you linked to backup the old server, then restored from that backup on a 
>> new machine. Config files such as wrapper-properties.conf were restored, 
>> but it's theoretically possible that there were edits made in a way that 
>> wouldn't be captured in this backup/restore (e.g. a global git config 
>> file). Thanks for the extra clues about git material polling - I'll look 
>> into what config options are available.
>>
>> A bit more debugging:
>>
>> I would have to guess this git command is cpu-bound. Using `lsof -u go | 
>> grep git` to figure out what git is doing, it looks like the git commands 
>> are busy reading files like these:
>>
>>
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-47a1b4f9c1abd793f7783d61b3719520e8ee174e.pack
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-caef4ebaebbc1fccca5a2d637996f72dce2babb4.pack
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-bf49db07ef11d2f696b8b933b3443c7f185fadd4.pack
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-7755613d3efe8fdfcd63bfc16a62a8194b508c74.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-3fcca0d6c0dd729a6a650b87583ec5424ec7c103.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-0c24d29c94e20367351971b219ded2455aa496d3.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-bf49db07ef11d2f696b8b933b3443c7f185fadd4.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/objects/pack/pack-caef4ebaebbc1fccca5a2d637996f72dce2babb4.idx
>> /var/lib/go-server/pipelines/flyweight/97f4c8ef-1be5-49b6-a47d-05d8db2935b3/.git/packed-refs
>>
>> These look like git repos under /var/lib/go-server/pipelines/flyweight/ 
>> where GoCD is running these `git branch -r --contains <commit>` commands. I 
>> was able to copy one of these git repos and inspect it. It is 333M on disk 
>> and contains about 130,000 commits. I confirmed that running `git branch -r 
>> --contains <commit>` with a recent commit id took about 30 seconds. Then I 
>> tried `git gc`. After it completed, the same command took less than 1 
>> second.
>>
>> On Wednesday, January 11, 2023 at 8:29:04 AM UTC-8 Chad Wilson wrote:
>>
>>> Yeah, I would start by investigating what is different about raw git 
>>> performance as Ashwanth alludes to. Just to clarify, you don't run agents 
>>> on the same machine as the server, do you?
>>>
>>> These are forked processes from GoCD, and the speed at which a given git 
>>> operation runs isn't really affected by GoCD in any relevant way I can 
>>> think of. If there is a difference in the performance of the location 
>>> where the GoCD server is storing its temporary artifacts 
>>> <https://docs.gocd.org/current/installation/install/server/linux.html> 
>>> between new and old (the /var/lib/go-server mount, normally), that might 
>>> be something to look at.
>>>
>>> The thread dumps you posted are generally just indicative of the material 
>>> subsystem being very slow and not being able to get through the number of 
>>> required material checks/polls in the allotted time. There are many posts 
>>> about slow performance around the place, e.g. here 
>>> <https://github.com/gocd/gocd/issues/10480>, here 
>>> <https://github.com/gocd/gocd/issues/9588> and elsewhere on GitHub 
>>> going back a while - perhaps you can mine the existing resources for 
>>> ideas.
>>>
>>> As to why things appear to have degraded since your previous server 
>>> version, it's a bit difficult to tell. What was your rebuild methodology? 
>>> Did you follow 
>>> https://docs.gocd.org/current/advanced_usage/one_click_backup.html#restoring-gocd-using-backup?
>>>  
>>> Are you sure you restored the config folder (wrapper-properties.conf and 
>>> similar) and any overrides to GoCD memory allocation, system properties 
>>> etc? It's conceptually possible that your previous configuration had tuned 
>>> the material subsystem to either do more in parallel or to poll less 
>>> frequently and such, or that you had some global git config which somehow 
>>> affected git performance - but without more detail, it's probably just idle 
>>> speculation.
>>>
>>> -Chad
>>>
>>>
>>>
>>> On Wed, Jan 11, 2023 at 7:27 PM 'Ashwanth Kumar' via go-cd <
>>> [email protected]> wrote:
>>>
>>>> Can you check the avg. disk io on the machine?  Something like:
>>>>
>>>> $ iostat -k 2
>>>>
>>>> Also are you using SSDs or HDDs on the GoCD Server? If you're using 
>>>> HDDs see if you can run the same setup on SSDs to see if there is a 
>>>> difference.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On Wed, 11 Jan 2023 at 12:28, Brandon V <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Our GoCD server is acting slow. We have >1000 pipelines that trigger 
>>>>> automatically on updates to various branches of a Git repository. For 
>>>>> each branch that needs to be picked up by GoCD, we have a distinct git 
>>>>> material with shallow clone set to true. The time to trigger from Git 
>>>>> updates is quite slow. The UI is also a bit sluggish in general.
>>>>>
>>>>> Environment info:
>>>>> Linux
>>>>> GoCD server 21.1.0
>>>>> PostgreSQL 12.8 database
>>>>>
>>>>> The server was recently recreated, updating the OS from Ubuntu 16.04 to 
>>>>> Ubuntu 20.04. The most important pieces of configuration were restored 
>>>>> from the previous server. However, it's possible some configuration was 
>>>>> lost, or some package version was changed, or something like that. I'd 
>>>>> appreciate any pointers.
>>>>>
>>>>> While trying to debug this, I noticed a few things:
>>>>>
>>>>> CPU utilization on the machine is hovering around 60%, when there are 
>>>>> no jobs running.
>>>>>
>>>>> The top consumers of CPU on the machine are git commands run by the 
>>>>> GoCD server. At any given time there are about 8 instances of git 
>>>>> processes like "git branch -r --contains <commit-sha>". Each of these 
>>>>> git commands can be using a whole CPU.
>>>>>
>>>>> Looking at things related to git, I noticed:
>>>>>
>>>>> The GoCD server logs in /var/log/go-server/go-server.log have a lot of 
>>>>> these messages (referencing the different deployment branches):
>>>>>
>>>>> 2023-01-11 06:44:04,168 WARN  [ThreadPoolTaskScheduler-7] 
>>>>> MaterialUpdateService:204 - [Material Update] Skipping update of material 
>>>>> GitMaterial{[email protected]:repo/app.git, branch='abranch', 
>>>>> shallowClone=true, submoduleFolder='null'} which has been in-progress 
>>>>> since 
>>>>> Wed Jan 11 06:43:04 UTC 2023
>>>>>
>>>>> Using jstack to get thread dumps from the server, this seems to be the 
>>>>> java stack trace where those git processes are launched:
>>>>>
>>>>> "130@MessageListener for MaterialUpdateListener" #130 daemon prio=5 
>>>>> os_prio=0 cpu=14618.25ms elapsed=100474.11s tid=0x00007fa2ded9dca0 
>>>>> nid=0xea6f in Object.wait()  [0x00007fa20c5ad000]
>>>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>>>         at java.lang.Object.wait([email protected]/Native Method)
>>>>>         - waiting on <no object reference available>
>>>>>         at java.lang.Object.wait([email protected]/Unknown Source)
>>>>>         at java.lang.ProcessImpl.waitFor([email protected]/Unknown Source)
>>>>>         - locked <0x00000005370e58e0> (a java.lang.ProcessImpl)
>>>>>         at 
>>>>> com.thoughtworks.go.util.ProcessWrapper.waitForExit(ProcessWrapper.java:54)
>>>>>         at 
>>>>> com.thoughtworks.go.util.command.CommandLine.runOrBomb(CommandLine.java:354)
>>>>>         at 
>>>>> com.thoughtworks.go.util.command.CommandLine.runOrBomb(CommandLine.java:378)
>>>>>         at 
>>>>> com.thoughtworks.go.domain.materials.SCMCommand.runOrBomb(SCMCommand.java:38)
>>>>>         at 
>>>>> com.thoughtworks.go.domain.materials.git.GitCommand.containsRevisionInBranch(GitCommand.java:364)
>>>>>         at 
>>>>> com.thoughtworks.go.config.materials.git.GitMaterial.modificationsSince(GitMaterial.java:132)
>>>>>         at 
>>>>> com.thoughtworks.go.server.service.materials.GitPoller.modificationsSince(GitPoller.java:35)
>>>>>         at 
>>>>> com.thoughtworks.go.server.service.materials.GitPoller.modificationsSince(GitPoller.java:26)
>>>>>         at 
>>>>> com.thoughtworks.go.server.service.MaterialService.modificationsSince(MaterialService.java:134)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.ScmMaterialUpdater.insertLatestOrNewModifications(ScmMaterialUpdater.java:56)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.insertLatestOrNewModifications(MaterialDatabaseUpdater.java:157)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterialWithNewRevisions(MaterialDatabaseUpdater.java:149)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.MaterialDatabaseUpdater$2.doInTransaction(MaterialDatabaseUpdater.java:108)
>>>>>         at 
>>>>> com.thoughtworks.go.server.transaction.TransactionCallback.doWithExceptionHandling(TransactionCallback.java:23)
>>>>>         at 
>>>>> com.thoughtworks.go.server.transaction.TransactionTemplate.lambda$executeWithExceptionHandling$2(TransactionTemplate.java:43)
>>>>>         at 
>>>>> com.thoughtworks.go.server.transaction.TransactionTemplate$$Lambda$1842/0x00000008045df9c8.doInTransaction(Unknown Source)
>>>>>         at 
>>>>> org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
>>>>>         at 
>>>>> com.thoughtworks.go.server.transaction.TransactionTemplate.executeWithExceptionHandling(TransactionTemplate.java:40)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterial(MaterialDatabaseUpdater.java:105)
>>>>>         - locked <0x0000000537f34488> (a java.lang.String)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:64)
>>>>>         at 
>>>>> com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:32)
>>>>>         at 
>>>>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:83)
>>>>>         at 
>>>>> com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:63)
>>>>>         at java.lang.Thread.run([email protected]/Unknown Source)
>>>>>
>>>>>
>>>>>
>>>>> Any help is appreciated, thanks!
>>>>>
>>>>
>>>>
>>>> -- 
>>>>
>>>> Ashwanth Kumar / ashwanthkumar.in
>>>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/go-cd/5ecdeedd-1afb-44c6-bb5c-dc74d88d3575n%40googlegroups.com.
