This is an odd one. For reference: this is a customer machine running Windows
Server 2016, and the application was compiled with go1.20.11. The application
just hangs after a number of days; a Windows minidump reveals that most
threads are doing this:
Goroutine 462 - User: unicode/utf16/utf16.go:106 unicode/utf16.Decode
(0xe6e391) [semacquire]
0 0x0000000000e2d116 in runtime.gopark
at runtime/proc.go:382
1 0x0000000000e3df5c in runtime.goparkunlock
at runtime/proc.go:387
2 0x0000000000e3df5c in runtime.semacquire1
at runtime/sema.go:160
3 0x0000000000e0ac2f in runtime.semacquire
at runtime/sema.go:111
4 0x0000000000e0ac2f in runtime.gcMarkDone
at runtime/mgc.go:787
5 0x0000000000128c10 in ???
at ?:-1
6 0x0000000000dfe7da in runtime.deductAssistCredit
at runtime/malloc.go:1217
7 0x0000000000dfdff0 in runtime.mallocgc
at runtime/malloc.go:932
8 0x0000000000e3f972 in runtime.makeslice
at runtime/slice.go:103
9 0x0000000000e6e391 in unicode/utf16.Decode
at unicode/utf16/utf16.go:106
10 0x0000000000e72a7b in syscall.UTF16ToString
at syscall/syscall_windows.go:63
11 0x0000000000eb7a67 in os.(*File).readdir
at os/dir_windows.go:43
12 0x0000000000eb72c5 in os.(*File).Readdirnames
at os/dir.go:70
13 0x0000000000fb623a in path/filepath.glob
at path/filepath/match.go:346
14 0x0000000000fb5ea5 in path/filepath.globWithLimit
at path/filepath/match.go:273
15 0x00000000031de255 in path/filepath.Glob
at path/filepath/match.go:243
Multiple threads, all waiting on a semaphore inside mallocgc. The allocation
that happens to trigger the wait varies from thread to thread, naturally
(sometimes a string method, a hashmap allocation, a logging call...). This
behavior has been consistent across multiple hangs over a number of weeks.
After a bit of digging into the minidump, this (I think) is the thread
holding the semaphore(s), at least based on the line numbers:
Goroutine 37 - User: :0 ??? (0x7ffb0e056974) (thread 5628)
0 0x00007ffb0e056974 in ???
at ?:-1
1 0x0000000000e5c5a0 in runtime.systemstack_switch
at runtime/asm_amd64.s:463
2 0x0000000000e0ade5 in runtime.gcMarkDone
at runtime/mgc.go:855
3 0x0000000000128c10 in ???
at ?:-1
4 0x0000000000e5e881 in runtime.goexit
at runtime/asm_amd64.s:1598
Note that the precise point in gcMarkDone can change. I have another
minidump showing a hang at a different point in the same method:
Goroutine 19 - Go: :0 ??? (0x208351516f8) (thread 7164) [unknown wait
reason 30]
0 0x00007ffe66e66974 in ???
at ?:-1
1 0x0000000000e9c5a0 in runtime.systemstack_switch
at runtime/asm_amd64.s:463
2 0x0000000000e4acff in runtime.gcMarkDone
at runtime/mgc.go:807
3 0x0000000000128c10 in ???
at ?:-1
4 0x0000000000e9e881 in runtime.goexit
at runtime/asm_amd64.s:1598
The commonality between these threads appears to be the `stacktrace()`
method.
Does anyone have ideas for how to debug this further? Has anyone seen
anything like it? I've never encountered this before, and I can't really
reproduce it: the only known trigger is "let the application sit and run for
a few days." The only vaguely similar issue I've found is this one, which
mentions interference from AV software:
https://github.com/golang/go/issues/52178.
Right now I'm waiting for the results of a gctrace log, and I've also set
GODEBUG=asyncpreemptoff=1, but beyond that I'm a bit out of ideas.
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.