Running into a strange issue that just popped up on a few servers in my
environment and was wondering if the community had any insight into as to
what could be causing the issue.
First, a bit of background. I am running Postgres 10.11 on Windows (but
have seen similar issue on a server running 11.6)
Windows Version:
Major Minor Build Revision
- - -
10 0 14393 0
I have the following query that was on average running in ~2ms suddenly
jump up to on average ~25ms. This query is called millions of time per day
and there were cases of the query taking 20-30 seconds. Below is the
explain analyze of one such example.
When seeing this issue, the server was under some CPU pressure but even
with that, I would not think it should get as slow as shown below as we are
using SSDs and none of the windows disk counters (IOPS, queue length) show
any value that would be of concern.
explain (analyze,buffers) SELECT
tabledata.uuid_id,tabledata.int_id,tabledata.timestamp_date,tabledata.int_otherid,tabledata.float_value,tabledata.int_otherid2,tabledata.int_otherid3,tabledata.int_rowver
FROM tabledata WHERE timestamp_date <= $1 AND int_otherid3 IN
($2,$3,$4,$5,$6,$7) AND tabledata.int_id=$8 ORDER BY timestamp_date DESC
LIMIT 1
QUERY PLAN
---
Limit (cost=0.71..139.67 rows=1 width=84) (actual
time=17719.076..17719.077 rows=1 loops=1)
Buffers: shared hit=12102 read=13259 written=111
-> Index Scan Backward using
ix_tabledata_intid_timestampdate_intotherid3_intotherid2 on tabledata
(cost=0.71..2112828.54 rows=15204 width=84) (actual
time=17719.056..17719.057 rows=1 loops=1)
Index Cond: ((int_id = 8149) AND (timestamp_date <=
'2020-02-24 03:05:00.013'::timestamp without time zone))
Filter: (int_otherid3 = ANY
('{3ad2b707-a068-42e8-b0f2-6c8570953760,4e1b1bfa-34e1-48df-8cf8-2b59caf076e2,00d394dd-c2f4-4f3a-a8d4-dc208dafa686,baa904a6-8302-4fa3-b8ae-8adce8fe4306,3c99d61b-21a1-42ea-92a8-3cc88d79f3f1,befe0f8b-5911-47b3-bfae-faa9f8b09d08}'::uuid[]))
Rows Removed by Filter: 91686
Buffers: shared hit=12102 read=13259 written=111
Planning time: 203.153 ms
Execution time: 17719.200 ms
(9 rows)
If I look at pg_stat_activity while the query is running all of the calls
to this query have the same wait event.
wait_event - DataFileRead
wait_event_type - IO
We took a perfview during the issue and below is the call stack from a
process running this query, two call paths are shown.
---
Name
ntoskrnl!ExpWaitForSpinLockExclusiveAndAcquire
+ ntoskrnl!MiChargeWsles
|+ ntoskrnl!MiObtainSystemCacheView
||+ ntoskrnl!MmMapViewInSystemCache
|| + ntoskrnl!CcGetVacbMiss
|| + ntoskrnl!CcGetVirtualAddress
|| + ntoskrnl!CcMapAndCopyFromCache
||+ ntoskrnl!CcCopyReadEx
|| + ntfs!NtfsCopyReadA
|| |+ fltmgr!FltpPerformFastIoCall
|| | + fltmgr!FltpPassThroughFastIo
|| | + fltmgr!FltpFastIoRead
|| | + ntoskrnl!NtReadFile
|| |+ ntdll!NtReadFile
|| | + kernelbase!ReadFile
|| | + msvcr120!_read_nolock
|| | + msvcr120!_read
|| |+ postgres!PathNameOpenFile
|| | + postgres!??mdclose
|| | + postgres!ScheduleBufferTagForWriteback
|| | + postgres!InitBufTable
|| |+ postgres!??PrefetchBuffer
|| ||+ postgres!index_getnext_tid
|| || + postgres!index_fetch_heap
|| || + postgres!ExecIndexEvalRuntimeKeys
|| || + postgres!ExecAssignScanProjectionInfoWithVarno
|| ||+ postgres!tupledesc_match
|| || + postgres!recompute_limits
|| || + postgres!CheckValidRowMarkRel
|| || + postgres!list_length
|| ||+ pg_stat_statements!pgss_ExecutorRun
|| || + postgres!PortalRunFetch
|| || + postgres!PortalStart
|| || + postgres!exec_bind_message
|| ||+ postgres!PostgresMain
|| || + postgres!BackendInitialize
|| || + postgres!ClosePostmasterPorts
|| || + postgres!main
|| ||+ postgres!_onexit
|| || + kernel32!BaseThreadInitThunk
|| || + ntdll!RtlUserThreadStart
Name
ntoskrnl!ExpWaitForSpinLockExclusiveAndAcquire
+ ntoskrnl!MiChargeWsles
|+ ntoskrnl!MiReleaseSystemCacheView
| + ntoskrnl!MmUnmapViewInSystemCache
| + ntoskrnl!CcUnmapVacb
| + ntoskrnl!CcUnmapVacbArray
|+ ntoskrnl!