Re: much slower query in production

2020-02-27 Thread Guillaume Cottenceau
Jeff Janes  writes:

> On Wed, Feb 26, 2020 at 1:02 PM Guillaume Cottenceau  wrote:
>
>  It is actually consistent with using a restored backup on the dev
>  computer, as my understanding is this comes out without any
>  garbage and like a perfectly vacuumed database.
>
> I think I got that backwards in my previous email. It is the
> dev that was restored, not the prod? But unless you went out of

Yes (prod was also restored not so long ago, when updating to pg
11.5 tho).

> your way to vacuum dev, it would not be perfectly vacuumed. If
> it were a logical restore, it would be perfectly unvacuumed,
> and if a physical restore would be in the same state of
> vacuuming as the database it was cloned from.
>
>  Btw do you have
>  any hint as to how to perform timings using production data which
>  are consistent with production? Backup/restore is maybe not the
>  way to go, but rather a block device level copy?
>
> block device copy seems like overkill, just using pg_basebackup should be 
> good enough.
>
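
(As an illustration of that suggestion, a minimal pg_basebackup clone might
look like the line below; host, user, and data directory are placeholders.
Because it is a physical copy, the restored cluster keeps the same
vacuum/visibility-map state as production.)

    pg_basebackup -h prod-host -U replication_user -D /path/to/dev/data -X stream -P
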
>  Since PostgreSQL 8, I have to say I rely entirely on autovacuum,
>  and had not noticed it could run too infrequently for the
>  workload and create such a difference. I see in the documentation a
>  default autovacuum_vacuum_scale_factor = 0.2; is that something that
>  is typically lowered globally, e.g. on a fairly active system?
>  I am worried that changing that setting to 0.005 for that table
>  would fix this query and similar ones, but later I might
>  face the same situation on other tables. How would you select
>  tables for a lowered setting?
>
> The autovacuum system has never been redesigned with the needs of 
> index-only-scans in mind. If I have a table for which
> index-only scans are important, I'd set autovacuum_vacuum_scale_factor = 0 
> and set autovacuum_vacuum_threshold to about 5% of
> the number of blocks in the table. There is no syntax to say '5% of the 
> number of blocks in the table' so you have to compute
> it yourself and hardcode the result, which makes it unsuitable for a global 
> setting. And this still only addresses UPDATE and

That also seems difficult for us, as this table grows over time (and
is trimmed only infrequently).

> DELETE operations, not INSERTs. If you have an INSERT-only or INSERT-mostly
> table for which index-only scans are important, you might
> need to set up cron jobs to do vacuuming.
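
A rough sketch of that kind of per-table override (table name and threshold
value here are illustrative; the threshold would need recomputing as the
table grows, which is the difficulty mentioned above):

    ALTER TABLE my_big_table SET (
        autovacuum_vacuum_scale_factor = 0,
        autovacuum_vacuum_threshold = 50000  -- roughly 5% of the table's current block count
    );

    -- for an INSERT-only/mostly table, a cron job could instead run e.g.:
    -- vacuumdb --table=my_insert_only_table mydb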

Thanks!

-- 
Guillaume Cottenceau




Many DataFileRead - IO waits

2020-02-27 Thread Ben Snaidero
Running into a strange issue that just popped up on a few servers in my
environment, and I was wondering if the community had any insight as to
what could be causing it.

First, a bit of background: I am running Postgres 10.11 on Windows (but
have seen a similar issue on a server running 11.6).
Windows Version:
Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      14393  0

The following query, which was running in ~2 ms on average, suddenly
jumped to ~25 ms on average.  This query is called millions of times per
day, and there were cases of it taking 20-30 seconds.  Below is the
explain analyze output from one such example.
While this was happening, the server was under some CPU pressure, but even
so I would not expect it to get as slow as shown below: we are using SSDs,
and none of the Windows disk counters (IOPS, queue length) show any value
that would be of concern.

explain (analyze, buffers)
SELECT tabledata.uuid_id, tabledata.int_id, tabledata.timestamp_date,
       tabledata.int_otherid, tabledata.float_value, tabledata.int_otherid2,
       tabledata.int_otherid3, tabledata.int_rowver
FROM tabledata
WHERE timestamp_date <= $1
  AND int_otherid3 IN ($2, $3, $4, $5, $6, $7)
  AND tabledata.int_id = $8
ORDER BY timestamp_date DESC
LIMIT 1

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.71..139.67 rows=1 width=84) (actual time=17719.076..17719.077 rows=1 loops=1)
   Buffers: shared hit=12102 read=13259 written=111
   ->  Index Scan Backward using ix_tabledata_intid_timestampdate_intotherid3_intotherid2 on tabledata  (cost=0.71..2112828.54 rows=15204 width=84) (actual time=17719.056..17719.057 rows=1 loops=1)
         Index Cond: ((int_id = 8149) AND (timestamp_date <= '2020-02-24 03:05:00.013'::timestamp without time zone))
         Filter: (int_otherid3 = ANY ('{3ad2b707-a068-42e8-b0f2-6c8570953760,4e1b1bfa-34e1-48df-8cf8-2b59caf076e2,00d394dd-c2f4-4f3a-a8d4-dc208dafa686,baa904a6-8302-4fa3-b8ae-8adce8fe4306,3c99d61b-21a1-42ea-92a8-3cc88d79f3f1,befe0f8b-5911-47b3-bfae-faa9f8b09d08}'::uuid[]))
         Rows Removed by Filter: 91686
         Buffers: shared hit=12102 read=13259 written=111
 Planning time: 203.153 ms
 Execution time: 17719.200 ms
(9 rows)

If I look at pg_stat_activity while the query is running, all of the calls
to this query show the same wait event:
wait_event - DataFileRead
wait_event_type - IO
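
A query along these lines against pg_stat_activity (a sketch, not
necessarily the exact one used) is what shows those waits:

    SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
    FROM pg_stat_activity
    WHERE wait_event_type = 'IO';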

We took a PerfView trace during the issue; below is the call stack from a
process running this query, with two call paths shown.
---

Name
 ntoskrnl!ExpWaitForSpinLockExclusiveAndAcquire
+ ntoskrnl!MiChargeWsles
|+ ntoskrnl!MiObtainSystemCacheView
||+ ntoskrnl!MmMapViewInSystemCache
|| + ntoskrnl!CcGetVacbMiss
||  + ntoskrnl!CcGetVirtualAddress
||   + ntoskrnl!CcMapAndCopyFromCache
||+ ntoskrnl!CcCopyReadEx
|| + ntfs!NtfsCopyReadA
|| |+ fltmgr!FltpPerformFastIoCall
|| | + fltmgr!FltpPassThroughFastIo
|| |  + fltmgr!FltpFastIoRead
|| |   + ntoskrnl!NtReadFile
|| |+ ntdll!NtReadFile
|| | + kernelbase!ReadFile
|| |  + msvcr120!_read_nolock
|| |   + msvcr120!_read
|| |+ postgres!PathNameOpenFile
|| | + postgres!??mdclose
|| |  + postgres!ScheduleBufferTagForWriteback
|| |   + postgres!InitBufTable
|| |+ postgres!??PrefetchBuffer
|| ||+ postgres!index_getnext_tid
|| || + postgres!index_fetch_heap
|| ||  + postgres!ExecIndexEvalRuntimeKeys
|| ||   + postgres!ExecAssignScanProjectionInfoWithVarno
|| ||+ postgres!tupledesc_match
|| || + postgres!recompute_limits
|| ||  + postgres!CheckValidRowMarkRel
|| ||   + postgres!list_length
|| ||+ pg_stat_statements!pgss_ExecutorRun
|| || + postgres!PortalRunFetch
|| ||  + postgres!PortalStart
|| ||   + postgres!exec_bind_message
|| ||+ postgres!PostgresMain
|| || + postgres!BackendInitialize
|| ||  + postgres!ClosePostmasterPorts
|| ||   + postgres!main
|| ||+ postgres!_onexit
|| || + kernel32!BaseThreadInitThunk
|| ||  + ntdll!RtlUserThreadStart


Name
 ntoskrnl!ExpWaitForSpinLockExclusiveAndAcquire
+ ntoskrnl!MiChargeWsles
|+ ntoskrnl!MiReleaseSystemCacheView
| + ntoskrnl!MmUnmapViewInSystemCache
|  + ntoskrnl!CcUnmapVacb
|   + ntoskrnl!CcUnmapVacbArray
|+ ntoskrnl!

Re: Many DataFileRead - IO waits

2020-02-27 Thread Michael Lewis
How big is ix_tabledata_intid_timestampdate_intotherid3_intotherid2 on
disk? If you create another index with the same fields, how much space does
it take? The real question: are you vacuuming aggressively enough for your
workload? Your index name seems to indicate that intotherid3 would be the
third key, and yet the planner chose not to scan that deep and instead
filtered after it found the relevant tuples based on intid and
timestampdate. That seems peculiar to me.
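
Something along these lines (a sketch; object names taken from the plan
above) would answer the size and vacuum questions:

    SELECT pg_size_pretty(pg_relation_size(
        'ix_tabledata_intid_timestampdate_intotherid3_intotherid2'::regclass));

    SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'tabledata';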

The documentation discourages multi-column indexes because they have
limited application unless the same fields are always used. Personally, I
don't love reviewing the stats of index scans or how many tuples were
fetched and having to guess how deeply the index was scanned for the
various queries involved.

I'd wonder if an index on only (intid, timestampdate) would be both much
smaller and also have a more right-leaning pattern of data being added and
accessed, in terms of keeping frequently needed blocks in shared
buffers.
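
Something like this, as a sketch (column names taken from the query;
CONCURRENTLY avoids blocking writes while it builds):

    CREATE INDEX CONCURRENTLY ix_tabledata_intid_timestampdate
        ON tabledata (int_id, timestamp_date);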

As a side note, that planning time seems high to me for such a simple
query. Have you increased default_statistics_target significantly perhaps?
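
The current settings can be checked with something like (a sketch):

    SHOW default_statistics_target;

    -- per-column overrides, if any (-1 means "use the default")
    SELECT attname, attstattarget
    FROM pg_attribute
    WHERE attrelid = 'tabledata'::regclass
      AND attnum > 0 AND NOT attisdropped
      AND attstattarget <> -1;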
