Re: Duplicate WHERE condition changes performance and plan
> I don't *think* we are using SSDs but I'll need to confirm that.

Confirmed: we are not using SSDs but "10K RPM SAS in RAID-10."

I've also been hunting for other queries that show this behaviour, and I've found one. The PG settings/version differ in this example because the earlier example came from our internal CI/CD tool, which is hosted on a local instance of PG; this example is directly from our production servers.

Query C (slow):

SELECT COUNT(1)
FROM proposal.proposal
INNER JOIN proposal.note ON proposal.note.proposal_reference = proposal.proposal.reference
WHERE 1 = 1
  AND proposal.proposal.system_id = 11
  AND proposal.proposal.legacy_organisation_id IN (6, 7, 11, 16, 18, 44, 200, 218, 233, 237, 259, 47)
  AND proposal.proposal.has_been_anonymised = false
  AND proposal.note.legacy_read_by IS NULL
  AND proposal.note.type_id IN (1, 4, 9)
  AND proposal.note.entry_time > '2020-04-01'
  AND proposal.note.entry_time < '2020-05-01';

Query D (fast) - identical except that the has_been_anonymised condition appears twice:

SELECT COUNT(1)
FROM proposal.proposal
INNER JOIN proposal.note ON proposal.note.proposal_reference = proposal.proposal.reference
WHERE 1 = 1
  AND proposal.proposal.system_id = 11
  AND proposal.proposal.legacy_organisation_id IN (6, 7, 11, 16, 18, 44, 200, 218, 233, 237, 259, 47)
  AND proposal.proposal.has_been_anonymised = false
  AND proposal.proposal.has_been_anonymised = false
  AND proposal.note.legacy_read_by IS NULL
  AND proposal.note.type_id IN (1, 4, 9)
  AND proposal.note.entry_time > '2020-04-01'
  AND proposal.note.entry_time < '2020-05-01';

The EXPLAIN ANALYZE for both queries can be found here:

Query C: https://explain.depesz.com/s/5Mbu
Query D: https://explain.depesz.com/s/jVnH

The table definitions (including the indexes) can be found here:

proposal.proposal: https://gist.github.com/indy-singh/6ccd86ff859e7cdad2ec1bf73a61445c
proposal.note: https://gist.github.com/indy-singh/6c1f85ad15cb92e138447a91d8cf3ecb

Data stats:

proposal.proposal has 10,324,779 rows; once the table-specific conditions are applied, 39,223 rows remain.
proposal.note has 2,897,698 rows; once the table-specific conditions are applied, 54,359 rows remain.

PG version: PostgreSQL 9.5.17 on x86_64-pc-linux-gnu (Debian 9.5.17-1.pgdg90+1), compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

-- SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='TABLE_NAME'

Table metadata:

proposal.proposal: https://gist.github.com/indy-singh/24e7ec8f3d4e2c3ac73f724cea52f9de
proposal.note: https://gist.github.com/indy-singh/104d6ec7ef8179461eb4f91c121615e0

Index stats:

proposal.proposal: https://gist.github.com/indy-singh/1d41d15addb543bcdafc8641b9d7f036
proposal.note: https://gist.github.com/indy-singh/7a698dec98dd8ef2808345d1802e6b6a

Last vacuum:

proposal.proposal: never
proposal.note: 2020-04-17 15:10:57.256013+01

Last analyze:

proposal.proposal: never
proposal.note: 2020-04-07 11:48:49.689622+01

Server configuration: https://gist.github.com/indy-singh/b19134873f266ee6ce2b9815504d130c

Indy
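A plausible mechanism for the plan change (my reading of the planner's behaviour, so treat it as a hypothesis rather than a confirmed diagnosis): the planner multiplies the selectivity of each WHERE clause independently and does not deduplicate a repeated predicate, so stating has_been_anonymised = false twice lowers the estimated row count and can tip the join strategy. A quick way to check whether the estimates shift is to compare the two single-table plans:

EXPLAIN SELECT * FROM proposal.proposal
WHERE proposal.proposal.has_been_anonymised = false;

EXPLAIN SELECT * FROM proposal.proposal
WHERE proposal.proposal.has_been_anonymised = false
  AND proposal.proposal.has_been_anonymised = false;
-- If the hypothesis holds, the second plan's estimated rows should be
-- noticeably lower, even though both queries return identical results.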
Please help! Query jumps from 1s -> 4m
Hi,

Hoping someone can help with this performance issue that's been driving a few of us crazy :-) Any guidance greatly appreciated.

A description of what you are trying to achieve and what results you expect:

- I'd like to get an understanding of why the following query (presented in full, but there are specific parts that are confusing me) starts off taking ~1 second in duration but 'switches' to taking over 4 minutes.
- We initially saw this behaviour for the exact same SQL with a different index that resulted in an index scan. To try and fix the issue we've created an additional index with additional included fields so we now have Index Only Scans, but are still seeing the same problem.
- The execution plan below is from auto_explain output when it took just over 4 minutes to execute. The time is shared ~equally between these two index-only scans.
- There are no checkpoints occurring concurrently with this (based on "checkpoint starting" and "checkpoint complete" in the logs).
- Bloat on the index is about 30%.

Segments of interest:

1. ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual time=110.539..123828.134 rows=67000 loops=1)
         Index Cond: (col20 = $2005)
         Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY ((ARRAY[$1004, ..., $2003])::text[])))
         Rows Removed by Filter: 2662652
         Heap Fetches: 6940
         Buffers: shared hit=46619 read=42784 written=52

2. ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual time=142.855..122034.039 rows=67000 loops=1)
         Index Cond: (col20 = $1001)
         Filter: ((col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
         Rows Removed by Filter: 2662652
         Heap Fetches: 6891
         Buffers: shared hit=47062 read=42331 written=37

If I run the same queries now:

Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias1  (cost=0.56..2549.69 rows=69 width=36) (actual time=1.017..1221.375 rows=67000 loops=1)
  Heap Fetches: 24
  Buffers: shared hit=2849 read=2483

The buffers do look different - but still, reading 42k buffers doesn't seem like it would cause a delay of 4 minutes?

Actually, here's another example of segment 2 from the logs:

Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual time=36.559..120649.742 rows=65000 loops=1)
  Index Cond: (col20 = $1001)
  Filter: ((col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
  Rows Removed by Filter: 2664256
  Heap Fetches: 6306
  Buffers: shared hit=87712 read=1507

One note: I've replaced table/column names (sorry, a requirement).
Full subquery execution plan (I've stripped out the view materialization from row 14 onwards but left the header in): https://explain.depesz.com/s/vsdH

Full SQL:

SELECT subquery.id
FROM (
    SELECT table1alias1.id, table1alias1.uniqueid, table1alias1.col16 AS order_by
    FROM table1 AS table1alias1
    LEFT OUTER JOIN (
        SELECT inlinealias1.id, inlinealias1.uniqueid, inlinealias1.col4,
               inlinealias1.col5, inlinealias1.col6, inlinealias1.col7
        FROM (
            SELECT table2alias.id, table2alias.uniqueid, table2alias.col3,
                   table2alias.col4, table2alias.col5, table2alias.col6,
                   row_number() OVER (PARTITION BY table2alias.uniqueid
                                      ORDER BY table2alias.col13 DESC,
                                               table2alias.col3 DESC,
                                               table2alias.id DESC) AS rn
            FROM table2 AS table2alias
            JOIN (
                SELECT DISTINCT table1alias2.uniqueid
                FROM table1 AS table1alias2
                WHERE (table1alias2.col8 IN ($1, $2, $3, $4, $5, $6, $7, $8,
                  $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20,
                  $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32,
                  $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44,
                  $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55, $56,
                  $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68,
                  $69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80,
                  $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92,
                  $93, $94, $95, $96, $97, $98, $99, $100, $101, $102, $103,
                  $104, $105, $106, $107, $108, $109, $110, $111, $112, $113,
                  $114, $115, $116, $117, $118, $119, $120, $121, $122, $123,
                  $124, $125, $126, $127, $128, $129, $130, $131, $132, $133,
                  $134, $135, $136, $137, $138, $139, $140, $141, $142, $143,
                  $144, $145, $146, $147, $148, $149, $150, $151, $152, $153,
                  $154, $155, $156, $157, $158, $159, $160, $161, $162, $163,
                  $164, $165, $166, $167, $168, $169, $170, $171, $172, $173,
                  $174, $175, $176, $177, $178, $179, $180, $181, $182, $183,
                  $184, $185, $186, $187, $188, $189, $190, $191, $192, $193,
                  $194, $195, $196, $197, $198, $199, $2
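For reference, plans like the ones above come from auto_explain; a minimal sketch of the settings involved (the values here are illustrative assumptions, not James's actual configuration):

-- auto_explain must be loaded, e.g. via shared_preload_libraries,
-- or per-session:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s';  -- log plans for anything slower than this
SET auto_explain.log_analyze = on;          -- include actual times and row counts
SET auto_explain.log_buffers = on;          -- include the "Buffers:" lines shown above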
Re: Please help! Query jumps from 1s -> 4m
On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:
> Hi,
>
> Hoping someone can help with this performance issue that's been driving a
> few of us crazy :-) Any guidance greatly appreciated.
>
> A description of what you are trying to achieve and what results you
> expect:
> - I'd like to get an understanding of why the following query (presented
> in full, but there are specific parts that are confusing me) starts off
> taking ~1 second in duration but 'switches' to taking over 4 minutes.

Does it "switch" abruptly, or do you get progressively slower queries? If it's abrupt following the 5th execution, I guess you're hitting this:

https://www.postgresql.org/message-id/[email protected]
https://www.postgresql.org/message-id/a737b7a37273e048b164557adef4a58b50fb8...@ntex2010i.host.magwien.gv.at

> - we initially saw this behaviour for the exact same sql with a different
> index that resulted in an index scan. To try and fix the issue we've
> created an additional index with additional included fields so we now have
> Index Only Scans, but are still seeing the same problem.

> Segments of interest:
> 1. -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> table1 table1alias1 (cost=0.56..17.25 rows=1 width=60) (actual
> time=110.539..123828.134 rows=67000 loops=1)
>      Index Cond: (col20 = $2005)
>      Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY
> ((ARRAY[$1004, ..., $2003])::text[])))
>      Rows Removed by Filter: 2662652
>      Heap Fetches: 6940
>      Buffers: shared hit=46619 read=42784 written=52

> If I run the same queries now:
> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias1 (cost=0.56..2549.69 rows=69 width=36)
> (actual time=1.017..1221.375 rows=67000 loops=1)
>      Heap Fetches: 24
>      Buffers: shared hit=2849 read=2483

It looks to me like you're getting good performance following a vacuum, when Heap Fetches is low. So you'd want to run vacuum more often, like:

| ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005);

But maybe I've missed something - you showed the bad query plan, but not the good one, and I wonder if they may be subtly different, and whether that's masked by the replaced identifiers.

--
Justin
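The "abrupt switch after the 5th execution" pattern that Justin links to can be reproduced with a prepared statement; a sketch (the statement and parameter value here are made up for illustration):

PREPARE q(text) AS SELECT count(*) FROM table1 WHERE col8 = $1;
EXPLAIN EXECUTE q('some_value');  -- the first five executions use custom plans
-- From the sixth execution onwards the server may switch to a cached generic
-- plan (recognisable by $1 appearing in the plan instead of the literal
-- value), which can be far worse for skewed data. On PostgreSQL 12 and later
-- the behaviour can be pinned with:
-- SET plan_cache_mode = force_custom_plan;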
600 million rows of data. Bad hardware or need partitioning?
I have created the following table to duplicate my performance numbers, but I have simplified the table for this question.

I'm running PostgreSQL 12 on the following hardware:

Dual Xeon Quad-Core E5320 1.86GHz
4GB of RAM

The table structure is

 id uuid
 address_api_url text
 check_timestamp timestamp with time zone
 address text
Indexes:
    "new_table_pkey" PRIMARY KEY, btree (id)
    "test_table_check_timestamp_idx" btree (check_timestamp)

Right now the table has 100 million rows, but I expect it to reach about 600-700 million. I am faced with slow updates/inserts, and the issue is caused by the indexes, since they are updated on each insert/update. If I remove the indexes, the insert performance remains excellent with millions of rows.

To demonstrate the update performance I have constructed the following query, which updates the timestamp of one row:

UPDATE test_table SET check_timestamp = now()
FROM (SELECT id FROM test_table LIMIT 1) AS subquery
WHERE test_table.id = subquery.id;

That update took about 1 minute and 44 seconds:

Time: 104254.392 ms (01:44.254)

Below is the EXPLAIN ANALYZE:

EXPLAIN ANALYZE UPDATE test_table SET check_timestamp = now()
FROM (SELECT id FROM test_table LIMIT 1) AS subquery
WHERE test_table.id = subquery.id;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Update on test_table  (cost=0.57..28234.86 rows=1 width=160) (actual time=102081.905..102081.905 rows=0 loops=1)
   ->  Nested Loop  (cost=0.57..28234.86 rows=1 width=160) (actual time=32.286..101678.652 rows=1 loops=1)
         ->  Subquery Scan on subquery  (cost=0.00..514.96 rows=1 width=56) (actual time=0.048..45.127 rows=1 loops=1)
               ->  Limit  (cost=0.00..414.96 rows=1 width=16) (actual time=0.042..26.319 rows=1 loops=1)
                     ->  Seq Scan on test_table test_table_1  (cost=0.00..4199520.04 rows=101204004 width=16) (actual time=0.040..21.542 rows=1 loops=1)
         ->  Index Scan using new_table_pkey on test_table  (cost=0.57..2.77 rows=1 width=92) (actual time=10.160..10.160 rows=1 loops=1)
               Index Cond: (id = subquery.id)
 Planning Time: 0.319 ms
 Execution Time: 102081.967 ms
(9 rows)

Time: 102122.421 ms (01:42.122)

With the right hardware, can one partition handle 600 million rows with good insert/update performance? If so, what kind of hardware should I be looking at? Or would I need to create partitions? I'd like to hear some recommendations.
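On the partitioning question, a minimal sketch of what declarative range partitioning could look like here (PostgreSQL 12 syntax; partitioning on check_timestamp and the monthly boundaries are assumptions, and a primary key on a partitioned table must include the partition column):

CREATE TABLE test_table (
    id uuid NOT NULL,
    address_api_url text,
    check_timestamp timestamp with time zone NOT NULL,
    address text,
    PRIMARY KEY (id, check_timestamp)  -- must include the partition key
) PARTITION BY RANGE (check_timestamp);

-- One partition per month, for example:
CREATE TABLE test_table_2020_05 PARTITION OF test_table
    FOR VALUES FROM ('2020-05-01') TO ('2020-06-01');
CREATE TABLE test_table_2020_06 PARTITION OF test_table
    FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');

-- Caveat: updating check_timestamp moves rows between partitions,
-- which PostgreSQL 12 supports but which is not free.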
Re: 600 million rows of data. Bad hardware or need partitioning?
What kind of storage (SSD or an old 5400 RPM drive)? What else is this machine running? What configs have been customized, such as work_mem or random_page_cost?
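One quick way to gather the customized settings being asked about (a generic query, not specific to this machine):

SELECT name, setting, unit, source
FROM pg_settings
WHERE source NOT IN ('default', 'override');  -- everything changed from the defaults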
Re: 600 million rows of data. Bad hardware or need partitioning?
On Sun, May 3, 2020 at 11:46 PM Michael Lewis wrote:
>
> What kind of storage (SSD or an old 5400 RPM drive)? What else is this
> machine running?

Not an SSD, but an old 1TB 7200 RPM HDD.

> What configs have been customized, such as work_mem or random_page_cost?

work_mem = 2403kB
random_page_cost = 1.1
Re: 600 million rows of data. Bad hardware or need partitioning?
On Mon, 4 May 2020 at 15:52, Arya F wrote:
>
> On Sun, May 3, 2020 at 11:46 PM Michael Lewis wrote:
> >
> > What kind of storage (SSD or an old 5400 RPM drive)? What else is this
> > machine running?
>
> Not an SSD, but an old 1TB 7200 RPM HDD.
>
> > What configs have been customized, such as work_mem or random_page_cost?
>
> work_mem = 2403kB
> random_page_cost = 1.1

How long does it take if you first do:

SET enable_nestloop TO off;

If you find it's faster, then you most likely have random_page_cost set unrealistically low. In fact, I'd say it's very unlikely that a nested loop join will be a win in this case when random pages must be read from a mechanical disk, but by all means, try disabling it with the above command and see for yourself.

If you set random_page_cost so low to solve some other performance problem, then you may wish to look at the effective_cache_size setting. Having that set to something realistic should allow indexes to be used more in situations where they're likely to not require as much random I/O from the disk.

David
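Arya's statement from earlier in the thread makes this easy to try in one session (enable_nestloop only affects the current session's planner, so this sketch is safe to run as-is):

SET enable_nestloop TO off;

EXPLAIN ANALYZE
UPDATE test_table SET check_timestamp = now()
FROM (SELECT id FROM test_table LIMIT 1) AS subquery
WHERE test_table.id = subquery.id;

RESET enable_nestloop;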
