much slower query in production
Dear all,

I am facing a much, much slower query in production than on my development computer using a restored production backup, and I don't understand why, nor do I see what I could do to speed up the query in production :/

Short data model explanation: one table stores tickets, one table stores multicards; tickets can belong to a multicard through a many-to-one relationship, or be independent of multicards. The query I am interested in will be used to update a new column in the multicards table, containing the count of related tickets.

On production:

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 1174940
 Planning Time: 1.220 ms
 Execution Time: 1695029.673 ms

iostat in the middle of this execution:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b    swpd   free   buff   cache  si  so    bi   bo   in   cs us sy id wa
 0  1  479548 143652  79412 5609292   0   0  1986  116  999 1115  2  2 71 25
 3  1  479548 134312  79412 5614564   0   0  2386  128 2832 3973 20  2 49 28
 0  1  479548 142520  79428 5617708   0   0  1584  232 3440 3716 11  3 58 29
 0  1  479548 161184  79024 5597756   0   0  1922  144 1249 1562  1  2 70 27
 0  1  479548 161244  79048 5600804   0   0  1556  117 2138 3035  6  2 68 25
 0  2  479548 158384  79048 5604008   0   0  1388  402 2970 4320  6  2 66 27

On my development computer:

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..538974.69 rows=204548 width=12) (actual time=0.055..451.691 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=2.57..2.58 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..2.56 rows=7 width=0) (actual time=0.001..0.001 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 0
 Planning Time: 0.296 ms
 Execution Time: 456.677 ms

The execution time ratio is a huge 3700. I guess the Heap Fetches difference is the most meaningful here; my understanding would be that the index would easily fit in the shared_buffers after a few subselects, as the configuration is 2GB shared_buffers, and here's the index on-disk size:

# SELECT relname AS index, reltuples AS "rows estimate",
         pg_size_pretty(pg_table_size(quote_ident(relname))) AS "on disk size"
    FROM pg_class, pg_namespace
   WHERE pg_namespace.oid = pg_class.relnamespace
     AND relkind = 'i' AND nspname = 'public'
     AND relname = 'tickets_multicard_uid';

         index         | rows estimate | on disk size
-----------------------+---------------+--------------
 tickets_multicard_uid |    7.2136e+06 | 161 MB

Though it's not entirely clear to me what "heap fetches" are. It seems they might be actual table data fetches (i.e. not index fetches), but I don't really know why so many of them are needed here, nor how to reduce that (if that's the explanation for the longer timing).

It's true that production has constant activity, but not so much of it; load average is typically about 0.5, most queries run very quickly, and pg_stat_activity frequently reports no currently running query. Here's iostat during normal production activity, for reference:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 480712 151112 10144 566478411 3106900 9 1 88 3
 0 0 480712 151312 10152 566487600 0 248 4911 7072 9 3 87 0
 0 0 480720 165488 10072 56425640476 210 2019 2685 6 1 92 1
 0 0 480720 165332
Re: much slower query in production
On Wed, Feb 26, 2020 at 05:17:21PM +0100, Guillaume Cottenceau wrote:
> On production:
>
> # EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
>
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
>    SubPlan 1
>      ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
>            ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
>                  Index Cond: (multicard_uid = multicards.uid)
>                  Heap Fetches: 1174940
>  Planning Time: 1.220 ms
>  Execution Time: 1695029.673 ms
>
> The execution time ratio is a huge 3700. I guess the Heap Fetches
> difference is the most meaningful here;

Yes, it's doing an "index only" scan, but not very effectively. Vacuum the tickets table to set relallvisible, and see if that helps.

If so, try to keep it better vacuumed with something like:

    ALTER TABLE tickets SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);

-- 
Justin
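[Editor's note: to check how much of the table is covered by the visibility map before and after vacuuming, a query along these lines should work; relpages and relallvisible are the standard pg_class columns, and 'tickets' is the table from this thread.]

```sql
-- Fraction of heap pages marked all-visible: rows on any other page
-- force an "index only" scan to fetch the heap (a "Heap Fetch").
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
  FROM pg_class
 WHERE relname = 'tickets';
```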
Re: much slower query in production
By the way, I expect the time is cut in half while heap fetches stay similar because the index is now in the OS cache on the second run and didn't need to be fetched from disk. Definitely need to check on vacuuming, as Justin says. If you have a fairly active system, you would need to run this query many times in order to push other stuff out of shared_buffers and get this query to perform more like it does on dev.

Do you have the option to re-write the query, or is this generated by an ORM? You are forcing the looping, as I read this query. If you aggregate before you join, then the system should be able to do a single scan of the index, aggregate, and then join those relatively few rows to the multicards table records:

    SELECT transaction_uid, COALESCE( sub.count, 0 ) AS count
      FROM multicards
      LEFT JOIN ( SELECT multicard_uid, COUNT(*) AS count
                    FROM tickets
                   GROUP BY multicard_uid ) AS sub
        ON sub.multicard_uid = multicards.uid;
Re: much slower query in production
Justin Pryzby writes:

> On Wed, Feb 26, 2020 at 05:17:21PM +0100, Guillaume Cottenceau wrote:
>> On production:
>>
>> # EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;
>>
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Seq Scan on multicards  (cost=0.00..1455177.30 rows=204548 width=12) (actual time=0.178..1694987.355 rows=204548 loops=1)
>>    SubPlan 1
>>      ->  Aggregate  (cost=7.07..7.08 rows=1 width=8) (actual time=8.283..8.283 rows=1 loops=204548)
>>            ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..7.05 rows=9 width=0) (actual time=1.350..8.280 rows=6 loops=204548)
>>                  Index Cond: (multicard_uid = multicards.uid)
>>                  Heap Fetches: 1174940
>>  Planning Time: 1.220 ms
>>  Execution Time: 1695029.673 ms
>>
>> The execution time ratio is a huge 3700. I guess the Heap Fetches
>> difference is the most meaningful here;
>
> Yes, it's doing an "index only" scan, but not very effectively.
> Vacuum the tickets table to set relallvisible and see if that helps.
>
> If so, try to keep it better vacuumed with something like
> ALTER TABLE tickets SET (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005);

Thanks for your reply!
The effect is huge:

# vacuum analyze tickets;
VACUUM
Time: 182850.756 ms (03:02.851)

# EXPLAIN ANALYZE SELECT transaction_uid, (SELECT COUNT(*) FROM tickets WHERE multicard_uid = multicards.uid) FROM multicards;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on multicards  (cost=0.00..947739.65 rows=204548 width=12) (actual time=15.579..5210.869 rows=204548 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=4.59..4.60 rows=1 width=8) (actual time=0.025..0.025 rows=1 loops=204548)
           ->  Index Only Scan using tickets_multicard_uid on tickets  (cost=0.43..4.57 rows=8 width=0) (actual time=0.022..0.024 rows=6 loops=204548)
                 Index Cond: (multicard_uid = multicards.uid)
                 Heap Fetches: 8
 Planning Time: 71.469 ms
 Execution Time: 5223.408 ms
(8 rows)

Time: 5332.361 ms (00:05.332)

(and subsequent executions are below 1 second)

It is actually consistent with using a restored backup on the dev computer, as my understanding is that this comes out without any garbage, like a perfectly vacuumed database. Btw, do you have any hint as to how to perform timings using production data which are consistent with production? Backup/restore is maybe not the way to go, but rather a block-device-level copy?

Since PostgreSQL 8, I have to say I rely entirely on autovacuum, and did not notice it could really run too infrequently for the workload and create such a difference. I see in the documentation a default autovacuum_vacuum_scale_factor = 0.2; is that something that is typically lowered globally, e.g. maybe on a fairly active system? I am worried that changing that configuration for that table to 0.005 would fix this query and similar ones, but later I might face the same situation on other tables. Or how would you elect tables for a lowered value configuration?

Thanks!

-- 
Guillaume Cottenceau
Re: much slower query in production
Michael Lewis writes:

> By the way, I expect the time is cut in half while heap fetches stays
> similar because the index is now in OS cache on the second run and
> didn't need to be fetched from disk. Definitely need to check on
> vacuuming as Justin says. If you have a fairly active system, you
> would need to run this query many times in order to push other stuff
> out of shared_buffers and get this query to perform more like it
> does on dev.
>
> Do you have the option to re-write the query or is this generated by
> an ORM? You are forcing the looping as I read this query. If you
> aggregate before you join, then the system should be able to do a
> single scan of the index, aggregate, then join those relatively few
> rows to the multicards table records.
>
> SELECT transaction_uid, COALESCE( sub.count, 0 ) AS count
>   FROM multicards
>   LEFT JOIN ( SELECT multicard_uid, COUNT(*) AS count
>                 FROM tickets GROUP BY multicard_uid ) AS sub
>     ON sub.multicard_uid = multicards.uid;

Thanks for this hint! I always hit this fact that I never write good queries using explicit joins :/ Execution time (before vacuuming the table as advised by Justin) is down 38x to 44509 ms using this query :)

The real query was an UPDATE of the multicards table to set the count value. I rewrote it using your approach, but I think I lack what COALESCE did in your query; this would obviously update only the rows where count >= 1:

    UPDATE multicards
       SET defacements = count
      FROM ( SELECT multicard_uid, COUNT(*) AS count
               FROM tickets
              GROUP BY multicard_uid ) AS sub
     WHERE uid = multicard_uid;

Any hinted solution to do that in one pass? I could do a first pass setting defacements = 0, but that would produce more garbage :/

Thanks!

-- 
Guillaume Cottenceau
Re: much slower query in production
> UPDATE multicards
>    SET defacements = COALESCE( count, 0 )
>   FROM ( SELECT multicard_uid, COUNT(*) AS count
>            FROM tickets GROUP BY multicard_uid ) AS sub
>  WHERE uid = multicard_uid OR multicard_uid IS NULL;

I expect this should work. Not sure of performance of course.
Re: much slower query in production
Vacuum everything that you restored.

Sent from my iPhone

> On Feb 26, 2020, at 1:19 PM, Michael Lewis wrote:
>
>> UPDATE multicards
>>    SET defacements = COALESCE( count, 0 )
>>   FROM ( SELECT multicard_uid, COUNT(*) AS count
>>            FROM tickets GROUP BY multicard_uid ) AS sub
>>  WHERE uid = multicard_uid OR multicard_uid IS NULL;
>
> I expect this should work. Not sure of performance of course.
Re: much slower query in production
Michael Lewis writes:

> UPDATE multicards
>    SET defacements = COALESCE( count, 0 )
>   FROM ( SELECT multicard_uid, COUNT(*) AS count
>            FROM tickets GROUP BY multicard_uid ) AS sub
>  WHERE uid = multicard_uid OR multicard_uid IS NULL;
>
> I expect this should work. Not sure of performance of course.

This looked great, but as it seems you suspected, it's very slow :/ I interrupted it after a 5-minute run on my dev computer.

-- 
Guillaume Cottenceau
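[Editor's note: a common workaround, not proposed in this thread, is to split the OR into two statements so each can use an efficient plan; the second pass only touches multicards that actually have no tickets, producing far fewer dead rows than blindly zeroing everything first. A sketch, assuming the table and column names from the thread:]

```sql
-- First pass: matched rows only, using the fast aggregate-then-join form.
UPDATE multicards
   SET defacements = sub.count
  FROM ( SELECT multicard_uid, COUNT(*) AS count
           FROM tickets
          GROUP BY multicard_uid ) AS sub
 WHERE uid = sub.multicard_uid;

-- Second pass: only multicards with no tickets at all, and skip rows
-- already at 0 to avoid creating needless dead tuples.
UPDATE multicards
   SET defacements = 0
 WHERE NOT EXISTS ( SELECT 1 FROM tickets
                     WHERE tickets.multicard_uid = multicards.uid )
   AND defacements IS DISTINCT FROM 0;
```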
Re: much slower query in production
On Wed, Feb 26, 2020 at 11:17 AM Guillaume Cottenceau wrote:

> Dear all,
>
> I am facing a much, much slower query in production than on my
> development computer using a restored production backup, and I
> don't understand why, nor do I see what I could do to speed up
> the query in production :/

You've already seen that a VACUUM fixed it. This must have been a logical restore (from pg_dump), not a physical restore (from pg_basebackup, for example), correct? A physical restore should have resulted in a database in the same state of vacuuming as its source. A logical restore will not, and since it is an append-only process it will not trigger autovacuum to run, either. If you do a logical restore, probably the first thing you should do afterwards is a VACUUM ANALYZE.

Cheers,

Jeff
Re: much slower query in production
On Wed, Feb 26, 2020 at 1:02 PM Guillaume Cottenceau wrote:

> It is actually consistent with using a restored backup on the dev
> computer, as my understanding is this comes out without any
> garbage and like a perfectly vacuumed database.

I think I got that backwards in my previous email. It is the dev that was restored, not the prod? But unless you went out of your way to vacuum dev, it would not be perfectly vacuumed. If it were a logical restore, it would be perfectly unvacuumed, and if a physical restore, it would be in the same state of vacuuming as the database it was cloned from.

> Btw do you have any hint as to how to perform timings using
> production data which are consistent with production?
> Backup/restore is maybe not the way to go, but rather a block
> device level copy?

A block device copy seems like overkill; just using pg_basebackup should be good enough.

> Since postgresql 8, I have to say I rely entirely on autovacuum,
> and did not notice it could really run too infrequently for the
> work and create such difference. I see in documentation a default
> autovacuum_vacuum_scale_factor = 0.2, is that something that is
> typically lowered globally, e.g. maybe on a fairly active system?
> I am worried that changing that configuration for that table to
> 0.005 would fix this query and similar ones, but later I might
> face the same situation on other tables. Or how would you elect
> tables for a lowered value configuration?

The autovacuum system has never been redesigned with the needs of index-only scans in mind. If I have a table for which index-only scans are important, I'd set autovacuum_vacuum_scale_factor = 0 and set autovacuum_vacuum_threshold to about 5% of the number of blocks in the table. There is no syntax to say "5% of the number of blocks in the table", so you have to compute it yourself and hardcode the result, which makes it unsuitable for a global setting. And this still only addresses UPDATE and DELETE operations, not INSERTs.
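[Editor's note: the "5% of blocks, computed by hand" suggestion can be scripted; a sketch against pg_class, using the thread's tickets table, that also emits the DDL to run. The value must be recomputed as the table grows.]

```sql
-- 5% of the table's current block count, plus ready-made ALTER TABLE
-- DDL to hardcode it as the per-table vacuum threshold.
SELECT relpages,
       relpages / 20 AS suggested_threshold,
       format('ALTER TABLE %I SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = %s);',
              relname, relpages / 20) AS ddl
  FROM pg_class
 WHERE relname = 'tickets';
```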
If you have an INSERT-only or INSERT-mostly table for which index-only scans are important, you might need to set up cron jobs to do vacuuming.

Cheers,

Jeff
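[Editor's note: to answer the earlier question of how to elect tables for a lowered per-table setting, a shortlist can be drawn from the standard statistics views; a sketch using pg_stat_user_tables:]

```sql
-- Tables with the most dead tuples relative to live ones: candidates
-- for a lowered per-table autovacuum_vacuum_scale_factor.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / greatest(n_live_tup, 1), 1) AS dead_pct,
       last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 20;
```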
