Re: Window partial fetch optimization
On Tue, May 3, 2022 at 2:11 PM Levi Aul wrote: > I have a “temporal table” — a table where there are multiple “versions” of > entities, with each version having a distinct timestamp: > CREATE TABLE contract_balance_updates ( > block_id bigint NOT NULL, > block_signed_at timestamp(0) without time zone NOT NULL, > contract_address bytea NOT NULL, > holder_address bytea NOT NULL, > start_block_height bigint NOT NULL, > balance numeric NOT NULL > ) PARTITION BY RANGE (block_signed_at); > > -- one for each partition (applied by pg_partman from a template) > CREATE UNIQUE INDEX contract_balance_updates_pkey > ON contract_balance_updates( > holder_address bytea_ops, > contract_address bytea_ops, > start_block_height int8_ops DESC > ); > How does pg_partman deal with the fact that a unique index on a partitioned table must contain the partitioning key? It should be noted that your 3 queries don't return the same thing. The last one returns columns holder_address, contract_address, and balance, while the first returns all columns in the table. If you were to make the first query return just the three columns holder_address, contract_address, and balance and build a suitable index, then you could get it to use an index-only scan. This should be similar to (but probably faster than) your 3rd query, without all the kerfuffle of extra scans and dummy syntax. The index needed would be: (holder_address bytea_ops, contract_address bytea_ops, start_block_height, balance); Note that in theory it could do a better job of using the index you already have. It could compute the row_number using only the data available in the index, then go fetch the table tuple for just the rows which pass the row_number filter. But it just isn't smart enough to do that. (By separating the WHERE clause from the select list into different queries, that is essentially what your third query is tricking it into doing) Cheers, Jeff
Why is there a Sort after an Index Only Scan?
Quick(?) question... why is there a Sort node after an Index Only Scan? Shouldn't the index already spit out sorted tuples? CREATE INDEX ON orders_test(shipping_date, order_id); EXPLAIN ANALYZE SELECT FROM orders_test WHERE TRUE AND shipping_date >= '2022-05-01' AND shipping_date <= '2022-05-01' ORDER BY order_id LIMIT 50; Limit (cost=8.46..8.46 rows=1 width=4) (actual time=0.031..0.032 rows=0 loops=1) -> Sort (cost=8.46..8.46 rows=1 width=4) (actual time=0.025..0.025 rows=0 loops=1) Sort Key: order_id Sort Method: quicksort Memory: 25kB -> Index Only Scan using orders_test_shipping_date_order_id_idx on orders_test (cost=0.43..8.45 rows=1 width=4) (actual time=0.017..0.018 rows=0 loops=1) Index Cond: ((shipping_date >= '2022-05-01'::date) AND (shipping_date <= '2022-05-01'::date)) Heap Fetches: 0 Fiddle: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7a3bc2421b5de5a2a377bd39b78d1c d5 I am actually asking because when I skew the distribution a little and repeat the query I get a rather unfortunate plan: INSERT INTO orders_test SELECT generate_series(201, 210), '2022-05-01'; ANALYZE orders_test; Limit (cost=0.43..37.05 rows=50 width=4) (actual time=1186.565..1186.593 rows=50 loops=1) -> Index Scan using orders_test_pkey on orders_test (cost=0.43..74336.43 rows=101502 width=4) (actual time=1186.562..1186.584 rows=50 loops=1) Filter: ((shipping_date >= '2022-05-01'::date) AND (shipping_date <= '2022-05-01'::date)) Rows Removed by Filter: 200 Postgres here uses the primary key to get the sort order, so I'm wondering if there is anything about my index that precludes its use for ORDER BY.
Re: Why is there a Sort after an Index Only Scan?
On Wed, May 4, 2022 at 7:15 PM André Hänsel wrote: > Quick(?) question... why is there a Sort node after an Index Only Scan? > Shouldn't the index already spit out sorted tuples? > > CREATE INDEX ON orders_test(shipping_date, order_id); > > EXPLAIN ANALYZE SELECT > FROM orders_test > WHERE TRUE > AND shipping_date >= '2022-05-01' > AND shipping_date <= '2022-05-01' > ORDER BY order_id > LIMIT 50; > They are sorted by order_id only within sets of the same shipping_date, which is not good enough. (It would be good enough if it were smart enough to know that there is only one possible shipping date to satisfy your weird range condition.) Cheers, Jeff
Re: Why is there a Sort after an Index Only Scan?
On Thu, 5 May 2022 at 11:15, André Hänsel wrote: > > Quick(?) question... why is there a Sort node after an Index Only Scan? > Shouldn't the index already spit out sorted tuples? > > CREATE INDEX ON orders_test(shipping_date, order_id); > > EXPLAIN ANALYZE SELECT > FROM orders_test > WHERE TRUE > AND shipping_date >= '2022-05-01' > AND shipping_date <= '2022-05-01' > ORDER BY order_id > LIMIT 50; Unfortunately, the query planner is not quite smart enough to realise that your shipping_date clauses can only match a single value. There's quite a bit more we could do with the planner's EquivalanceClasses. There is a patch around to help improve things in this area but it requires some more infrastructure to make it more practical to do from a performance standpoint in the planner. You'll get the plan you want if you requite the query and replace your date range with shipping_date = '2022-05-01'. Your use of WHERE TRUE indicates to me that you might be building this query in an application already, so maybe you can just tweak that application to test if the start and end dates are the same and use equality when they are. David [1] https://commitfest.postgresql.org/38/3524/
RE: Why is there a Sort after an Index Only Scan?
> They are sorted by order_id only within sets of the same shipping_date, which > is not good enough. Ah yes, that totally makes sense for the general case. > so maybe you can just tweak that application to test if the start and end > dates are the same and use equality when they are. I definitely can. But now I have a followup question, which probably should have been a separate question all along. I have modified the example a bit to have a more natural date distribution and I got rid of the weird shipping_date condition and actually made it different dates, so the index order is out of the picture. I also added some statistics so Postgres knows about the relationship between the columns. https://dbfiddle.uk/?rdbms=postgres_14&fiddle=54c7774432e896e3c0e89d8084c4b194 After inserting more rows, Postgres still chooses a scan on the primary key instead of using the index. Limit (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 rows=50 loops=1) -> Index Scan using orders_test_pkey on orders_test (cost=0.43..71149.43 rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1) Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date)) Rows Removed by Filter: 1998734 By setting the CPU costs to 0 (last block in the fiddle) I can force the use of the previous plan and as I already suspected it is much better: Limit (cost=101.00..101.00 rows=50 width=4) (actual time=4.835..4.843 rows=50 loops=1) -> Sort (cost=101.00..101.00 rows=12010 width=4) (actual time=4.833..4.837 rows=50 loops=1) Sort Key: order_id Sort Method: top-N heapsort Memory: 27kB -> Index Scan using orders_test_shipping_date_idx on orders_test (cost=0.00..101.00 rows=12010 width=4) (actual time=0.026..3.339 rows=11266 loops=1) Index Cond: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date)) Is it overestimating the cost of the sorting?
Re: Why is there a Sort after an Index Only Scan?
=?UTF-8?Q?Andr=C3=A9_H=C3=A4nsel?= writes: > Limit (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 > rows=50 loops=1) > -> Index Scan using orders_test_pkey on orders_test (cost=0.43..71149.43 > rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1) > Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= > '2022-05-01'::date)) > Rows Removed by Filter: 1998734 > Is it overestimating the cost of the sorting? No, but it's guessing it will hit 50 rows that satisfy the filter before it's gone very far in this index. If the shipping date and pkey are correlated in the wrong direction, that could be a very overoptimistic guess. I don't think we have adequate stats yet to detect this sort of problem. regards, tom lane
