Delete tables difference involves seq scan
Hello. I want to remove the rows from the first table that also exist in the second table (equality is checked using the PK). However, I get a seq scan on the second table, which counters my intuition - I expected an index-only scan. Because the tables are large, the performance of the query is very bad.

However, I got mixed results when trying to reproduce this behavior on synthetic tables. Here are 3 different plans I got for the same query.

1. Setup is:
---
create table diff (id uuid constraint diff_pkey primary key);
create table origin (id uuid constraint origin_pkey primary key);
---

The query generates the correct plan, which performs only index scans:

explain delete from origin where exists (select id from diff where origin.id = diff.id);
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Delete on origin  (cost=0.30..105.56 rows=1850 width=12)
   ->  Merge Semi Join  (cost=0.30..105.56 rows=1850 width=12)
         Merge Cond: (origin.id = diff.id)
         ->  Index Scan using origin_pkey on origin  (cost=0.15..38.90 rows=1850 width=22)
         ->  Index Scan using diff_pkey on diff  (cost=0.15..38.90 rows=1850 width=22)
(5 rows)

2. Setup is:

create table origin (id uuid constraint origin_pkey primary key, data jsonb);
create table diff (id uuid constraint diff_pkey primary key, data jsonb);

The query generates a plan with two seq scans:

explain delete from origin where exists (select id from diff where origin.id = diff.id);
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Delete on origin  (cost=34.08..69.49 rows=1070 width=12)
   ->  Hash Semi Join  (cost=34.08..69.49 rows=1070 width=12)
         Hash Cond: (origin.id = diff.id)
         ->  Seq Scan on origin  (cost=0.00..20.70 rows=1070 width=22)
         ->  Hash  (cost=20.70..20.70 rows=1070 width=22)
               ->  Seq Scan on diff  (cost=0.00..20.70 rows=1070 width=22)
(6 rows)

3. My real `origin` table has 26 fields and 800 billion rows, the real `diff` table has 12 million rows, and the query generates a plan with a nested loop and a seq scan on the `diff` table:

explain delete from drug_refills origin where exists (select id from drug_refills_diff diff where origin.id = diff.id);
                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Delete on drug_refills origin  (cost=0.57..22049570.11 rows=11975161 width=12)
   ->  Nested Loop  (cost=0.57..22049570.11 rows=11975161 width=12)
         ->  Seq Scan on drug_refills_diff diff  (cost=0.00..720405.61 rows=11975161 width=22)
         ->  Index Scan using drug_refills_pkey on drug_refills origin  (cost=0.57..1.77 rows=1 width=22)
               Index Cond: (id = diff.id)
(5 rows)

I have run ANALYZE on both tables, but it didn't help. Here are the column types in origin and diff (same schema), in case that matters:

uuid
timestamp with time zone
timestamp with time zone
character varying(255)
character varying(255)
character varying(1024)
numeric(10,4)
integer
numeric(14,8)
numeric(14,8)
numeric(14,8)
numeric(14,8)
numeric(14,8)
character varying(16)
character varying(16)
character varying(16)
character varying(16)
character varying(16)
character varying(16)
date
jsonb
text[]
uuid
uuid
uuid
uuid
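One experiment I have not yet tried (a sketch only - enable_seqscan here is purely a diagnostic knob, not something to leave set) is forcing the planner away from seq scans to see how it costs the index-based alternative:

begin;
set local enable_seqscan = off;
explain delete from origin where exists (select id from diff where origin.id = diff.id);
rollback;

If the forced plan comes out much more expensive than the seq-scan plan, the planner simply believes that reading `diff` once is cheaper than probing its index for every candidate row.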
Re: Delete tables difference involves seq scan
Oh, sorry, this happens on PostgreSQL 9.6.6. I've checked that on PostgreSQL 10.0 the query plan from setup (1) now uses two seq scans, like in setup (2).

2017-12-01 15:03 GMT+02:00 Danylo Hlynskyi :
> Hello. I want to remove the rows from the first table that also exist in the
> second table (equality is checked using the PK). However, I get a seq scan on
> the second table, which counters my intuition - I expected an index-only scan.
> Because the tables are large, the performance of the query is very bad.
Re: Delete tables difference involves seq scan
I was able to speed up the original query a lot by using a CTE. It still uses a seq scan on the `diff` table, but it looks like it does this only once:

explain with diff as (select id from drug_refills_diff)
delete from drug_refills where id in (select id from diff);
                                          QUERY PLAN
---------------------------------------------------------------------------------------------------
 Delete on drug_refills  (cost=989844.94..990366.86 rows=45636 width=46)
   CTE diff
     ->  Seq Scan on drug_refills_diff  (cost=0.00..720404.88 rows=11975088 width=16)
   ->  Nested Loop  (cost=269440.05..269961.98 rows=45636 width=46)
         ->  HashAggregate  (cost=269439.48..269441.48 rows=200 width=56)
               Group Key: diff.id
               ->  CTE Scan on diff  (cost=0.00..239501.76 rows=11975088 width=56)
         ->  Index Scan using drug_refills_pkey on drug_refills  (cost=0.57..2.59 rows=1 width=22)
               Index Cond: (id = diff.id)
(9 rows)

2017-12-01 15:17 GMT+02:00 Danylo Hlynskyi :
> Oh, sorry, this happens on PostgreSQL 9.6.6. I've checked that on PostgreSQL
> 10.0 the query plan from setup (1) now uses two seq scans, like in setup (2).
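P.S. For completeness, the same delete can also be written as a join; I have not benchmarked this form against the CTE variant on the real tables, so treat it as a sketch:

delete from drug_refills o
using drug_refills_diff d
where o.id = d.id;

With this formulation the planner is free to pick a hash or merge join instead of the per-row index probes of the nested loop.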
Bad plan for ltree predicate <@
Hi,

I have a problem on 9.3.14 with a query that accesses table:

Size (retrieved by query https://gist.github.com/romank0/74f9d1d807bd3f41c0729d0fc6126749):

 schemaname |    relname    |  size  | toast  | associated_idx_size | total
------------+---------------+--------+--------+---------------------+---------
 public     | document_head | 275 MB | 630 MB | 439 MB              | 1345 MB

Definition:

                              Table "public.document_head"
           Column            |           Type           |              Modifiers
-----------------------------+--------------------------+--------------------------------------
 snapshot_id                 | character varying(36)    | not null
 id                          | character varying(36)    | not null
 base_type                   | character varying(50)    | not null
 is_cascade                  | boolean                  | not null default false
 parent_id                   | character varying(36)    |
 fileplan_node_id            | character varying(36)    |
 state                       | character varying(10)    | default 'ACTIVE'::character varying
 title                       | character varying(4096)  | not null
 properties                  | text                     | not null
 properties_cache            | hstore                   | not null
 serial_number               | integer                  | not null
 major_version               | integer                  | not null
 minor_version               | integer                  | not null
 version_description         | text                     |
 sensitivity                 | integer                  | not null default 10
 id_path                     | ltree                    |
 path_name                   | character varying(4096)  | collate C not null
 ltx_id                      | bigint                   | not null
 created_by                  | integer                  | not null
 created_date                | timestamp with time zone | not null
 modified_by                 | integer                  | not null
 modified_date               | timestamp with time zone | not null
 responsible_user_ids        | integer[]                |
 origin_id                   | character varying(36)    |
 origin_snapshot_id          | character varying(36)    |
 ssn                         | character varying(64)    |
 record_physical_location    | text                     |
 record_physical_location_id | text                     |
 record_created_date         | timestamp with time zone |
 record_aggregated_date      | timestamp with time zone |
 record_last_review_comment  | text                     |
 record_last_review_date     | timestamp with time zone |
 record_next_review_date     | timestamp with time zone |
 record_originated_date      | timestamp with time zone |
 record_is_vital             | boolean                  | not null default false
 storage_plan_state          | text                     | not null default 'New'::text
 cut_off_date                | timestamp with time zone |
 dispose_date                | timestamp with time zone |
 archive_date                | timestamp with time zone |
Indexes:
    "document_head__id__uniq_key" PRIMARY KEY, btree (id)
    "document_head__parent_id__path_name__unq_idx" UNIQUE, btree (parent_id, path_name) WHERE state::text = 'ACTIVE'::text
    "document_head__snapshot_id__unq" UNIQUE, btree (snapshot_id)
    "document_head__base_type__idx" btree (base_type) WHERE state::text <> 'DELETED'::text
    "document_head__fileplan_node_id__idx" btree (fileplan_node_id)
    "document_head__id__idx" btree (id) WHERE state::text <> 'DELETED'::text
    "document_head__id_path__btree__idx" btree (id_path) WHERE state::text <> 'DELETED'::text
    "document_head__id_path__gist__idx" gist (id_path)
    "document_head__ltx_id__idx" btree (ltx_id)
    "document_head__origin_id__hotfix__idx" btree (origin_id) WHERE origin_id IS NOT NULL
    "document_head__origin_id__idx" btree (origin_id) WHERE state::text <> 'DELETED'::text AND origin_id IS NOT NULL
    "document_head__parent_id__idx" btree (parent_id)
    "document_head__properties_cache__contact_username_idx" btree ((properties_cache -> 'person_meta_info.username'::text)) WHERE base_type::text = 'Contact'::text AND exist(properties_cache, 'person_meta_info.username'::text)
    "document_head__properties_cache__emailmeta_message_id__idx" btree ((properties_cache -> 'emailmeta.message_id'::text)) WHERE base_type::text = 'File'::text AND exist(properties_cache, 'emailmeta.message_id'::text)
    "document_head__properties_cache__idx" gist (properties_cache) WHERE state::text <> 'DELETED'::text
    "document_head__properties_cache__project_identifier__idx" btree ((properties_cache -> 'project.identifier'::text)) WHERE base_type::text = 'Project'::text AND exist(properties_cache, 'project.identifier'::text)
    "document_head__properties_cache__published_origin__idx" btree ((properties_ca
Bitmap scan is undercosted?
Hi,
We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
I've found some cases where similar issues with bitmap scans were
reported before:
https://www.postgresql.org/message-id/flat/1456154321.976561.528310154.6A623C0E%40webmail.messagingengine.com
https://www.postgresql.org/message-id/flat/CA%2BwPC0MRMhF_8fD9dc8%2BQWZQzUvHahPRSv%3DxMtCmsVLSsy-p0w%40mail.gmail.com
I've made a synthetic test, which kind of reproduces the issue:
shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB
set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;
select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";985;44248;226
"i1";985;27422;365
"i2";985;27422;365
I've been running the same query while enabling and disabling different
kinds of scans:
1) set enable_bitmapscan = on; set enable_indexscan = off; set
enable_seqscan = off;
2) set enable_bitmapscan = off; set enable_indexscan = on; set
enable_seqscan = off;
3) set enable_bitmapscan = off; set enable_indexscan = off; set
enable_seqscan = on;
The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;
Here are the results for PostgreSQL 9.6 (for 9.3 and 10.1 the results
are very similar):
1) Aggregate (cost=24821.70..24821.71 rows=1 width=8) (actual
time=184.591..184.591 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=47259
-> Bitmap Heap Scan on public.aaa (cost=13038.21..24796.22
rows=10189 width=0) (actual time=122.492..178.006 rows=10 loops=1)
Output: num, flag
Recheck Cond: (aaa.num = 1)
Filter: aaa.flag
Heap Blocks: exact=44248
Buffers: shared hit=47259
-> BitmapAnd (cost=13038.21..13038.21 rows=10189 width=0)
(actual time=110.699..110.699 rows=0 loops=1)
Buffers: shared hit=3011
-> Bitmap Index Scan on i1 (cost=0.00..1158.94
rows=99667 width=0) (actual time=19.600..19.600 rows=10 loops=1)
Index Cond: (aaa.num = 1)
Buffers: shared hit=276
-> Bitmap Index Scan on i2 (cost=0.00..11873.92
rows=1022332 width=0) (actual time=81.676..81.676 rows=100 loops=1)
Index Cond: (aaa.flag = true)
Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 184.988 ms
2) Aggregate (cost=67939.09..67939.10 rows=1 width=8) (actual
time=67.510..67.510 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=44524
-> Index Scan using i1 on public.aaa (cost=0.44..67910.95
rows=11256 width=0) (actual time=0.020..61.180 rows=10 loops=1)
Output: num, flag
Index Cond: (aaa.num = 1)
Filter: aaa.flag
Buffers: shared hit=44524
Planning time: 0.096 ms
Execution time: 67.543 ms
3) Aggregate (cost=169276.49..169276.50 rows=1 width=8) (actual
time=977.063..977.063 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=44248
-> Seq Scan on public.aaa (cost=0.00..169248.35 rows=11256 width=0)
(actual time=0.018..969.294 rows=10 loops=1)
Output: num, flag
Filter: (aaa.flag AND (aaa.num = 1))
Rows Removed by Filter: 990
Buffers: shared hit=44248
Planning time: 0.099 ms
Execution time: 977.094 ms
The bitmap scan version runs more than twice as slowly as the one with the
index scan, while being costed at less than half as much.
I've tried to increase cpu_tuple_cost and cpu_index_tuple_cost, and this
behavior remains after a 6x increase in the values, although the difference in
costs becomes much smaller. After increasing the settings by more than 6x,
PostgreSQL decides to use a different plan for bitmap scans, so it's
hard to draw conclusions at that point.
Could such cases be fixed by tuning the cost settings, or is that just
how PostgreSQL estimates bitmap scans, and it can't be fixed without
modifying the optimizer? Or am I missing something and this is the
expected behavior? Thoughts?
Regards,
Vitaliy
Re: Bitmap scan is undercosted?
On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> We recently had an issue in production, where a bitmap scan was chosen
> instead of an index scan. Despite being 30x slower, the bitmap scan had
> about the same cost as the index scan.
Me too, see also:
https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#cah2-wzkrtggiy_lkquu-oviyp6y_hhz-a1ywacpy4tcywv1...@mail.gmail.com
> drop table if exists aaa;
> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
> generate_series(1, 10000000) id;
> create index i1 on aaa (num);
> create index i2 on aaa (flag);
> analyze aaa;
>
> select relname, reltuples::bigint, relpages::bigint,
> (reltuples/relpages)::bigint tpp from pg_class where relname
> in('aaa','i1','i2') order by relname;
> "aaa";985;44248;226
> "i1";985;27422;365
> "i2";985;27422;365
>
> The query was:
> explain (analyze,verbose,costs,buffers)
> select count(*) from aaa where num = 1 and flag = true;
Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.
The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.
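A minimal way to do that (a sketch; assumes shell access to the database host):

-- in the psql session that will run the query:
SELECT pg_backend_pid();
-- then, from a shell on the server, attach before starting the query:
--   strace -p <pid from above> -e trace=read,lseek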
Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.
I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.
I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use a btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).
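For the synthetic table that could look something like this (illustrative only; the index name is made up, and whether BRIN helps depends on how well the column tracks physical row order):

create index aaa_num_brin on aaa using brin (num) with (autosummarize = on);  -- autosummarize needs PG10+

On this particular test table num cycles with id, so every block range contains every value and the BRIN index won't be selective; on a table where the column follows insertion order it can be.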
Justin
[0] I'm borrowing Jeff's language from here:
https://www.postgresql.org/message-id/CAMkU%3D1xwGn%2BO0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q%40mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.
Re: Bad plan for ltree predicate <@
Roman Konoval writes:
> I have a problem on 9.3.14 with a query that accesses table:

I think the root of the problem is your intermediate function:

> CREATE OR REPLACE FUNCTION public.get_doc_path(document_id character varying)
>  RETURNS ltree
>  LANGUAGE plpgsql
>  STABLE
> AS $function$
> DECLARE
>     path ltree;
> BEGIN
>     select id_path into path from document_head where id = document_id;
>     RETURN path;
> END $function$

This is quite expensive, as it involves another table search, but the planner
doesn't know that since you've not marked it as having higher than normal cost.
The seqscan formulation of the query results in evaluating this function afresh
at most of the rows, whereas shoving it into an uncorrelated sub-select causes
it to be evaluated only once.  That, I think, and not the seqscan-vs-indexscan
aspect, is what makes the bitmap formulation go faster.  Certainly you'd not
expect that a bitmap scan that has to hit most of the rows anyway is going to
win over a seqscan.

The fact that the planner goes for a bitmap scan in the second formulation is
an artifact of the fact that it doesn't try to pre-evaluate sub-selects for
selectivity estimation purposes, so you end up with a default estimate that
says that the <@ condition only selects a small fraction of the rows.  Not sure
if we should try to change that or not.

I'd suggest setting the function's cost to 1000 or so and seeing if that
doesn't improve matters.

(BTW, what tipped me off to this was that the "buffers hit" count for the
seqscan node was so high, several times more than the actual size of the table.
I couldn't account for that until I realized that the function itself would be
adding a few buffer hits per execution.)

			regards, tom lane
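A sketch of what those two suggestions might look like against Roman's schema (the cost value and the literal id are only illustrative):

ALTER FUNCTION public.get_doc_path(character varying) COST 1000;

-- or evaluate the path once, via an uncorrelated sub-select:
SELECT count(*)
FROM document_head d
WHERE d.id_path <@ (SELECT get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'))
  AND d.id <> '78157c60-45bc-42c1-9aad-c5651995db5c'
  AND d.state <> 'DELETED';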
Re: Bitmap scan is undercosted?
On 01/12/2017 20:34, Justin Pryzby wrote:
On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
Me too, see also:
https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#cah2-wzkrtggiy_lkquu-oviyp6y_hhz-a1ywacpy4tcywv1...@mail.gmail.com
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;
select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";985;44248;226
"i1";985;27422;365
"i2";985;27422;365
The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;
Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.
I don't think the planner is smart enough to account for correlation
between values in different columns. When different values are used in the
filter (num=2, num=39, num=74), the query actually runs faster, while
still being about twice as slow as the index scan. But the cost
does not change much. It jumps up and down for different values, but
it's still close to the initial value.
Aggregate (cost=24239.02..24239.03 rows=1 width=8) (actual
time=105.239..105.239 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=3011
-> Bitmap Heap Scan on public.aaa (cost=12812.05..24214.48
rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)
Output: num, flag
Recheck Cond: (aaa.num = 39)
Filter: aaa.flag
Buffers: shared hit=3011
-> BitmapAnd (cost=12812.05..12812.05 rows=9816 width=0)
(actual time=105.157..105.157 rows=0 loops=1)
Buffers: shared hit=3011
-> Bitmap Index Scan on i1 (cost=0.00..1134.94
rows=97667 width=0) (actual time=15.725..15.725 rows=10 loops=1)
Index Cond: (aaa.num = 39)
Buffers: shared hit=276
-> Bitmap Index Scan on i2 (cost=0.00..11671.96
rows=1005003 width=0) (actual time=77.920..77.920 rows=100 loops=1)
Index Cond: (aaa.flag = true)
Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms
Aggregate (cost=65785.99..65786.00 rows=1 width=8) (actual
time=48.587..48.587 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=44524
-> Index Scan using i1 on public.aaa (cost=0.44..65761.45 rows=9816
width=0) (actual time=48.583..48.583 rows=0 loops=1)
Output: num, flag
Index Cond: (aaa.num = 39)
Filter: aaa.flag
Rows Removed by Filter: 10
Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms
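(As an aside: extended statistics on PG 10, i.e. CREATE STATISTICS ... (dependencies), are meant for exactly this kind of cross-column correlation, though I have not checked whether they kick in for a boolean filter like flag = true. A sketch, not something I have tried here:)

create statistics aaa_num_flag_dep (dependencies) on num, flag from aaa;
analyze aaa;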
The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.
Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.
I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.
 table | ~count |  size  |  toast  |  idx   | size + toast + idx
-------+--------+--------+---------+--------+--------------------
 aaa   | 994    | 346 MB | 0 bytes | 428 MB | 774 MB
But the plan says all buffers are "shared hit", and none "read", so
that's probably not an issue.
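(A summary like the one above can be produced with something along these lines - not necessarily the exact query used here:)

select c.relname,
       c.reltuples::bigint                           as approx_count,
       pg_size_pretty(pg_relation_size(c.oid))       as table_size,
       pg_size_pretty(pg_indexes_size(c.oid))        as index_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) as total_size
from pg_class c
where c.relname = 'aaa';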
I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use an btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).
Re: Bitmap scan is undercosted?
I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > We recently had an issue in production, where a bitmap scan was chosen
> > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > about the same cost as the index scan.
>
> > drop table if exists aaa;
> > create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
> > generate_series(1, 10000000) id;
> > create index i1 on aaa (num);
> > create index i2 on aaa (flag);
> > analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while testing.

> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
> fewer rows than it will ultimately read, probably by 2x. It makes sense that
> this causes the index scan to be more expensive than expected, but that's only
> somewhat important, since there are no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;

 Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=10 loops=1)
   ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=10 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=100 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Same on PG95:

postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=10 loops=1)
   ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=10 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=100 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin
Re: Bitmap scan is undercosted?
On 02/12/2017 01:11, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>
> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
>> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>>> We recently had an issue in production, where a bitmap scan was chosen
>>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>>> about the same cost as the index scan.
>>>
>>> drop table if exists aaa;
>>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>>> generate_series(1, 10000000) id;
>>> create index i1 on aaa (num);
>>> create index i2 on aaa (flag);
>>> analyze aaa;
>
> What is:
> effective_io_concurrency
> max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production we
also use SSD, hence the lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan and
the bitmap scan plan is even bigger.

> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while testing.
>
>> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
>> fewer rows than it will ultimately read, probably by 2x. It makes sense that
>> this causes the index scan to be more expensive than expected, but that's only
>> somewhat important, since there are no joins involved.
>
> I changed the query from COUNT(*) TO * for easier to read explain:
>
> CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
> CREATE INDEX i1 ON aaa(num);
> CREATE INDEX i2 ON aaa (flag);
> ANALYZE VERBOSE aaa;
> EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>
>  Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=10 loops=1)
>    ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
>          ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=10 loops=1)
>          ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=100 loops=1)
>
> ..which is what's wanted with no planner hints (PG10.1 here).

Yes, that's what you get without planner hints, but it's strange to get this
plan, when there is another one, which runs 2-3 times faster, but happens to
be estimated at about twice the cost of the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on; set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=10 loops=1)

vs.

# set enable_bitmapscan = on; set enable_indexscan = off; set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=10 loops=1)
  ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual time=110.168..110.168 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667 width=0) (actual time=20.845..20.845 rows=10 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666 width=0) (actual time=80.323..80.323 rows=100 loops=1)

> Same on PG95:
>
> postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>  Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=10 loops=1)
>    ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
>          ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=10 loops=1)
>          ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=100 loops=1)
>
> The rowcount is off, but not a critical issue without a join.
>
> Justin
Re: Bitmap scan is undercosted?
On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarrassed to say that I mis-read your message, despite your amply clear
subject. You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> > On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > > We recently had an issue in production, where a bitmap scan was chosen
> > > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > > about the same cost as the index scan.
>
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while testing.

Actually, that the table is "not correlated" on "num" column is maybe the
primary reason why PG avoids using an index scan. It (more or less correctly)
deduces that it's going to have to "read" a large fraction of the pages (even
if only to process a small fraction of the rows), which is costly, except it's
all cached.. In your case, that overly-penalizes the index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c. Since the index
is uncorrelated, it's returning something close to max_IO_cost. It looks like
effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it. Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

or

cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost -
|     (spc_random_page_cost - spc_seq_page_cost)
|     * sqrt(pages_fetched / T);

Justin
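Rough arithmetic to illustrate the effect (assuming random_page_cost=1.5, seq_page_cost=1.0; the pages_fetched value is made up, not taken from the plans):

-- index scan: with correlation ~ 0, csquared ~ 0, so run_cost stays near max_IO_cost,
-- i.e. every fetched heap page is charged at roughly random_page_cost.

-- bitmap heap scan: the per-page charge slides from random_page_cost toward
-- seq_page_cost as the fetched fraction of the table (T = 44248 pages here) grows:
select 1.5 - (1.5 - 1.0) * sqrt(10000.0 / 44248) as bitmap_cost_per_page;  -- ~1.26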
Re: Bad plan for ltree predicate <@
Hi Tom,
Thanks for your help.
> On Dec 1, 2017, at 22:33, Tom Lane wrote:
>
>
> The seqscan formulation of the query results in evaluating
> this function afresh at most of the rows
The function is defined as STABLE. I thought that meant there is no need
to reevaluate it on every row, as the input parameter is the same for every row
and the return value will be the same during the same query execution. Do I
understand incorrectly what STABLE means?
Why is the function evaluated more than once?
> , whereas shoving it into an
> uncorrelated sub-select causes it to be evaluated only once. That, I
> think, and not the seqscan-vs-indexscan aspect, is what makes the bitmap
> formulation go faster. Certainly you'd not expect that a bitmap scan that
> has to hit most of the rows anyway is going to win over a seqscan.
>
> The fact that the planner goes for a bitmap scan in the second formulation
> is an artifact of the fact that it doesn't try to pre-evaluate sub-selects
> for selectivity estimation purposes, so you end up with a default estimate
> that says that the <@ condition only selects a small fraction of the rows.
> Not sure if we should try to change that or not.
>
> I'd suggest setting the function's cost to 1000 or so and seeing if that
> doesn't improve matters.
>
If I set the function cost to 1000 I get a slightly better plan, but still
about 3.5x more buffers are read compared to the bitmap scan, which, as you
wrote, one would expect to be slower than a seq scan.
Here is the plan:
QUERY PLAN
-
Aggregate (cost=216438.81..216438.82 rows=1 width=0) (actual
time=1262.244..1262.245 rows=1 loops=1)
Buffers: shared hit=169215
CTE trees
-> Index Scan using document_head__id_path__gist__idx on document_head d
(cost=2.91..212787.85 rows=162265 width=74) (actual time=0.115..727.119
rows=154854 loops=1)
Index Cond: (id_path <@
get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'::character varying))
Filter: (((id)::text <>
'78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <>
'DELETED'::text))
Rows Removed by Filter: 23
Buffers: shared hit=169215
-> CTE Scan on trees (cost=0.00..3245.30 rows=162265 width=0) (actual
time=0.119..1118.899 rows=154854 loops=1)
Buffers: shared hit=169215
Total runtime: 1277.010 ms
(11 rows)
My understanding is that the optimal plan in this case should read less data
than the bitmap scan, by the amount of buffers hit by the bitmap index scan.
It should read roughly all the buffers of the table itself - something like the
query with the predicate using an ltree literal instead of the function invocation:
explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
FROM document_head AS d
WHERE (d.id_path <@
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree
AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;
QUERY PLAN
-
Aggregate (cost=42114.02..42114.03 rows=1 width=0) (actual
time=997.427..997.427 rows=1 loops=1)
Buffers: shared hit=35230
CTE trees
-> Seq Scan on document_head d (cost=0.00..38463.06 rows=162265
width=74) (actual time=0.013..593.082 rows=154854 loops=1)
Filter: ((id_path <@
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree)
AND ((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND
((state)::text <> 'DELETED'::text))
Rows Removed by Filter: 23357
Buffers: shared hit=35230
-> CTE Scan on trees (cost=0.00..3245.30 rows=162265 width=0) (actual
time=0.017..888.076 rows=154854 loops=1)
Buffers: shared hit=35230
Total runtime: 1011.565 ms
(10 rows)
The question is whether it is possible to get a plan like that using the
function, or some other way to get the ltree value for a given document_head.id
value in one query.
As an alternative I can get the ltree value with a separate query, but this
would require
1. a round-trip to postgres
2. changing the isolation level to REPEATABLE READ to make sure that I get a
consistent result
so I would like to avoid that.
Regards,
Roman Konoval
Re: Bitmap scan is undercosted?
On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <[email protected]> wrote:

> On 02/12/2017 01:11, Justin Pryzby wrote:
>> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>>
>> What is:
>> effective_io_concurrency
>> max_parallel_workers_per_gather (I gather you don't have this)
>
> effective_io_concurrency = 0
> max_parallel_workers_per_gather = 0
>
> Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15 times
more buffers than the bitmap scan does. While 1.5 is a lot lower than 4.0, it
is still much higher than the true cost of reading a page from the buffer
cache. This is why the index scan is getting punished. You could lower
random_page_cost and seq_page_cost to 0, to remove those considerations.
(I'm not saying you should do this on your production system, but rather you
should do it as a way to investigate the issue. But it might make sense on
production as well.)

> For this test I'm using SSD and Windows (if that matters). On production
> we also use SSD, hence the lower random_page_cost. But with the default
> random_page_cost=4.0, the difference in cost between the index scan plan
> and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD for
this particular test case.

Cheers,
Jeff
Re: Bitmap scan is undercosted?
On Sat, Dec 02, 2017 at 01:54:09AM +0200, Vitaliy Garnashevich wrote:
> On 02/12/2017 01:11, Justin Pryzby wrote:
> > ..which is what's wanted with no planner hints (PG10.1 here).
> Yes, that's what you get without planner hints, but it's strange to get this
> plan, when there is another one, which runs 2-3 times faster, but happens to
> be estimated at about twice the cost of the one with bitmap scans:
>
> # set enable_bitmapscan = off; set enable_indexscan = on; set enable_seqscan = off;
> # explain analyze select * from aaa where num = 1 and flag = true;
> Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=10 loops=1)
>
> vs.
>
> # set enable_bitmapscan = on; set enable_indexscan = off; set enable_seqscan = off;
> # explain analyze select * from aaa where num = 1 and flag = true;
> Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=10 loops=1)

I was able to get an index plan with:

SET random_page_cost=1;
SET cpu_index_tuple_cost=.04; -- default: 0.005; see selfuncs.c

postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=0.43..50120.71 rows=10754 width=5) (actual time=0.040..149.580 rows=10 loops=1)

Or with:

SET random_page_cost=1;
SET cpu_operator_cost=0.03; -- default: 0.0025; see cost_bitmap_tree_node()

EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=5.22..49328.00 rows=10754 width=5) (actual time=0.051..109.082 rows=10 loops=1)

Or a combination trying to minimize the cost of the index scan:

postgres=# SET random_page_cost=1; SET cpu_index_tuple_cost=.0017; SET cpu_operator_cost=0.03;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=5.22..48977.10 rows=10754 width=5) (actual time=0.032..86.883 rows=10 loops=1)

Not sure if that's reasonable, but maybe it helps to understand.

Justin
Re: Bitmap scan is undercosted?
On 02/12/2017 07:51, Jeff Janes wrote:
> On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <[email protected]> wrote:
>> effective_io_concurrency = 0
>> max_parallel_workers_per_gather = 0
>>
>> Did you notice random_page_cost = 1.5 ?
>
> For the aaa.num = 39 case, the faster index scan actually does hit 15 times
> more buffers than the bitmap scan does. While 1.5 is a lot lower than 4.0, it
> is still much higher than the true cost of reading a page from the buffer
> cache. This is why the index scan is getting punished. You could lower
> random_page_cost and seq_page_cost to 0, to remove those considerations.
> (I'm not saying you should do this on your production system, but rather you
> should do it as a way to investigate the issue. But it might make sense on
> production as well.)
>
>> For this test I'm using SSD and Windows (if that matters). On production we
>> also use SSD, hence the lower random_page_cost. But with the default
>> random_page_cost=4.0, the difference in cost between the index scan plan and
>> the bitmap scan plan is even bigger.
>
> Since it is all shared buffers hits, it doesn't matter if you have SSD for
> this particular test case.
>
> Cheers,
> Jeff

seq_page_cost = 1.0
random_page_cost = 1.0

explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=11536.74..20856.96 rows=10257 width=5) (actual time=108.338..108.338 rows=0 loops=1)
  ->  BitmapAnd  (cost=11536.74..11536.74 rows=10257 width=0) (actual time=108.226..108.226 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1025.43 rows=10 width=0) (actual time=18.563..18.563 rows=10 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..10505.93 rows=1025666 width=0) (actual time=78.493..78.493 rows=100 loops=1)

Index Scan using i1 on aaa  (cost=0.44..44663.58 rows=10257 width=5) (actual time=51.264..51.264 rows=0 loops=1)

Here I've used the filter num = 2, which produces rows=0 at BitmapAnd, and thus
avoids a lot of work at the "Bitmap Heap Scan" node, while still leaving about
the same proportion in bitmap vs index - the bitmap is twice as slow but half
the cost. It does not matter much which value is used for the filter, as long
as it's other than num = 1.

seq_page_cost = 0.0
random_page_cost = 0.0

explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5) (actual time=82.212..82.212 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=10 width=0) (actual time=17.401..17.401 rows=10 loops=1)

Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5) (actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap scan, and finally it costs
more than the index plan. But I doubt that the settings seq_page_cost =
random_page_cost = 0.0 should actually be used. Probably it should instead be
something like 1.0/1.0 or 1.0/1.1, but with the other costs increased, to have
more weight.

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

Bitmap Heap Scan on aaa  (cost=36882.97..46587.82 rows=10257 width=5) (actual time=106.045..106.045 rows=0 loops=1)
  ->  BitmapAnd  (cost=36882.97..36882.97 rows=10257 width=0) (actual time=105.966..105.966 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..3276.74 rows=10 width=0) (actual time=15.977..15.977 rows=10 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..33584.72 rows=1025666 width=0) (actual time=79.208..79.208 rows=100 loops=1)

Index Scan using i1 on aaa  (cost=1.74..49914.89 rows=10257 width=5) (actual time=50.144..50.144 rows=0 loops=1)

# x5 tuple/operator costs - switched to single bitmap index scan, but now it costs more than the index scan
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.05;
set cpu_index_tuple_cost = 0.025;
set cpu_operator_cost = 0.0125;

Bitmap Heap Scan on aaa  (cost=4040.00..54538.00 rows=10257 width=5) (actual time=82.338..82.338 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..4027.18 rows=10 width=0) (actual time=19.541..19.541 rows=10 loops=1)

Index Scan using i1 on aaa  (cost=2.17..51665.32 rows=10257 wi
