Delete tables difference involves seq scan
Hello. I want to remove the rows from the first table that also exist in the second table (equality is checked using the PK). However, I get a seq scan on the second table, which counters my intuition - I expected an index-only scan. Because the tables are large, the performance of the query is very bad.

However, I got mixed results when trying to reproduce this behavior on synthetic tables. Here are 3 different plans I got for the same query.

1. Setup is:
---
create table diff (id uuid constraint diff_pkey primary key);
create table origin (id uuid constraint origin_pkey primary key);
---

The query generates the correct plan, which performs only index scans:

explain delete from origin where exists (select id from diff where origin.id = diff.id);
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Delete on origin  (cost=0.30..105.56 rows=1850 width=12)
   ->  Merge Semi Join  (cost=0.30..105.56 rows=1850 width=12)
         Merge Cond: (origin.id = diff.id)
         ->  Index Scan using origin_pkey on origin  (cost=0.15..38.90 rows=1850 width=22)
         ->  Index Scan using diff_pkey on diff  (cost=0.15..38.90 rows=1850 width=22)
(5 rows)

2. Setup is:

create table origin (id uuid constraint origin_pkey primary key, data jsonb);
create table diff (id uuid constraint diff_pkey primary key, data jsonb);

The query generates a plan with two seq scans:

explain delete from origin where exists (select id from diff where origin.id = diff.id);
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Delete on origin  (cost=34.08..69.49 rows=1070 width=12)
   ->  Hash Semi Join  (cost=34.08..69.49 rows=1070 width=12)
         Hash Cond: (origin.id = diff.id)
         ->  Seq Scan on origin  (cost=0.00..20.70 rows=1070 width=22)
         ->  Hash  (cost=20.70..20.70 rows=1070 width=22)
               ->  Seq Scan on diff  (cost=0.00..20.70 rows=1070 width=22)
(6 rows)

3. My real `origin` table has 26 fields and 800 billion rows, the real `diff` table has 12 million rows, and the query generates a plan with a nested loop and a seq scan on the `diff` table:

explain delete from drug_refills origin where exists (select id from drug_refills_diff diff where origin.id = diff.id);
                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Delete on drug_refills origin  (cost=0.57..22049570.11 rows=11975161 width=12)
   ->  Nested Loop  (cost=0.57..22049570.11 rows=11975161 width=12)
         ->  Seq Scan on drug_refills_diff diff  (cost=0.00..720405.61 rows=11975161 width=22)
         ->  Index Scan using drug_refills_pkey on drug_refills origin  (cost=0.57..1.77 rows=1 width=22)
               Index Cond: (id = diff.id)
(5 rows)

I have run ANALYZE on both tables, but it didn't help. Here are the column types in origin and diff (same schema), in case that matters:

uuid
timestamp with time zone
timestamp with time zone
character varying(255)
character varying(255)
character varying(1024)
numeric(10,4)
integer
numeric(14,8)
numeric(14,8)
numeric(14,8)
numeric(14,8)
numeric(14,8)
character varying(16)
character varying(16)
character varying(16)
character varying(16)
character varying(16)
character varying(16)
date
jsonb
text[]
uuid
uuid
uuid
uuid
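One experiment I have not yet tried (a sketch only - enable_seqscan here is purely a diagnostic knob, not something to leave set) is forcing the planner away from seq scans to see how it costs the index-based alternative:

begin;
set local enable_seqscan = off;
explain delete from origin where exists (select id from diff where origin.id = diff.id);
rollback;

If the forced plan comes out much more expensive than the seq-scan plan, the planner simply believes that reading `diff` once is cheaper than probing its index for every candidate row.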
Re: Delete tables difference involves seq scan
Oh, sorry, this happens on PostgreSQL 9.6.6. I've checked that on PostgreSQL 10.0 the query plan from setup (1) now uses two seq scans, like in setup (2).

2017-12-01 15:03 GMT+02:00 Danylo Hlynskyi :
> Hello. I want to remove the rows from the first table that also exist in the
> second table (equality is checked using the PK). However, I get a seq scan on
> the second table, which counters my intuition - I expected an index-only scan.
> Because the tables are large, the performance of the query is very bad.
Re: Delete tables difference involves seq scan
I was able to speed up the original query a lot by using a CTE. It still uses a seq scan on the `diff` table, but it looks like it does this only once:

explain with diff as (select id from drug_refills_diff)
delete from drug_refills where id in (select id from diff);
                                          QUERY PLAN
---------------------------------------------------------------------------------------------------
 Delete on drug_refills  (cost=989844.94..990366.86 rows=45636 width=46)
   CTE diff
     ->  Seq Scan on drug_refills_diff  (cost=0.00..720404.88 rows=11975088 width=16)
   ->  Nested Loop  (cost=269440.05..269961.98 rows=45636 width=46)
         ->  HashAggregate  (cost=269439.48..269441.48 rows=200 width=56)
               Group Key: diff.id
               ->  CTE Scan on diff  (cost=0.00..239501.76 rows=11975088 width=56)
         ->  Index Scan using drug_refills_pkey on drug_refills  (cost=0.57..2.59 rows=1 width=22)
               Index Cond: (id = diff.id)
(9 rows)

2017-12-01 15:17 GMT+02:00 Danylo Hlynskyi :
> Oh, sorry, this happens on PostgreSQL 9.6.6. I've checked that on PostgreSQL
> 10.0 the query plan from setup (1) now uses two seq scans, like in setup (2).
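P.S. For completeness, the same delete can also be written as a join; I have not benchmarked this form against the CTE variant on the real tables, so treat it as a sketch:

delete from drug_refills o
using drug_refills_diff d
where o.id = d.id;

With this formulation the planner is free to pick a hash or merge join instead of the per-row index probes of the nested loop.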
Bad plan for ltree predicate <@
Hi,

I have a problem on 9.3.14 with a query that accesses table:

Size (retrieved by query https://gist.github.com/romank0/74f9d1d807bd3f41c0729d0fc6126749):

 schemaname |    relname    |  size  | toast  | associated_idx_size | total
------------+---------------+--------+--------+---------------------+---------
 public     | document_head | 275 MB | 630 MB | 439 MB              | 1345 MB

Definition:

                              Table "public.document_head"
           Column            |           Type           |              Modifiers
-----------------------------+--------------------------+--------------------------------------
 snapshot_id                 | character varying(36)    | not null
 id                          | character varying(36)    | not null
 base_type                   | character varying(50)    | not null
 is_cascade                  | boolean                  | not null default false
 parent_id                   | character varying(36)    |
 fileplan_node_id            | character varying(36)    |
 state                       | character varying(10)    | default 'ACTIVE'::character varying
 title                       | character varying(4096)  | not null
 properties                  | text                     | not null
 properties_cache            | hstore                   | not null
 serial_number               | integer                  | not null
 major_version               | integer                  | not null
 minor_version               | integer                  | not null
 version_description         | text                     |
 sensitivity                 | integer                  | not null default 10
 id_path                     | ltree                    |
 path_name                   | character varying(4096)  | collate C not null
 ltx_id                      | bigint                   | not null
 created_by                  | integer                  | not null
 created_date                | timestamp with time zone | not null
 modified_by                 | integer                  | not null
 modified_date               | timestamp with time zone | not null
 responsible_user_ids        | integer[]                |
 origin_id                   | character varying(36)    |
 origin_snapshot_id          | character varying(36)    |
 ssn                         | character varying(64)    |
 record_physical_location    | text                     |
 record_physical_location_id | text                     |
 record_created_date         | timestamp with time zone |
 record_aggregated_date      | timestamp with time zone |
 record_last_review_comment  | text                     |
 record_last_review_date     | timestamp with time zone |
 record_next_review_date     | timestamp with time zone |
 record_originated_date      | timestamp with time zone |
 record_is_vital             | boolean                  | not null default false
 storage_plan_state          | text                     | not null default 'New'::text
 cut_off_date                | timestamp with time zone |
 dispose_date                | timestamp with time zone |
 archive_date                | timestamp with time zone |
Indexes:
    "document_head__id__uniq_key" PRIMARY KEY, btree (id)
    "document_head__parent_id__path_name__unq_idx" UNIQUE, btree (parent_id, path_name) WHERE state::text = 'ACTIVE'::text
    "document_head__snapshot_id__unq" UNIQUE, btree (snapshot_id)
    "document_head__base_type__idx" btree (base_type) WHERE state::text <> 'DELETED'::text
    "document_head__fileplan_node_id__idx" btree (fileplan_node_id)
    "document_head__id__idx" btree (id) WHERE state::text <> 'DELETED'::text
    "document_head__id_path__btree__idx" btree (id_path) WHERE state::text <> 'DELETED'::text
    "document_head__id_path__gist__idx" gist (id_path)
    "document_head__ltx_id__idx" btree (ltx_id)
    "document_head__origin_id__hotfix__idx" btree (origin_id) WHERE origin_id IS NOT NULL
    "document_head__origin_id__idx" btree (origin_id) WHERE state::text <> 'DELETED'::text AND origin_id IS NOT NULL
    "document_head__parent_id__idx" btree (parent_id)
    "document_head__properties_cache__contact_username_idx" btree ((properties_cache -> 'person_meta_info.username'::text)) WHERE base_type::text = 'Contact'::text AND exist(properties_cache, 'person_meta_info.username'::text)
    "document_head__properties_cache__emailmeta_message_id__idx" btree ((properties_cache -> 'emailmeta.message_id'::text)) WHERE base_type::text = 'File'::text AND exist(properties_cache, 'emailmeta.message_id'::text)
    "document_head__properties_cache__idx" gist (properties_cache) WHERE state::text <> 'DELETED'::text
    "document_head__properties_cache__project_identifier__idx" btree ((properties_cache -> 'project.identifier'::text)) WHERE base_type::text = 'Project'::text AND exist(properties_cache, 'project.identifier'::text)
    "document_head__properties_cache__published_origin__idx" btree ((properties_ca
Bitmap scan is undercosted?
Hi,
We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
I've found some cases where similar issues with bitmap scans were
reported before:
https://www.postgresql.org/message-id/flat/1456154321.976561.528310154.6A623C0E%40webmail.messagingengine.com
https://www.postgresql.org/message-id/flat/CA%2BwPC0MRMhF_8fD9dc8%2BQWZQzUvHahPRSv%3DxMtCmsVLSsy-p0w%40mail.gmail.com
I've made a synthetic test, which kind of reproduces the issue:
shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB
set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;
select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";985;44248;226
"i1";985;27422;365
"i2";985;27422;365
I've been running the same query while enabling and disabling different
kinds of scans:
1) set enable_bitmapscan = on; set enable_indexscan = off; set
enable_seqscan = off;
2) set enable_bitmapscan = off; set enable_indexscan = on; set
enable_seqscan = off;
3) set enable_bitmapscan = off; set enable_indexscan = off; set
enable_seqscan = on;
The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;
Here are the results for PostgreSQL 9.6 (for 9.3 and 10.1 the results
are very similar):
1) Aggregate (cost=24821.70..24821.71 rows=1 width=8) (actual
time=184.591..184.591 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=47259
-> Bitmap Heap Scan on public.aaa (cost=13038.21..24796.22
rows=10189 width=0) (actual time=122.492..178.006 rows=10 loops=1)
Output: num, flag
Recheck Cond: (aaa.num = 1)
Filter: aaa.flag
Heap Blocks: exact=44248
Buffers: shared hit=47259
-> BitmapAnd (cost=13038.21..13038.21 rows=10189 width=0)
(actual time=110.699..110.699 rows=0 loops=1)
Buffers: shared hit=3011
-> Bitmap Index Scan on i1 (cost=0.00..1158.94
rows=99667 width=0) (actual time=19.600..19.600 rows=10 loops=1)
Index Cond: (aaa.num = 1)
Buffers: shared hit=276
-> Bitmap Index Scan on i2 (cost=0.00..11873.92
rows=1022332 width=0) (actual time=81.676..81.676 rows=100 loops=1)
Index Cond: (aaa.flag = true)
Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 184.988 ms
2) Aggregate (cost=67939.09..67939.10 rows=1 width=8) (actual
time=67.510..67.510 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=44524
-> Index Scan using i1 on public.aaa (cost=0.44..67910.95
rows=11256 width=0) (actual time=0.020..61.180 rows=10 loops=1)
Output: num, flag
Index Cond: (aaa.num = 1)
Filter: aaa.flag
Buffers: shared hit=44524
Planning time: 0.096 ms
Execution time: 67.543 ms
3) Aggregate (cost=169276.49..169276.50 rows=1 width=8) (actual
time=977.063..977.063 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=44248
-> Seq Scan on public.aaa (cost=0.00..169248.35 rows=11256 width=0)
(actual time=0.018..969.294 rows=10 loops=1)
Output: num, flag
Filter: (aaa.flag AND (aaa.num = 1))
Rows Removed by Filter: 990
Buffers: shared hit=44248
Planning time: 0.099 ms
Execution time: 977.094 ms
The bitmap scan version runs more than twice as slowly as the one with the
index scan, while being costed at less than half as much.
I've tried to increase cpu_tuple_cost and cpu_index_tuple_cost, and this
behavior remains after a 6x increase in the values, although the difference in
costs becomes much smaller. After increasing the settings by more than 6x,
PostgreSQL decides to use a different plan for bitmap scans, so it's
hard to draw conclusions at that point.
Could such cases be fixed by tuning the cost settings, or is that just
how PostgreSQL estimates bitmap scans, and it can't be fixed without
modifying the optimizer? Or am I missing something and this is the
expected behavior? Thoughts?
Regards,
Vitaliy
Re: Bitmap scan is undercosted?
On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> We recently had an issue in production, where a bitmap scan was chosen
> instead of an index scan. Despite being 30x slower, the bitmap scan had
> about the same cost as the index scan.
Me too, see also:
https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#cah2-wzkrtggiy_lkquu-oviyp6y_hhz-a1ywacpy4tcywv1...@mail.gmail.com
> drop table if exists aaa;
> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
> generate_series(1, 10000000) id;
> create index i1 on aaa (num);
> create index i2 on aaa (flag);
> analyze aaa;
>
> select relname, reltuples::bigint, relpages::bigint,
> (reltuples/relpages)::bigint tpp from pg_class where relname
> in('aaa','i1','i2') order by relname;
> "aaa";985;44248;226
> "i1";985;27422;365
> "i2";985;27422;365
>
> The query was:
> explain (analyze,verbose,costs,buffers)
> select count(*) from aaa where num = 1 and flag = true;
Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.
The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.
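A minimal way to do that (a sketch; assumes shell access to the database host):

-- in the psql session that will run the query:
SELECT pg_backend_pid();
-- then, from a shell on the server, attach before starting the query:
--   strace -p <pid from above> -e trace=read,lseek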
Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.
I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.
I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use a btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).
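For the synthetic table that could look something like this (illustrative only; the index name is made up, and whether BRIN helps depends on how well the column tracks physical row order):

create index aaa_num_brin on aaa using brin (num) with (autosummarize = on);  -- autosummarize needs PG10+

On this particular test table num cycles with id, so every block range contains every value and the BRIN index won't be selective; on a table where the column follows insertion order it can be.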
Justin
[0] I'm borrowing Jeff's language from here:
https://www.postgresql.org/message-id/CAMkU%3D1xwGn%2BO0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q%40mail.gmail.com
"density" wasn't our problem, but it's a perfect description of this issue.
Re: Bad plan for ltree predicate <@
Roman Konoval writes:
> I have a problem on 9.3.14 with a query that accesses table:

I think the root of the problem is your intermediate function:

> CREATE OR REPLACE FUNCTION public.get_doc_path(document_id character varying)
>  RETURNS ltree
>  LANGUAGE plpgsql
>  STABLE
> AS $function$
> DECLARE
>     path ltree;
> BEGIN
>     select id_path into path from document_head where id = document_id;
>     RETURN path;
> END $function$

This is quite expensive, as it involves another table search, but the planner
doesn't know that since you've not marked it as having higher than normal cost.
The seqscan formulation of the query results in evaluating this function afresh
at most of the rows, whereas shoving it into an uncorrelated sub-select causes
it to be evaluated only once.  That, I think, and not the seqscan-vs-indexscan
aspect, is what makes the bitmap formulation go faster.  Certainly you'd not
expect that a bitmap scan that has to hit most of the rows anyway is going to
win over a seqscan.

The fact that the planner goes for a bitmap scan in the second formulation is
an artifact of the fact that it doesn't try to pre-evaluate sub-selects for
selectivity estimation purposes, so you end up with a default estimate that
says that the <@ condition only selects a small fraction of the rows.  Not sure
if we should try to change that or not.

I'd suggest setting the function's cost to 1000 or so and seeing if that
doesn't improve matters.

(BTW, what tipped me off to this was that the "buffers hit" count for the
seqscan node was so high, several times more than the actual size of the table.
I couldn't account for that until I realized that the function itself would be
adding a few buffer hits per execution.)

			regards, tom lane
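A sketch of what those two suggestions might look like against Roman's schema (the cost value and the literal id are only illustrative):

ALTER FUNCTION public.get_doc_path(character varying) COST 1000;

-- or evaluate the path once, via an uncorrelated sub-select:
SELECT count(*)
FROM document_head d
WHERE d.id_path <@ (SELECT get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'))
  AND d.id <> '78157c60-45bc-42c1-9aad-c5651995db5c'
  AND d.state <> 'DELETED';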
Re: Bitmap scan is undercosted?
On 01/12/2017 20:34, Justin Pryzby wrote:
On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
We recently had an issue in production, where a bitmap scan was chosen
instead of an index scan. Despite being 30x slower, the bitmap scan had
about the same cost as the index scan.
Me too, see also:
https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#cah2-wzkrtggiy_lkquu-oviyp6y_hhz-a1ywacpy4tcywv1...@mail.gmail.com
drop table if exists aaa;
create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
generate_series(1, 10000000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
analyze aaa;
select relname, reltuples::bigint, relpages::bigint,
(reltuples/relpages)::bigint tpp from pg_class where relname
in('aaa','i1','i2') order by relname;
"aaa";985;44248;226
"i1";985;27422;365
"i2";985;27422;365
The query was:
explain (analyze,verbose,costs,buffers)
select count(*) from aaa where num = 1 and flag = true;
Note that id%100==1 implies flag='t', so the planner anticipates retrieving
fewer rows than it will ultimately read, probably by 2x. It makes sense that
this causes the index scan to be more expensive than expected, but that's only
somewhat important, since there are no joins involved.
I don't think the planner is smart enough to account for correlation
between values in different columns. When different values are used in the
filter (num=2, num=39, num=74), the query actually runs faster, while
still being about twice as slow as the index scan. But the cost
does not change much. It jumps up and down for different values, but
it's still close to the initial value.
Aggregate (cost=24239.02..24239.03 rows=1 width=8) (actual
time=105.239..105.239 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=3011
-> Bitmap Heap Scan on public.aaa (cost=12812.05..24214.48
rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)
Output: num, flag
Recheck Cond: (aaa.num = 39)
Filter: aaa.flag
Buffers: shared hit=3011
-> BitmapAnd (cost=12812.05..12812.05 rows=9816 width=0)
(actual time=105.157..105.157 rows=0 loops=1)
Buffers: shared hit=3011
-> Bitmap Index Scan on i1 (cost=0.00..1134.94
rows=97667 width=0) (actual time=15.725..15.725 rows=10 loops=1)
Index Cond: (aaa.num = 39)
Buffers: shared hit=276
-> Bitmap Index Scan on i2 (cost=0.00..11671.96
rows=1005003 width=0) (actual time=77.920..77.920 rows=100 loops=1)
Index Cond: (aaa.flag = true)
Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms
Aggregate (cost=65785.99..65786.00 rows=1 width=8) (actual
time=48.587..48.587 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=44524
-> Index Scan using i1 on public.aaa (cost=0.44..65761.45 rows=9816
width=0) (actual time=48.583..48.583 rows=0 loops=1)
Output: num, flag
Index Cond: (aaa.num = 39)
Filter: aaa.flag
Rows Removed by Filter: 10
Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms
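(As an aside: extended statistics on PG 10, i.e. CREATE STATISTICS ... (dependencies), are meant for exactly this kind of cross-column correlation, though I have not checked whether they kick in for a boolean filter like flag = true. A sketch, not something I have tried here:)

create statistics aaa_num_flag_dep (dependencies) on num, flag from aaa;
analyze aaa;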
The reason why it's more than a bit slower is due to the "density" [0] of the
heap pages read. num=1 is more selective than flag=true, so it scans i1,
reading 1% of the whole table. But it's not reading the first 1% or
some other 1% of the table, it reads tuples evenly distributed across the
entire table (226*0.01 = ~2 rows of each page). Since the index was created
after the INSERT, the repeated keys (logical value: id%100) are read in
physical order on the heap, so this is basically doing a seq scan, but with the
additional overhead of reading the index, and maybe doing an fseek() before
each/some read()s, too. You could confirm that by connecting strace to the
backend before starting the query.
Since you did that using % and with indices added after the INSERT, you can't
improve it by reindexing (as I was able to for my case). That's an elegant
test case, so thanks.
I think shared_buffers=512MB is just small enough for this test to be painful
for 1e7 rows. I see the table+index is 559MB.
 table | ~count |  size  |  toast  |  idx   | size + toast + idx
-------+--------+--------+---------+--------+--------------------
 aaa   | 994    | 346 MB | 0 bytes | 428 MB | 774 MB
But the plan says all buffers are "shared hit", and none "read", so
that's probably not an issue.
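(A summary like the one above can be produced with something along these lines - not necessarily the exact query used here:)

select c.relname,
       c.reltuples::bigint                           as approx_count,
       pg_size_pretty(pg_relation_size(c.oid))       as table_size,
       pg_size_pretty(pg_indexes_size(c.oid))        as index_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) as total_size
from pg_class c
where c.relname = 'aaa';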
I don't know if that's really similar to your production use case, but I would
recommend trying BRIN indices, which always require a bitmap scan. Note that
some things (like max()) that can use an btree index cannot use brin. PG10.1
has WITH autosummarize, which was important for our use, since we rarely do
UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).
Re: Bitmap scan is undercosted?
I tried to reproduce this issue and couldn't, under PG95 and 10.1:

On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > We recently had an issue in production, where a bitmap scan was chosen
> > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > about the same cost as the index scan.
>
> > drop table if exists aaa;
> > create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
> > generate_series(1, 10000000) id;
> > create index i1 on aaa (num);
> > create index i2 on aaa (flag);
> > analyze aaa;

What is:
effective_io_concurrency
max_parallel_workers_per_gather (I gather you don't have this)

Note:
postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
correlation | 0.00710112

..so this is different from the issue corrected by the patch I created while testing.

> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
> fewer rows than it will ultimately read, probably by 2x. It makes sense that
> this causes the index scan to be more expensive than expected, but that's only
> somewhat important, since there are no joins involved.

I changed the query from COUNT(*) TO * for easier to read explain:

CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX i1 ON aaa(num);
CREATE INDEX i2 ON aaa (flag);
ANALYZE VERBOSE aaa;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;

 Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=10 loops=1)
   ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=10 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=100 loops=1)

..which is what's wanted with no planner hints (PG10.1 here).

Same on PG95:

postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=10 loops=1)
   ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=10 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=100 loops=1)

The rowcount is off, but not a critical issue without a join.

Justin
Re: Bitmap scan is undercosted?
On 02/12/2017 01:11, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>
> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
>> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>>> We recently had an issue in production, where a bitmap scan was chosen
>>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>>> about the same cost as the index scan.
>>>
>>> drop table if exists aaa;
>>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>>> generate_series(1, 10000000) id;
>>> create index i1 on aaa (num);
>>> create index i2 on aaa (flag);
>>> analyze aaa;
>
> What is:
> effective_io_concurrency
> max_parallel_workers_per_gather (I gather you don't have this)

effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production we
also use SSD, hence the lower random_page_cost. But with the default
random_page_cost=4.0, the difference in cost between the index scan plan and
the bitmap scan plan is even bigger.

> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while testing.
>
>> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
>> fewer rows than it will ultimately read, probably by 2x. It makes sense that
>> this causes the index scan to be more expensive than expected, but that's only
>> somewhat important, since there are no joins involved.
>
> I changed the query from COUNT(*) TO * for easier to read explain:
>
> CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
> CREATE INDEX i1 ON aaa(num);
> CREATE INDEX i2 ON aaa (flag);
> ANALYZE VERBOSE aaa;
> EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>
>  Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107 rows=10 loops=1)
>    ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
>          ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943 rows=10 loops=1)
>          ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804 rows=100 loops=1)
>
> ..which is what's wanted with no planner hints (PG10.1 here).

Yes, that's what you get without planner hints, but it's strange to get this
plan, when there is another one, which runs 2-3 times faster, but happens to
be estimated at about twice the cost of the one with bitmap scans:

# set enable_bitmapscan = off; set enable_indexscan = on; set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=10 loops=1)

vs.

# set enable_bitmapscan = on; set enable_indexscan = off; set enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=10 loops=1)
  ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual time=110.168..110.168 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667 width=0) (actual time=20.845..20.845 rows=10 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666 width=0) (actual time=80.323..80.323 rows=100 loops=1)

> Same on PG95:
>
> postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>  Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583 rows=10 loops=1)
>    ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
>          ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911 rows=10 loops=1)
>          ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237 rows=100 loops=1)
>
> The rowcount is off, but not a critical issue without a join.
>
> Justin
Re: Bitmap scan is undercosted?
On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarrassed to say that I mis-read your message, despite your amply clear
subject. You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> > On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > > We recently had an issue in production, where a bitmap scan was chosen
> > > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > > about the same cost as the index scan.
>
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while testing.

Actually, that the table is "not correlated" on "num" column is maybe the
primary reason why PG avoids using an index scan. It (more or less correctly)
deduces that it's going to have to "read" a large fraction of the pages (even
if only to process a small fraction of the rows), which is costly, except it's
all cached.. In your case, that overly-penalizes the index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c. Since the index
is uncorrelated, it's returning something close to max_IO_cost. It looks like
effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it. Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

or

cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost -
|     (spc_random_page_cost - spc_seq_page_cost)
|     * sqrt(pages_fetched / T);

Justin
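Rough arithmetic to illustrate the effect (assuming random_page_cost=1.5, seq_page_cost=1.0; the pages_fetched value is made up, not taken from the plans):

-- index scan: with correlation ~ 0, csquared ~ 0, so run_cost stays near max_IO_cost,
-- i.e. every fetched heap page is charged at roughly random_page_cost.

-- bitmap heap scan: the per-page charge slides from random_page_cost toward
-- seq_page_cost as the fetched fraction of the table (T = 44248 pages here) grows:
select 1.5 - (1.5 - 1.0) * sqrt(10000.0 / 44248) as bitmap_cost_per_page;  -- ~1.26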
Re: Bad plan for ltree predicate <@
Hi Tom,
Thanks for your help.
> On Dec 1, 2017, at 22:33, Tom Lane wrote:
>
>
> The seqscan formulation of the query results in evaluating
> this function afresh at most of the rows
The function is defined as STABLE. I thought that meant there is no need
to reevaluate it on every row, as the input parameter is the same for every row
and the return value will be the same during the same query execution. Do I
understand incorrectly what STABLE means?
Why is the function evaluated more than once?
> , whereas shoving it into an
> uncorrelated sub-select causes it to be evaluated only once. That, I
> think, and not the seqscan-vs-indexscan aspect, is what makes the bitmap
> formulation go faster. Certainly you'd not expect that a bitmap scan that
> has to hit most of the rows anyway is going to win over a seqscan.
>
> The fact that the planner goes for a bitmap scan in the second formulation
> is an artifact of the fact that it doesn't try to pre-evaluate sub-selects
> for selectivity estimation purposes, so you end up with a default estimate
> that says that the <@ condition only selects a small fraction of the rows.
> Not sure if we should try to change that or not.
>
> I'd suggest setting the function's cost to 1000 or so and seeing if that
> doesn't improve matters.
>
If I set the function cost to 1000 I get a slightly better plan, but still
about 3.5x more buffers are read compared to the bitmap scan, which, as you
wrote, one would expect to be slower than a seq scan.
Here is the plan:
QUERY PLAN
-
Aggregate (cost=216438.81..216438.82 rows=1 width=0) (actual
time=1262.244..1262.245 rows=1 loops=1)
Buffers: shared hit=169215
CTE trees
-> Index Scan using document_head__id_path__gist__idx on document_head d
(cost=2.91..212787.85 rows=162265 width=74) (actual time=0.115..727.119
rows=154854 loops=1)
Index Cond: (id_path <@
get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'::character varying))
Filter: (((id)::text <>
'78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <>
'DELETED'::text))
Rows Removed by Filter: 23
Buffers: shared hit=169215
-> CTE Scan on trees (cost=0.00..3245.30 rows=162265 width=0) (actual
time=0.119..1118.899 rows=154854 loops=1)
Buffers: shared hit=169215
Total runtime: 1277.010 ms
(11 rows)
My understanding is that the optimal plan in this case should read less data
than the bitmap scan, by the amount of buffers hit by the bitmap index scan.
It should read roughly all the buffers of the table itself - something like the
query with the predicate using an ltree literal instead of the function invocation:
explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
FROM document_head AS d
WHERE (d.id_path <@
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree
AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;
QUERY PLAN
-
Aggregate (cost=42114.02..42114.03 rows=1 width=0) (actual
time=997.427..997.427 rows=1 loops=1)
Buffers: shared hit=35230
CTE trees
-> Seq Scan on document_head d (cost=0.00..38463.06 rows=162265
width=74) (actual time=0.013..593.082 rows=154854 loops=1)
Filter: ((id_path <@
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree)
AND ((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND
((state)::text <> 'DELETED'::text))
Rows Removed by Filter: 23357
Buffers: shared hit=35230
-> CTE Scan on trees (cost=0.00..3245.30 rows=162265 width=0) (actual
time=0.017..888.076 rows=154854 loops=1)
Buffers: shared hit=35230
Total runtime: 1011.565 ms
(10 rows)
The question is whether it is possible to get a plan like that using the
function, or some other way to get the ltree value for a given document_head.id
value in one query.
As an alternative I can get the ltree value with a separate query, but this
would require
1. a round-trip to postgres
2. changing the isolation level to REPEATABLE READ to make sure that I get a
consistent result
so I would like to avoid that.
Regards,
Roman Konoval
Re: Bitmap scan is undercosted?
On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <[email protected]> wrote:

> On 02/12/2017 01:11, Justin Pryzby wrote:
>> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>>
>> What is:
>> effective_io_concurrency
>> max_parallel_workers_per_gather (I gather you don't have this)
>
> effective_io_concurrency = 0
> max_parallel_workers_per_gather = 0
>
> Did you notice random_page_cost = 1.5 ?

For the aaa.num = 39 case, the faster index scan actually does hit 15 times
more buffers than the bitmap scan does. While 1.5 is a lot lower than 4.0, it
is still much higher than the true cost of reading a page from the buffer
cache. This is why the index scan is getting punished. You could lower
random_page_cost and seq_page_cost to 0, to remove those considerations.
(I'm not saying you should do this on your production system, but rather you
should do it as a way to investigate the issue. But it might make sense on
production as well.)

> For this test I'm using SSD and Windows (if that matters). On production
> we also use SSD, hence the lower random_page_cost. But with the default
> random_page_cost=4.0, the difference in cost between the index scan plan
> and the bitmap scan plan is even bigger.

Since it is all shared buffers hits, it doesn't matter if you have SSD for
this particular test case.

Cheers,
Jeff
Re: Bitmap scan is undercosted?
On Sat, Dec 02, 2017 at 01:54:09AM +0200, Vitaliy Garnashevich wrote:
> On 02/12/2017 01:11, Justin Pryzby wrote:
> > ..which is what's wanted with no planner hints (PG10.1 here).
> Yes, that's what you get without planner hints, but it's strange to get this
> plan, when there is another one, which runs 2-3 times faster, but happens to
> be estimated at about twice the cost of the one with bitmap scans:
>
> # set enable_bitmapscan = off; set enable_indexscan = on; set enable_seqscan = off;
> # explain analyze select * from aaa where num = 1 and flag = true;
> Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) (actual time=0.020..57.765 rows=10 loops=1)
>
> vs.
>
> # set enable_bitmapscan = on; set enable_indexscan = off; set enable_seqscan = off;
> # explain analyze select * from aaa where num = 1 and flag = true;
> Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) (actual time=122.137..182.811 rows=10 loops=1)

I was able to get an index plan with:

SET random_page_cost=1;
SET cpu_index_tuple_cost=.04; -- default: 0.005; see selfuncs.c

postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=0.43..50120.71 rows=10754 width=5) (actual time=0.040..149.580 rows=10 loops=1)

Or with:

SET random_page_cost=1;
SET cpu_operator_cost=0.03; -- default: 0.0025; see cost_bitmap_tree_node()

EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=5.22..49328.00 rows=10754 width=5) (actual time=0.051..109.082 rows=10 loops=1)

Or a combination trying to minimize the cost of the index scan:

postgres=# SET random_page_cost=1; SET cpu_index_tuple_cost=.0017; SET cpu_operator_cost=0.03;
EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
 Index Scan using i1 on public.aaa  (cost=5.22..48977.10 rows=10754 width=5) (actual time=0.032..86.883 rows=10 loops=1)

Not sure if that's reasonable, but maybe it helps to understand.

Justin
Re: Bitmap scan is undercosted?
On 02/12/2017 07:51, Jeff Janes wrote:
> On Fri, Dec 1, 2017 at 3:54 PM, Vitaliy Garnashevich <[email protected]> wrote:
>> effective_io_concurrency = 0
>> max_parallel_workers_per_gather = 0
>>
>> Did you notice random_page_cost = 1.5 ?
>
> For the aaa.num = 39 case, the faster index scan actually does hit 15 times
> more buffers than the bitmap scan does. While 1.5 is a lot lower than 4.0, it
> is still much higher than the true cost of reading a page from the buffer
> cache. This is why the index scan is getting punished. You could lower
> random_page_cost and seq_page_cost to 0, to remove those considerations.
> (I'm not saying you should do this on your production system, but rather you
> should do it as a way to investigate the issue. But it might make sense on
> production as well.)
>
>> For this test I'm using SSD and Windows (if that matters). On production we
>> also use SSD, hence the lower random_page_cost. But with the default
>> random_page_cost=4.0, the difference in cost between the index scan plan and
>> the bitmap scan plan is even bigger.
>
> Since it is all shared buffers hits, it doesn't matter if you have SSD for
> this particular test case.
>
> Cheers,
> Jeff

seq_page_cost = 1.0
random_page_cost = 1.0

explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=11536.74..20856.96 rows=10257 width=5) (actual time=108.338..108.338 rows=0 loops=1)
  ->  BitmapAnd  (cost=11536.74..11536.74 rows=10257 width=0) (actual time=108.226..108.226 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..1025.43 rows=10 width=0) (actual time=18.563..18.563 rows=10 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..10505.93 rows=1025666 width=0) (actual time=78.493..78.493 rows=100 loops=1)

Index Scan using i1 on aaa  (cost=0.44..44663.58 rows=10257 width=5) (actual time=51.264..51.264 rows=0 loops=1)

Here I've used the filter num = 2, which produces rows=0 at BitmapAnd, and thus
avoids a lot of work at the "Bitmap Heap Scan" node, while still leaving about
the same proportion in bitmap vs index - the bitmap is twice as slow but half
the cost. It does not matter much which value is used for the filter, as long
as it's other than num = 1.

seq_page_cost = 0.0
random_page_cost = 0.0

explain analyze select * from aaa where num = 2 and flag = true;

Bitmap Heap Scan on aaa  (cost=753.00..2003.00 rows=10257 width=5) (actual time=82.212..82.212 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..750.43 rows=10 width=0) (actual time=17.401..17.401 rows=10 loops=1)

Index Scan using i1 on aaa  (cost=0.44..1750.43 rows=10257 width=5) (actual time=49.766..49.766 rows=0 loops=1)

The bitmap plan was reduced to use only one bitmap scan, and finally it costs
more than the index plan. But I doubt that the settings seq_page_cost =
random_page_cost = 0.0 should actually be used. Probably it should instead be
something like 1.0/1.0 or 1.0/1.1, but with the other costs increased, to have
more weight.

# x4 tuple/operator costs - bitmap scan still a bit cheaper
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.04;
set cpu_index_tuple_cost = 0.02;
set cpu_operator_cost = 0.01;

Bitmap Heap Scan on aaa  (cost=36882.97..46587.82 rows=10257 width=5) (actual time=106.045..106.045 rows=0 loops=1)
  ->  BitmapAnd  (cost=36882.97..36882.97 rows=10257 width=0) (actual time=105.966..105.966 rows=0 loops=1)
        ->  Bitmap Index Scan on i1  (cost=0.00..3276.74 rows=10 width=0) (actual time=15.977..15.977 rows=10 loops=1)
        ->  Bitmap Index Scan on i2  (cost=0.00..33584.72 rows=1025666 width=0) (actual time=79.208..79.208 rows=100 loops=1)

Index Scan using i1 on aaa  (cost=1.74..49914.89 rows=10257 width=5) (actual time=50.144..50.144 rows=0 loops=1)

# x5 tuple/operator costs - switched to single bitmap index scan, but now it costs more than the index scan
set seq_page_cost = 1.0;
set random_page_cost = 1.0;
set cpu_tuple_cost = 0.05;
set cpu_index_tuple_cost = 0.025;
set cpu_operator_cost = 0.0125;

Bitmap Heap Scan on aaa  (cost=4040.00..54538.00 rows=10257 width=5) (actual time=82.338..82.338 rows=0 loops=1)
  ->  Bitmap Index Scan on i1  (cost=0.00..4027.18 rows=10 width=0) (actual time=19.541..19.541 rows=10 loops=1)

Index Scan using i1 on aaa  (cost=2.17..51665.32 rows=10257 wi
