Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan

2018-06-05 Thread Fred Habash
Trying to optimize the Elapsed Time (ET) of this query. Currently, it is 
hovering around 3 hrs.


Running a 'vaccum analyse' had no effect on ET. Even forcing an 
'indexonly' scan by disabling 'enable_seqscan', still around the 3 hrs.


The table is around 4.6B rows,

 explain select cit_id, cl_value from reflink.citation_locators where 
cl_value = '1507617681' and vclf_number = 1 ;

   QUERY PLAN
-
 Bitmap Heap Scan on citation_locators (cost=5066559.01..50999084.79 
rows=133 width=23)

   Recheck Cond: (vclf_number = 1)
   Filter: (cl_value = '1507617681'::text)
   ->  Bitmap Index Scan on cl_indx_fk02 (cost=0.00..5066558.97 
rows=493984719 width=0)

 Index Cond: (vclf_number = 1)
(5 rows)

reflink.citation_locators
Table"reflink.citation_locators"
  Column  |   Type   | Modifiers | Storage  | Stats 
target | Description
--+--+---+--+--+-
 cl_id| bigint   | notnull   | plain|   
   |
 cl_value | text | notnull   | extended |   
   |
 vclf_number  | integer  | notnull   | plain|   
   |
 cit_id   | bigint   | notnull   | plain|   
   |
 cl_date_created  | timestamp with time zone | notnull   | plain|   
   |
 cl_date_modified | timestamp with time zone |   | plain|   
   |
Indexes:
"cl_pk"  PRIMARY KEY, btree (cl_id)
"cl_cnst_uk01"  UNIQUE CONSTRAINT, btree (cit_id, vclf_number, cl_value)
"cl_indx_fk01"  btree (cit_id)
"cl_indx_fk02"  btree (vclf_number)
Foreign-key constraints:
"cl_cnst_fk01"  FOREIGN KEY (cit_id) REFERENCES citations(cit_id) NOT 
VALID"cl_cnst_fk02"  FOREIGN KEY (vclf_number) REFERENCES 
valid_cit_locator_fields(vclf_number)



Re: Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan

2018-06-05 Thread Sergei Kornilov
Hello
Try using index btree(vclf_number, cl_value) instead of btree (vclf_number).

regards, Sergei



Re: Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan

2018-06-05 Thread Tom Lane
Fred Habash  writes:
> Indexes:
>  "cl_pk"  PRIMARY KEY, btree (cl_id)
>  "cl_cnst_uk01"  UNIQUE CONSTRAINT, btree (cit_id, vclf_number, cl_value)
>  "cl_indx_fk01"  btree (cit_id)
>  "cl_indx_fk02"  btree (vclf_number)

This is pretty inefficient index design.  Your query is slow because the
only selective condition it has is on cl_value, but you have no index
that can be searched with cl_value as the leading condition.  Moreover,
you have two indexes that can be searched with cit_id as the leading
condition, which is just wasteful.  I'd try reorganizing the cl_cnst_uk01
index as (cl_value, vclf_number, cit_id) so that it can serve for
searches on cl_value, while still enforcing the same uniqueness condition.
This particular column ordering would also let your query use the
vclf_number constraint as a secondary search condition, which would
help even more.

There's relevant advice about index design in the manual,

https://www.postgresql.org/docs/current/static/indexes.html

(see 11.3 and 11.5 particularly)

regards, tom lane



Re: Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan

2018-06-05 Thread Matthew Hall
Probably the cardinality of "vclf_number" is really bad. So the scan on that 
index is returning many million or billion rows and then you get a recheck 
which takes semi-forever. So you need an index on cl_value or both vclf_number 
and cl_value. If you know some properties of the values actually stored inside 
of those that will help. 

Matthew Hall

> On Jun 5, 2018, at 7:17 AM, Fred Habash  wrote:
> 
> Trying to optimize the Elapsed Time (ET) of this query. Currently, it is 
> hovering around 3 hrs.
> 
> Running a 'vaccum analyse' had no effect on ET. Even forcing an 'indexonly' 
> scan by disabling 'enable_seqscan', still around the 3 hrs. 
> The table is around 4.6B rows, 
>  explain select cit_id, cl_value from reflink.citation_locators where 
> cl_value = '1507617681' and vclf_number = 1 ;
>QUERY PLAN 
>
> -
>  Bitmap Heap Scan on citation_locators  (cost=5066559.01..50999084.79 
> rows=133 width=23)
>Recheck Cond: (vclf_number = 1)
>Filter: (cl_value = '1507617681'::text)
>->  Bitmap Index Scan on cl_indx_fk02  (cost=0.00..5066558.97 
> rows=493984719 width=0)
>  Index Cond: (vclf_number = 1)
> (5 rows)
> 
> reflink.citation_locators 
> Table "reflink.citation_locators"
>   Column  |   Type   | Modifiers | Storage  | Stats 
> target | Description 
> --+--+---+--+--+-
>  cl_id| bigint   | not null  | plain| 
>  | 
>  cl_value | text | not null  | extended | 
>  | 
>  vclf_number  | integer  | not null  | plain| 
>  | 
>  cit_id   | bigint   | not null  | plain| 
>  | 
>  cl_date_created  | timestamp with time zone | not null  | plain| 
>  | 
>  cl_date_modified | timestamp with time zone |   | plain| 
>  | 
> Indexes:
> "cl_pk" PRIMARY KEY, btree (cl_id)
> "cl_cnst_uk01" UNIQUE CONSTRAINT, btree (cit_id, vclf_number, cl_value)
> "cl_indx_fk01" btree (cit_id)
> "cl_indx_fk02" btree (vclf_number)
> Foreign-key constraints:
> "cl_cnst_fk01" FOREIGN KEY (cit_id) REFERENCES citations(cit_id) NOT 
> VALID"cl_cnst_fk02" FOREIGN KEY (vclf_number) REFERENCES 
> valid_cit_locator_fields(vclf_number)


RE: Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan

2018-06-05 Thread Fd Habash
Indexes are being redone as per these insights. Appreciate the great support.


Thank you

From: Matthew Hall
Sent: Tuesday, June 5, 2018 10:42 AM
To: Fred Habash
Cc: [email protected]
Subject: Re: Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan

Probably the cardinality of "vclf_number" is really bad. So the scan on that 
index is returning many million or billion rows and then you get a recheck 
which takes semi-forever. So you need an index on cl_value or both vclf_number 
and cl_value. If you know some properties of the values actually stored inside 
of those that will help. 
Matthew Hall

On Jun 5, 2018, at 7:17 AM, Fred Habash  wrote:
Trying to optimize the Elapsed Time (ET) of this query. Currently, it is 
hovering around 3 hrs.
Running a 'vaccum analyse' had no effect on ET. Even forcing an 'indexonly' 
scan by disabling 'enable_seqscan', still around the 3 hrs. 
The table is around 4.6B rows, 
 explain select cit_id, cl_value from reflink.citation_locators where cl_value 
= '1507617681' and vclf_number = 1 ;
   QUERY PLAN   
 
-
 Bitmap Heap Scan on citation_locators  (cost=5066559.01..50999084.79 rows=133 
width=23)
   Recheck Cond: (vclf_number = 1)
   Filter: (cl_value = '1507617681'::text)
   ->  Bitmap Index Scan on cl_indx_fk02  (cost=0.00..5066558.97 rows=493984719 
width=0)
 Index Cond: (vclf_number = 1)
(5 rows)
reflink.citation_locators 
Table "reflink.citation_locators"
  Column  |   Type   | Modifiers | Storage  | Stats 
target | Description 
--+--+---+--+--+-
 cl_id| bigint   | not null  | plain|   
   | 
 cl_value | text | not null  | extended |   
   | 
 vclf_number  | integer  | not null  | plain|   
   | 
 cit_id   | bigint   | not null  | plain|   
   | 
 cl_date_created  | timestamp with time zone | not null  | plain|   
   | 
 cl_date_modified | timestamp with time zone |   | plain|   
   | 
Indexes:
"cl_pk" PRIMARY KEY, btree (cl_id)
"cl_cnst_uk01" UNIQUE CONSTRAINT, btree (cit_id, vclf_number, cl_value)
"cl_indx_fk01" btree (cit_id)
"cl_indx_fk02" btree (vclf_number)
Foreign-key constraints:
"cl_cnst_fk01" FOREIGN KEY (cit_id) REFERENCES citations(cit_id) NOT VALID  
  "cl_cnst_fk02" FOREIGN KEY (vclf_number) REFERENCES 
valid_cit_locator_fields(vclf_number)