RE: select query does not pick up the right index

2019-01-04 Thread Abadie Lana

-Original Message-
From: David Rowley  
Sent: 03 January 2019 22:42
To: Abadie Lana 
Cc: [email protected]
Subject: Re: select query does not pick up the right index

On Fri, 4 Jan 2019 at 02:20, Abadie Lana  wrote:
> > From: David Rowley 
> > Sent: 03 January 2019 14:01
> Right, so you need to check your indexes on sample_ctrl_year and 
> sample_buil_year. You need an index on (channel_id, smpl_time) on those.

> These indexes exist already

That's interesting. The \d output indicates that the indexes are not INVALID, 
so it's not all that obvious why the planner would choose a lesser index to 
provide the required rows. One thought is that the more suitable index is very 
bloated.  This would increase the estimated cost of scanning the index and 
reduce the chances of the index being selected by the query planner.

If you execute:

select indrelid::regclass as table_name, indexrelid::Regclass as
index_name,pg_size_pretty(pg_relation_size(indrelid))
table_size,pg_size_pretty(pg_relation_size(indexrelid)) index_size from 
pg_index where indrelid in('sample_ctrl_year'::regclass, 
'sample_buil_year'::regclass) order by indrelid::regclass::name, 
indexrelid::regclass::name;

This should show you the size of the tables and indexes in question.
If the sample_time_cy_idx and sample_time_by_idx indexes are very large when 
compared with the size of their table, then it is likely worth building a new 
index for these then dropping the old index then retrying the re-written 
version of the query.  If this is a live system then you can build the new 
indexes by using the CREATE INDEX CONCURRENTLY command.  This will allow other 
DML operations to work without being blocked. The old indexes can then be 
dropped with DROP INDEX CONCURRENTLY.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Here the result...For me it does not sound that it is bloated...Also still a 
mystery why wrong indexes are picked up for buil and ctrl and not for util...

select indrelid::regclass as table_name, indexrelid::Regclass as
index_name,pg_size_pretty(pg_relation_size(indrelid))
table_size,pg_size_pretty(pg_relation_size(indexrelid)) index_size from 
pg_index where indrelid in('sample_ctrl_year'::regclass, 
'sample_buil_year'::regclass,'sample_util_year'::regclass) order by 
indrelid::regclass::name, indexrelid::regclass::name;
table_name| index_name  | table_size | index_size
--+-++
 sample_buil_year | sample_time_by_idx  | 4492 MB| 1522 MB
 sample_buil_year | sample_time_yb1_idx | 4492 MB| 1522 MB
 sample_buil_year | smpl_time_bx2_idx   | 4492 MB| 1084 MB
 sample_ctrl_year | sample_time_cy_idx  | 7065 MB| 2394 MB
 sample_ctrl_year | sample_time_yc1_idx | 7065 MB| 2394 MB
 sample_ctrl_year | smpl_time_cmx2_idx  | 7065 MB| 1705 MB
 sample_util_year | sample_time_uy_idx  | 7140 MB| 2426 MB
 sample_util_year | sample_time_yu1_idx | 7140 MB| 2426 MB
 sample_util_year | smpl_time_ux2_idx   | 7140 MB| 1727 MB
(9 rows)

I have recreated the indexes for sample_ctrl_year and sample_buil_year and same 
index size.
I rerun the query... and still the same plan execution as previously sent
Thanks for your support...One thing I spot is the I/O on this machine is rather 
slow... the very first time I run this query it will take Execution time: 
247503.006 ms ( I can see that postgres process is in state D and low 
CPU...,using iotop I can see I/O read speed cannot go beyond 20MB/sec. The 
second time I run the query, the CPU goes up to 100%, no D state).






RE: select query does not pick up the right index

2019-01-04 Thread Abadie Lana


-Original Message-
From: Justin Pryzby  
Sent: 04 January 2019 00:48
To: Abadie Lana 
Cc: David Rowley ; 
[email protected]
Subject: Re: select query does not pick up the right index

On Thu, Jan 03, 2019 at 12:57:27PM +, Abadie Lana wrote:
> Main parameters : effective_cache_size : 4GB, shared_buffers 4GB, 
> work_mem 4MB

I doubt it will help much, but you should consider increasing work_mem, unless 
you have many expensive queries running at once.

Could you also send the rest of the pg_statistic for that table ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='...' AND 
tablename='...' ORDER BY 1 DESC; 

Hmm. Is it normal that the couple (tablename,attname ) is not unique? I'm 
surprised to see sample_{ctrl,util,buil} quoted twice

css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) 
frac_MCV, tablename, attname, null_frac, n_distinct, 
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist 
FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 
1 DESC;
 frac_mcv | tablename |  attname   | null_frac | n_distinct | n_mcv | 
n_hist
--+---++---++---+
1 | sample_buil_year  | channel_id | 0 | 16 |16 |
  0.98249 | sample_ctrl   | channel_id | 0 | 26 |17 |   
   9
 0.982333 | sample_ctrl_month | channel_id | 0 | 34 |17 |   
  17
 0.981533 | sample_ctrl   | channel_id | 0 | 28 |18 |   
  10
   0.9371 | sample_ctrl_year  | channel_id | 0 | 38 |16 |   
  22
 0.928767 | sample_buil_month | channel_id | 0 |940 |54 |   
 101
  0.92535 | sample| channel_id | 0 |   2144 |   167 |   
1001
 0.907501 | sample_buil   | channel_id | 0 |565 |43 |   
 101
   0.8876 | sample_util_year  | channel_id | 0 |501 |45 |   
 101
0.815 | sample_util   | channel_id | 0 |557 |82 |   
 101
 0.807667 | sample_buil   | channel_id | 0 |164 |31 |   
 101
 0.806267 | sample_util   | channel_id | 0 |732 |   100 |   
 101
 0.803766 | sample_util_month | channel_id | 0 |731 |   100 |   
 101
(13 rows)

Ah...sample_ctrl_year and sample_buil_year have n_distinct -1? Unlike 
sample_util_year. Could that explain the wrong choice? 

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='smpl_time' 
AND tablename like 'sample%' ORDER BY 1 DESC;
  frac_mcv  | tablename |  attname  | null_frac | n_distinct  | n_mcv | 
n_hist
+---+---+---+-+---+
| sample_ctrl_month | smpl_time | 0 |  -1 |   | 
   101
| sample_ctrl_year  | smpl_time | 0 |  -1 |   | 
   101
| sample_ctrl   | smpl_time | 0 |  -1 |   | 
   101
| sample_ctrl   | smpl_time | 0 |  -1 |   | 
   101
| sample_buil_year  | smpl_time | 0 |  -1 |   | 
   101
  0.0154667 | sample_buil_month | smpl_time | 0 | 1.03857e+06 |   100 | 
   101
  0.0154523 | sample_buil   | smpl_time | 0 |  854250 |   100 | 
   101
 0.0115 | sample_util   | smpl_time | 0 |  405269 |   100 | 
   101
  0.0112333 | sample_util   | smpl_time | 0 |  537030 |   100 | 
   101
  0.0106667 | sample_util_month | smpl_time | 0 |  539001 |   100 | 
   101
 0.00946667 | sample_buil   | smpl_time | 0 |   -0.328554 |   100 | 
   101
 0.00852342 | sample| smpl_time | 0 |  1.5125e+07 |  1000 | 
  1001
 0.00780001 | sample_util_year  | smpl_time | 0 | 1.73199e+06 |   100 | 
   101
(13 rows)





RE: select query does not pick up the right index

2019-01-04 Thread Abadie Lana
-Original Message-
From: [email protected] 
 On Behalf Of 
Abadie Lana
Sent: 04 January 2019 09:18
To: Justin Pryzby 
Cc: David Rowley ; 
[email protected]
Subject: [Possible Spoof] RE: select query does not pick up the right index

Warning: This message was sent by 
[email protected] supposedly on behalf 
of Abadie Lana . Please contact

-Original Message-
From: Justin Pryzby 
Sent: 04 January 2019 00:48
To: Abadie Lana 
Cc: David Rowley ; 
[email protected]
Subject: Re: select query does not pick up the right index

On Thu, Jan 03, 2019 at 12:57:27PM +, Abadie Lana wrote:
> Main parameters : effective_cache_size : 4GB, shared_buffers 4GB, 
> work_mem 4MB

I doubt it will help much, but you should consider increasing work_mem, unless 
you have many expensive queries running at once.

Could you also send the rest of the pg_statistic for that table ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='...' AND 
tablename='...' ORDER BY 1 DESC; 

Hmm. Is it normal that the couple (tablename,attname ) is not unique? I'm 
surprised to see sample_{ctrl,util,buil} quoted twice

css_archive_3_0_0=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) 
frac_MCV, tablename, attname, null_frac, n_distinct, 
array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist 
FROM pg_stats WHERE attname='channel_id' AND tablename like 'sample%' ORDER BY 
1 DESC;
 frac_mcv | tablename |  attname   | null_frac | n_distinct | n_mcv | 
n_hist
--+---++---++---+
1 | sample_buil_year  | channel_id | 0 | 16 |16 |
  0.98249 | sample_ctrl   | channel_id | 0 | 26 |17 |   
   9
 0.982333 | sample_ctrl_month | channel_id | 0 | 34 |17 |   
  17
 0.981533 | sample_ctrl   | channel_id | 0 | 28 |18 |   
  10
   0.9371 | sample_ctrl_year  | channel_id | 0 | 38 |16 |   
  22
 0.928767 | sample_buil_month | channel_id | 0 |940 |54 |   
 101
  0.92535 | sample| channel_id | 0 |   2144 |   167 |   
1001
 0.907501 | sample_buil   | channel_id | 0 |565 |43 |   
 101
   0.8876 | sample_util_year  | channel_id | 0 |501 |45 |   
 101
0.815 | sample_util   | channel_id | 0 |557 |82 |   
 101
 0.807667 | sample_buil   | channel_id | 0 |164 |31 |   
 101
 0.806267 | sample_util   | channel_id | 0 |732 |   100 |   
 101
 0.803766 | sample_util_month | channel_id | 0 |731 |   100 |   
 101
(13 rows)

Ah...sample_ctrl_year and sample_buil_year have n_distinct -1? Unlike 
sample_util_year. Could that explain the wrong choice? 

SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='smpl_time' 
AND tablename like 'sample%' ORDER BY 1 DESC;
  frac_mcv  | tablename |  attname  | null_frac | n_distinct  | n_mcv | 
n_hist
+---+---+---+-+---+
| sample_ctrl_month | smpl_time | 0 |  -1 |   | 
   101
| sample_ctrl_year  | smpl_time | 0 |  -1 |   | 
   101
| sample_ctrl   | smpl_time | 0 |  -1 |   | 
   101
| sample_ctrl   | smpl_time | 0 |  -1 |   | 
   101
| sample_buil_year  | smpl_time | 0 |  -1 |   | 
   101
  0.0154667 | sample_buil_month | smpl_time | 0 | 1.03857e+06 |   100 | 
   101
  0.0154523 | sample_buil   | smpl_time | 0 |  854250 |   100 | 
   101
 0.0115 | sample_util   | smpl_time | 0 |  405269 |   100 | 
   101
  0.0112333 | sample_util   | smpl_time | 0 |  537030 |   100 | 
   101
  0.0106667 | sample_util_month | smpl_time | 0 |  539001 |   100 | 
   101
 0.00946667 | sample_buil   | smpl_time | 0 |   -0.328554 |   100 | 
   101
 0.00852342 | sample| smpl_time | 0 |  1.5125e+07 |  1000 | 
  1001
 0.00780001 | sample_util_year  | smpl_time | 0 | 1.73199e+06 |   100 | 
   101
(13 rows)

Based on your feedback...i rerun analyse directly on the two table 
sample_ctrl_year and sample_buil_year
The new values are
SELECT (SELECT sum(x) FROM unnest(most_commo

Re: select query does not pick up the right index

2019-01-04 Thread Justin Pryzby
On Fri, Jan 04, 2019 at 08:58:57AM +, Abadie Lana wrote:
> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
> attname, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, 
> array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='...' AND 
> tablename='...' ORDER BY 1 DESC; 
> 
> Hmm. Is it normal that the couple (tablename,attname ) is not unique? I'm 
> surprised to see sample_{ctrl,util,buil} quoted twice

One of the rows is for "inherited stats" (including child tables) stats and one
is "noninherited stats".

The unique index on the table behind that view is:
"pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, 
stainherit)

On the wiki, I added inherited and correlation columns.  Would you rerun that 
query ?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

I'm also interested to see \d and channel_id statistics for the channel table.

> explain (analyze, buffers) select 
> 'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_val
>  from sample c WHERE c.channel_id = (SELECT channel_id FROM channel WHERE 
> name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by c.smpl_time desc limit 5;

You originally wrote this as a implicit comma join.  Does the original query
still have an issue ?  The =(subselect query) doesn't allow the planner to
optimize for the given channel, which seems to be a fundamental problem.

On Fri, Jan 04, 2019 at 08:58:57AM +, Abadie Lana wrote:
> Based on your feedback...i rerun analyse directly on the two table 
> sample_ctrl_year and sample_buil_year
> [...] Now when running the query again, only for sample_buil_year table the 
> wrong index is picked up...

It looks like statistics on your tables were completely wrong; not just
sample_ctrl_year and sample_buil_year.  Right ?

Autoanalyze would normally handle this on nonempty tables (children or
otherwise) and you should manually run ANALZYE on the parents (both levels of
them) whenever statistics change, like after running a big DELETE or DROP or
after a significant interval of time has passed relative to the range of time
in the table's timestamp columns.

Do you know why autoanalze didn't handle the nonempty tables on its own ?

> Now, the channel name I gave has no entries in sample_buil_year...(and when I 
> run the query directly against sample_buil_year the right index is picked 
> up) So maybe something related with the partitioning?

>->  Index Scan Backward using smpl_time_bx2_idx on 
> sample_buil_year c_5  (cost=0.56..2023054.76 rows=665761 width=75) (actual 
> time=13216.589..13216.589 rows=0 loops=1)
>  Filter: (channel_id = $0)
>  Rows Removed by Filter: 50597834
>  Buffers: shared hit=26626368

So it scanned the entire index expecting to find 5 matching channel IDs "pretty
soon", based on the generic distribution of channel IDs, without the benefit of
knowing that this channel ID doesn't exist at all (due to =(subquery)).

26e6 buffers is 200GB, apparently accessing some pages many
times (even if cached).

table_name| index_name  | table_size | index_size   

   
 sample_buil_year | smpl_time_bx2_idx   | 4492 MB| 1084 MB  

   

General comments:

On Wed, Jan 02, 2019 at 04:28:41PM +, Abadie Lana wrote:
>"sample_time_bm_idx" btree (channel_id, smpl_time)
>"sample_time_mb1_idx" btree (smpl_time, channel_id)
>"smpl_time_bx1_idx" btree (smpl_time)

The smpl_time index is loosely redundant with index on (smpl_time,channel_id).
You might consider dropping it, or otherwise dropping the smpl_time,channel_id
index and making two separate indices on smpl_time and channel.  That would
allow bitmap ANDing them together.

Or possibly (depending on detail of your data loading) leaving the composite
index and changing smpl_time to a BRIN index - it's nice to be able to CLUSTER
on the btree index to maximize the efficiency of the brin index.

>Check constraints:
>"sample_buil_month_smpl_time_check" CHECK (smpl_time >= (now() - '32 
> days'::interval)::timestamp without time zone AND smpl_time <= now())

I'm surprised that works, and not really sure what it's doing..but in any case
it's maybe not doing what you wanted(??).  I'm guessing you never get
constraint exclusion (which is irrelevant for this query but still).

Justin