Re: Performance Issue (Not using Index when joining two tables).

2020-09-14 Thread Gopisetty, Ramesh
@Michael Lewis; @Tom Lane; @Tomas Vondra

Hi,

Thanks for looking into the problem/issue. Let me give more details about
the functions. Yes, we are using row level security.

Actually, we have converted an Oracle VPD database (Virtual Private Database, in short
row level security) to PostgreSQL. We have several functions that provide the
row level security filtering.

f_sel_policy_ty_static and f_sel_policy_all filter the tables on the vpd_key
that is provided initially.
f_sel_policy_prod_locale filters the table on the prod_locale_code that is
provided initially.

Before running any queries against the database, we set the context
settings / row level security with the call below:

 
CALL vpd_filter(vpd_key=>'', mod_user=>'X', user_locale=>'en_XX', prod_locale=>'XX');

This sets the context variables and provides the row level security. Every
table in our database contains a vpd_key column, which is the filter used to run
queries for a given client.
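
To make the setup clearer, here is a simplified sketch of how the procedure and a
matching row level security policy could be wired together. This is not our exact
code; the app.* setting names and the policy name are placeholders for illustration,
while the procedure and filter function names are the real ones:

CREATE OR REPLACE PROCEDURE vpd_filter(vpd_key     text,
                                       mod_user    text,
                                       user_locale text,
                                       prod_locale text)
LANGUAGE plpgsql
AS $$
BEGIN
    -- stash the client context in session-level custom settings;
    -- the f_sel_policy_* functions read them back with current_setting()
    PERFORM set_config('app.vpd_key',     vpd_key,     false);
    PERFORM set_config('app.mod_user',    mod_user,    false);
    PERFORM set_config('app.user_locale', user_locale, false);
    PERFORM set_config('app.prod_locale', prod_locale, false);
END;
$$;

ALTER TABLE chr_emp_position ENABLE ROW LEVEL SECURITY;

-- the policy produces the filter conditions seen in the plans below
CREATE POLICY chr_emp_position_vpd ON chr_emp_position
    USING (current_user NOT IN ('wfnadmin', 'skipvpd')
           AND f_sel_policy_all(vpd_key, 'CHR_EMP_POSITION')
           AND f_sel_policy_prod_locale((chr_emp_position.*)::character varying,
                                        prod_locale_code));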

The tables mentioned below, chr_emp_position and chr_simple_val, contain many
rows, and the functions filter them based on the vpd_key and
prod_user_locale_code.
After enabling the row level security we executed the query joining the two
tables, and that is where the index is not being utilized and the query runs
slowly, i.e. longer than 8 seconds.

The general structure of the tables is:

chr_emp_position -- columns vpd_key, oid, home_dept_oid, eff_date,
start_date, etc. (200+ columns); primary key is (vpd_key, oid).
chr_simple_val -- columns vpd_key, oid, category, description, etc.
(around 70 columns); primary key is (vpd_key, oid).
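
For reference, a trimmed-down sketch of the two tables, keeping only the columns
relevant to this query (the column types here are my simplification; the ::text
casts in the plan below suggest character varying):

CREATE TABLE chr_emp_position (
    vpd_key          character varying NOT NULL,
    oid              character varying NOT NULL,
    home_dept_oid    character varying,
    prod_locale_code character varying,
    -- eff_date, start_date, ... roughly 200 more columns ...
    PRIMARY KEY (vpd_key, oid)
);

CREATE TABLE chr_simple_val (
    vpd_key          character varying NOT NULL,
    oid              character varying NOT NULL,
    category         character varying,
    code             character varying,
    description      character varying,
    prod_locale_code character varying,
    -- ... roughly 70 more columns ...
    PRIMARY KEY (vpd_key, oid)
);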

The row counts shown below are with row level security in effect on those
tables, i.e. after executing

CALL vpd_filter(spv_vpd_key=>'', spv_mod_usr=>'X', spv_user_locale=>'en_XX', spv_prod_locale=>'XX');


pgwfc01q=> select count(*) from chr_simple_val;
 count
-------
 13158
(1 row)

pgwfc01q=> select count(*) from chr_emp_position;
 count
-------
   228
(1 row)


The primary key of chr_simple_val contains oid, yet the index is still not
being used.

I'm sharing the explain plan here:

pgwfc01q=> explain analyze select cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from chr_emp_position cep inner join chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
   ->  Seq Scan on chr_emp_position cep  (cost=0.00..2437.77 rows=436 width=11) (actual time=44.713..329.435 rows=228 loops=1)
         Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CHR_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))
         Rows Removed by Filter: 3695
   ->  Hash  (cost=49176.40..49176.40 rows=9881 width=31) (actual time=3467.907..3467.908 rows=13158 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1031kB
         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..49176.40 rows=9881 width=31) (actual time=2.191..3460.929 rows=13158 loops=1)
               Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_ty_static(vpd_key) AND f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code))
 Planning Time: 0.297 ms
 Execution Time: 3797.768 ms
(12 rows)


If I don't set the context and run as the root user, the explain plan is as
below, and the query executes in milliseconds even though it does a full table
scan without using the index.


  1.   I'm not sure whether my filter functions are what is consuming the time.
Most of the queries work fine; only a few are slow. We hadn't seen this problem
in Oracle. I'm not comparing Oracle and Postgres here, I know they are two
different flavors, but I am trying to get this query to run in under 8 seconds.
  2.   I'm not sure why the index on chr_simple_val (vpd_key, oid) is not being
used here. I'm confident that if it used the index it would/might be faster, as
it is only looking up 2 or 3 home departments by oid. (The catalog query after
this list can be used to double-check the index definitions.)
  3.   I'm not sure why the query was fast for the root user even with the full
table scan.
  4.   I'm not sure why a bitmap heap scan was not chosen after setting the row
level security, and how to get the bitmap heap scan on chr_emp_position that I
observed here.
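
For reference, the index definitions on both tables can be double-checked with a
plain catalog query like this:

select tablename, indexname, indexdef
  from pg_indexes
 where tablename in ('chr_emp_position', 'chr_simple_val')
 order by tablename, indexname;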

FYI, running as the root user:

pgwfc01q=> explain analyze s

Re: Performance Issue (Not using Index when joining two tables).

2020-09-14 Thread Tom Lane
"Gopisetty, Ramesh"  writes:
> Thanks for looking into the problem/issue.Let me give more details about 
> the functions...Yes,  we are using row level security.

Hm.  If those expensive filter functions are being injected by RLS on the
target tables (rather than by something like an intermediate view), then
the planner is constrained to ensure that they execute before any query
conditions that it doesn't know to be "leakproof".  So unless your join
operator is leakproof, the shape of plan that you're hoping for will not
be allowed.  Since you haven't mentioned anything about data types, it's
hard to know whether that's the issue.  (The hash condition seems to be
texteq, which is leakproof, but there are also casts involved which
might not be.)
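
You can check which of the functions involved are marked leakproof with a quick
catalog query, for instance:

select proname, proleakproof
  from pg_proc
 where proname in ('texteq',
                   'f_sel_policy_all',
                   'f_sel_policy_ty_static',
                   'f_sel_policy_prod_locale');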

The two queries you provided explain plans for are not the same, so
comparing their plans is a fairly pointless activity.  *Of course*
the query runs faster when you restrict it to fetch fewer rows.  The
original query has no restriction clause that corresponds to the
clauses being used for index conditions in the second query, so it's
hardly a surprise that you do not get that plan, RLS or no RLS.

regards, tom lane




autoanalyze creates bad plan, manual analyze fixes it?

2020-09-14 Thread Robert Treat
Howdy folks,

Recently I've run into a problem where autoanalyze is causing a query
plan to flip over to using an index which is about 10x slower, and the
problem is fixed by running an analyze manually. Some relevant info:

UPDATE sleeping_intents SET
raptor_after='2020-09-14T19:21:03.581106'::timestamp,
status='requires_capture',
updated_at='2020-09-14T16:21:03.581104+00:00'::timestamptz WHERE
sleeping_intents.id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid AND
sleeping_intents.status = 'finna' RETURNING *;

The plan generated after autoanalyze is:

Update on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual time=57.945..57.945 rows=0 loops=1)
   Buffers: shared hit=43942
   ->  Index Scan using sleeping_intents_status_created_at_raptor_after_idx on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual time=57.943..57.943 rows=0 loops=1)
         Index Cond: (status = 'init'::text)
         Filter: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
         Rows Removed by Filter: 1262
         Buffers: shared hit=43942
 Planning time: 0.145 ms
 Execution time: 57.981 ms

After I run analyze manually, the query plan changes to this:

Update on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual time=0.023..0.023 rows=0 loops=1)
   Buffers: shared hit=7
   ->  Index Scan using sleeping_intents_pkey on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual time=0.022..0.022 rows=0 loops=1)
         Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
         Filter: (status = 'init'::text)
         Rows Removed by Filter: 1
         Buffers: shared hit=7
 Planning time: 0.092 ms
 Execution time: 0.066 ms

Note that in the second case it switches back to using the primary
key index, which does seem like the logically better choice, even
though it shows a higher estimated cost than the "bad" case
(I understand the cost must change somewhere in the process, but
there's no way to see it AFAICT).

In trying to determine why it switches, I dug up some likely useful info:
Index definitions:
 (20 GB) "sleeping_intents_pkey" PRIMARY KEY, btree (id)
 (37 GB) "sleeping_intents_status_created_at_raptor_after_idx" btree
(status, created_at DESC, raptor_after DESC)

Basic info on the table:
> select relid::regclass, n_live_tup, n_mod_since_analyze, analyze_count, autoanalyze_count
>   from pg_stat_user_tables where relname='sleeping_intents';

      relid       | n_live_tup | n_mod_since_analyze | analyze_count | autoanalyze_count
------------------+------------+---------------------+---------------+-------------------
 sleeping_intents |  491171179 |             1939347 |             4 |                80

(that num mods is in the last ~5 hours, the table is fairly active,
although on a relatively small portion of the data)

Statistics after manual analyze:
    tablename     |   attname    | null_frac | avg_width | n_distinct | correlation |                 most_common_freqs
------------------+--------------+-----------+-----------+------------+-------------+----------------------------------------------------
 sleeping_intents | id           |         0 |        16 |         -1 | -0.00133045 | [null]
 sleeping_intents | status       |         0 |         9 |          6 |    0.848468 | {0.918343,0.0543667,0.0267567,0.00051,1e-05,1e-05}
 sleeping_intents | created_at   |         0 |         8 |         -1 |    0.993599 | [null]
 sleeping_intents | raptor_after | 0.0663433 |         8 |  -0.933657 |     0.99392 | [null]

In a previous go around with this table, I also increased the
statistics target for the id column to 1000, vs 100 which is the
database default.

Originally I was mostly interested in trying to understand why it
would choose something other than the non-pk index, which sort of
feels like a bug; what could be faster than seeking an individual
entry in a pk index? There are cases where it might make sense, but
this doesn't seem like one (even accounting for the infrequency of the
status we are looking for, which is 1e-05, the disparity in index size
should push it back to the pk imho, unless I am not thinking through
correlation enough?).

However, it also seems very odd that this problem occurs at all. In
the last couple of times this has happened, the manual analyze has
been run within ~30-45 minutes of the auto-analyze, and while the data
is changing, it isn't changing that rapidly that this should make a
significant difference, but I don't see any other reason that
autoanalyze would produce a different result than manual analyze.

All that said, any insight on the above two items would be great, but
the most immediate concern would be around suggestions for preventing
this from happening again?

Thanks in advance,

Robert Treat
https://xzilla.net




Re: autoanalyze creates bad plan, manual analyze fixes it?

2020-09-14 Thread Justin Pryzby
On Mon, Sep 14, 2020 at 07:11:12PM -0400, Robert Treat wrote:
> Howdy folks,
> 
> Recently i've run into a problem where autoanalyze is causing a query
> plan to flip over to using an index which is about 10x slower, and the
> problem is fixed by running an alayze manually. some relevant info:

I think it's because 1) the costs and scan rowcounts are similar ; and, 2) the
stats are probably near some threshold which causes the plan to change.  I'm
guessing if you run a manual ANALYZE 100 times, you'll sometimes get the bad
plan.  Maybe depending on the data visible at the time analyze is invoked.

> UPDATE sleeping_intents SET
> raptor_after='2020-09-14T19:21:03.581106'::timestamp,
> status='requires_capture',
> updated_at='2020-09-14T16:21:03.581104+00:00'::timestamptz WHERE
> sleeping_intents.id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid AND
> sleeping_intents.status = 'finna' RETURNING *;

Do you mean status='init' ??

> The plan generated after autoanalyze is:
> 
> Update on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual 
> time=57.945..57.945 rows=0 loops=1)
>Buffers: shared hit=43942
>->  Index Scan using sleeping_intents_status_created_at_raptor_after_idx 
> on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual 
> time=57.943..57.943 rows=0 loops=1)
>  Index Cond: (status = 'init'::text)
>  Filter: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
>  Rows Removed by Filter: 1262
>  Buffers: shared hit=43942
>  Planning time: 0.145 ms
>  Execution time: 57.981 ms
> 
> after i run analyze manually, the query plan is changed to this:
> 
> Update on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual 
> time=0.023..0.023 rows=0 loops=1)
>Buffers: shared hit=7
>->  Index Scan using sleeping_intents_pkey on sleeping_intents 
> (cost=0.57..8.59 rows=1 width=272) (actual time=0.022..0.022 rows=0 loops=1)
>  Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
>  Filter: (status = 'init'::text)
>  Rows Removed by Filter: 1
>  Buffers: shared hit=7
>  Planning time: 0.092 ms
>  Execution time: 0.066 ms
> 
> Note that in the second query, it switches back to using the primary
> key index, which does seem like the logically better choice, even
> though it shows a higher estimated cost than the "bad" case
> (understanding the cost must change somewhere in the process, but
> there no way to see it afaict).

If you SET enable_indexscan=off you can try to get a bitmap index scan, which
will reveal how much of the cost is attributed to the index component and how
much to the heap.  That might help to refine costs, which may help.
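
For example, something along these lines, wrapped in a transaction that is rolled
back so the test doesn't actually modify anything (SET LOCAL keeps the change
scoped to that transaction):

begin;
set local enable_indexscan = off;
explain (analyze, buffers)
  update sleeping_intents
     set raptor_after = '2020-09-14T19:21:03.581106'::timestamp,
         status = 'requires_capture',
         updated_at = '2020-09-14T16:21:03.581104+00:00'::timestamptz
   where id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid
     and status = 'init'
  returning *;
rollback;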

> Statistics after manual analyze:
>tablename|attname| null_frac | avg_width |
> n_distinct | correlation |   most_common_freqs
> -+---+---+---++-+
>  sleeping_intents | id| 0 |16 | -1
> | -0.00133045 | [null]
>  sleeping_intents | status| 0 | 9 |  6
> |0.848468 | {0.918343,0.0543667,0.0267567,0.00051,1e-05,1e-05}
>  sleeping_intents | created_at| 0 | 8 | -1
> |0.993599 | [null]
>  sleeping_intents | raptor_after | 0.0663433 | 8 |  -0.933657
> | 0.99392 | [null]
> 
> In a previous go around with this table, I also increased the
> statistics target for the id column to 1000, vs 100 which is the
> database default.

What about status?
I wonder if sometimes the sample doesn't include *any* rows for the 1e-05
statuses.  In that case the planner would estimate the rowcount based on
ndistinct and the other frequencies.  But if you rerun analyze, it then thinks
it'll get one row, based on the sampled frequency of status.
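
If the sample really is missing the rare statuses, one thing to try is raising the
per-column statistics target for status and re-analyzing (1000 here is just an
example value, same as you used for id):

alter table sleeping_intents alter column status set statistics 1000;
analyze sleeping_intents;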

What Postgres version, and what non-default settings?
Maybe you can run explain(settings,...).
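
i.e. something like this (the SETTINGS option needs v12 or later; the SELECT
equivalent of your UPDATE is enough to show which non-default planner settings
were in effect):

explain (analyze, buffers, settings)
select *
  from sleeping_intents
 where id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid
   and status = 'init';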

-- 
Justin




Re: autoanalyze creates bad plan, manual analyze fixes it?

2020-09-14 Thread Pavel Stehule
On Tue, 15 Sep 2020 at 1:11, Robert Treat wrote:

> Howdy folks,
>
> Recently i've run into a problem where autoanalyze is causing a query
> plan to flip over to using an index which is about 10x slower, and the
> problem is fixed by running an alayze manually. some relevant info:
>
> UPDATE sleeping_intents SET
> raptor_after='2020-09-14T19:21:03.581106'::timestamp,
> status='requires_capture',
> updated_at='2020-09-14T16:21:03.581104+00:00'::timestamptz WHERE
> sleeping_intents.id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid AND
> sleeping_intents.status = 'finna' RETURNING *;
>
> The plan generated after autoanalyze is:
>
> Update on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual
> time=57.945..57.945 rows=0 loops=1)
>Buffers: shared hit=43942
>->  Index Scan using
> sleeping_intents_status_created_at_raptor_after_idx on
> sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual
> time=57.943..57.943 rows=0 loops=1)
>  Index Cond: (status = 'init'::text)
>  Filter: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
>  Rows Removed by Filter: 1262
>  Buffers: shared hit=43942
>  Planning time: 0.145 ms
>  Execution time: 57.981 ms
>

This looks pretty strange - why do you need to read 43942 pages for 1262 rows?

Can you reindex this index? Maybe it is bloated.
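
For example (CONCURRENTLY needs v12 or later; a plain REINDEX would lock the
table against writes):

reindex index concurrently sleeping_intents_status_created_at_raptor_after_idx;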

Regards

Pavel



>
> after i run analyze manually, the query plan is changed to this:
>
> Update on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual
> time=0.023..0.023 rows=0 loops=1)
>Buffers: shared hit=7
>->  Index Scan using sleeping_intents_pkey on sleeping_intents
> (cost=0.57..8.59 rows=1 width=272) (actual time=0.022..0.022 rows=0
> loops=1)
>  Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
>  Filter: (status = 'init'::text)
>  Rows Removed by Filter: 1
>  Buffers: shared hit=7
>  Planning time: 0.092 ms
>  Execution time: 0.066 ms
>
> Note that in the second query, it switches back to using the primary
> key index, which does seem like the logically better choice, even
> though it shows a higher estimated cost than the "bad" case
> (understanding the cost must change somewhere in the process, but
> there no way to see it afaict).
>
> In trying to determine why it switches, I dug up some likely useful info:
> Index definitions:
>  (20 GB) "sleeping_intents_pkey" PRIMARY KEY, btree (id)
>  (37 GB) "sleeping_intents_status_created_at_raptor_after_idx" btree
> (status, created_at DESC, raptor_after DESC)
>
> Basic info on the table:
> > select relid::regclass,
> n_live_tup,n_mod_since_analyze,analyze_count,autoanalyze_count from
> pg_stat_user_tables where relname='sleeping_intents';
>   relid  | n_live_tup | n_mod_since_analyze | analyze_count |
> autoanalyze_count
>
> -++-+---+---
>  sleeping_intents |  491171179 | 1939347 | 4 |
>80
>
> (that num mods is in the last ~5 hours, the table is fairly active,
> although on a relatively small portion of the data)
>
> Statistics after manual analyze:
>tablename|attname| null_frac | avg_width |
> n_distinct | correlation |   most_common_freqs
>
> -+---+---+---++-+
>  sleeping_intents | id| 0 |16 | -1
> | -0.00133045 | [null]
>  sleeping_intents | status| 0 | 9 |  6
> |0.848468 | {0.918343,0.0543667,0.0267567,0.00051,1e-05,1e-05}
>  sleeping_intents | created_at| 0 | 8 | -1
> |0.993599 | [null]
>  sleeping_intents | raptor_after | 0.0663433 | 8 |  -0.933657
> | 0.99392 | [null]
>
> In a previous go around with this table, I also increased the
> statistics target for the id column to 1000, vs 100 which is the
> database default.
>
> Originally I was mostly interested in trying to understand why it
> would choose something other than the non-pk index, which sort of
> feels like a bug; what could be faster than seeking an individual
> entry in a pk index? There are cases where it might make sense, but
> this doesn't seem like one (even accounting for the infrequency of the
> status we are looking for, which is 1e-05, the disparity in index size
> should push it back to the pk imho, unless I am not thinking through
> correlation enough?).
>
> However, it also seems very odd that this problem occurs at all. In
> the last couple of times this has happened, the manual analyze has
> been run within ~30-45 minutes of the auto-analyze, and while the data
> is changing, it isn't changing that rapidly that this should make a
> significant difference, but I don't see any other reason that
> autoanalyze would produce a different result than manual analyze.
>
> All that said, any insigh

Re: autoanalyze creates bad plan, manual analyze fixes it?

2020-09-14 Thread Robert Treat
On Mon, Sep 14, 2020 at 7:41 PM Justin Pryzby  wrote:
>
> On Mon, Sep 14, 2020 at 07:11:12PM -0400, Robert Treat wrote:
> > Howdy folks,
> >
> > Recently i've run into a problem where autoanalyze is causing a query
> > plan to flip over to using an index which is about 10x slower, and the
> > problem is fixed by running an alayze manually. some relevant info:
>
> I think it's because 1) the costs and scan rowcounts are similar ; and, 2) the
> stats are probably near some threshold which causes the plan to change.  I'm
> guessing if you run a manual ANALYZE 100 times, you'll sometimes get the bad
> plan.  Maybe depending on the data visible at the time analyze is invoked.
>

I've been thinking about trying to capture the statistics info in the bad case;
I wonder if I could reproduce the situation that way.
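
I was imagining something like this to snapshot the relevant pg_stats rows each
time (the history table name and the public schema here are just placeholders):

-- one-time setup: empty history table with the columns of interest
create table if not exists sleeping_intents_stats_history as
select now() as captured_at,
       attname, null_frac, avg_width, n_distinct, correlation,
       most_common_vals::text  as most_common_vals,
       most_common_freqs::text as most_common_freqs
  from pg_stats
 where schemaname = 'public' and tablename = 'sleeping_intents'
  with no data;

-- run right after each (auto)analyze, then compare rows across captured_at
insert into sleeping_intents_stats_history
select now(), attname, null_frac, avg_width, n_distinct, correlation,
       most_common_vals::text, most_common_freqs::text
  from pg_stats
 where schemaname = 'public' and tablename = 'sleeping_intents';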

> > UPDATE sleeping_intents SET
> > raptor_after='2020-09-14T19:21:03.581106'::timestamp,
> > status='requires_capture',
> > updated_at='2020-09-14T16:21:03.581104+00:00'::timestamptz WHERE
> > sleeping_intents.id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid AND
> > sleeping_intents.status = 'finna' RETURNING *;
>
> Do you mean status='init' ??
Yes, sorry, I was playing around with different statuses and made a copy/paste error.
>
> > The plan generated after autoanalyze is:
> >
> > Update on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual 
> > time=57.945..57.945 rows=0 loops=1)
> >Buffers: shared hit=43942
> >->  Index Scan using sleeping_intents_status_created_at_raptor_after_idx 
> > on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual 
> > time=57.943..57.943 rows=0 loops=1)
> >  Index Cond: (status = 'init'::text)
> >  Filter: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
> >  Rows Removed by Filter: 1262
> >  Buffers: shared hit=43942
> >  Planning time: 0.145 ms
> >  Execution time: 57.981 ms
> >
> > after i run analyze manually, the query plan is changed to this:
> >
> > Update on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual 
> > time=0.023..0.023 rows=0 loops=1)
> >Buffers: shared hit=7
> >->  Index Scan using sleeping_intents_pkey on sleeping_intents 
> > (cost=0.57..8.59 rows=1 width=272) (actual time=0.022..0.022 rows=0 loops=1)
> >  Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
> >  Filter: (status = 'init'::text)
> >  Rows Removed by Filter: 1
> >  Buffers: shared hit=7
> >  Planning time: 0.092 ms
> >  Execution time: 0.066 ms
> >
> > Note that in the second query, it switches back to using the primary
> > key index, which does seem like the logically better choice, even
> > though it shows a higher estimated cost than the "bad" case
> > (understanding the cost must change somewhere in the process, but
> > there no way to see it afaict).
>
> If you SET enable_indexscan=off you can try to get an bitmap index scan, which
> will reveal how much of the cost is attributed to the index component and how
> much to the heap.  That might help to refine costs, which may help.
>

I'm not quite sure what you're getting at, but I took a look and got this
surprising plan:

Update on sleeping_intents  (cost=4.58..8.60 rows=1 width=272) (actual time=0.025..0.025 rows=0 loops=1)
   Buffers: shared hit=6
   ->  Bitmap Heap Scan on sleeping_intents  (cost=4.58..8.60 rows=1 width=272) (actual time=0.025..0.025 rows=0 loops=1)
         Recheck Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
         Filter: (status = 'init'::text)
         Rows Removed by Filter: 1
         Heap Blocks: exact=2
         Buffers: shared hit=6
         ->  Bitmap Index Scan on sleeping_intents_pkey  (cost=0.00..4.58 rows=1 width=0) (actual time=0.017..0.017 rows=3 loops=1)
               Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
               Buffers: shared hit=4
 Planning time: 1.170 ms
 Execution time: 0.063 ms

One thing to note is that these runs end up updating 0 rows, because as
noted the data is always changing. I do think it's instructive to see
the plans, but in this case it feels a bit unfair.

> > Statistics after manual analyze:
> >tablename|attname| null_frac | avg_width |
> > n_distinct | correlation |   most_common_freqs
> > -+---+---+---++-+
> >  sleeping_intents | id| 0 |16 | -1
> > | -0.00133045 | [null]
> >  sleeping_intents | status| 0 | 9 |  6
> > |0.848468 | {0.918343,0.0543667,0.0267567,0.00051,1e-05,1e-05}
> >  sleeping_intents | created_at| 0 | 8 | -1
> > |0.993599 | [null]
> >  sleeping_intents | raptor_after | 0.0663433 | 8 |  -0.933657
> > | 0.99392 | [null]
> >
> > In a previous go around with this table, I also increased the
> > statistics target for the id column to 1000,