Postgres not using correct indices for views.

2019-08-08 Thread Thomas Rosenstein

Hi,

we have created restricted view for our tables, so that we can allow 
access to non-gdpr relevant data but hide everything else.


For exactly those views, the Query Planner uses the wrong indices, when 
executing exactly the same query, once it takes 0.1 s and on the views 
it takes nearly 18 sec (it does a full table scan, or uses the wrong 
indices).


Do we have to GRANT additional rights? I see it's using some indices, 
just not the correct ones!


Has anyone experienced the same issues? What can we do about that?

Thanks
BR
Thomas




Re: Postgres not using correct indices for views.

2019-08-08 Thread Tom Lane
"Thomas Rosenstein"  writes:
> we have created restricted view for our tables, so that we can allow 
> access to non-gdpr relevant data but hide everything else.
> For exactly those views, the Query Planner uses the wrong indices, when 
> executing exactly the same query, once it takes 0.1 s and on the views 
> it takes nearly 18 sec (it does a full table scan, or uses the wrong 
> indices).
> Do we have to GRANT additional rights? I see it's using some indices, 
> just not the correct ones!

Does EXPLAIN show reasonable rowcount estimates when you query
directly, but bad ones when you query via the views?

If so, a likely guess is that you're falling foul of the restrictions
added for CVE-2017-7484:

Author: Peter Eisentraut 
Branch: master Release: REL_10_BR [e2d4ef8de] 2017-05-08 09:26:32 -0400
Branch: REL9_6_STABLE Release: REL9_6_3 [c33c42362] 2017-05-08 09:18:57 -0400
Branch: REL9_5_STABLE Release: REL9_5_7 [d45cd7c0e] 2017-05-08 09:19:07 -0400
Branch: REL9_4_STABLE Release: REL9_4_12 [3e5ea1f9b] 2017-05-08 09:19:15 -0400
Branch: REL9_3_STABLE Release: REL9_3_17 [4f1b2089a] 2017-05-08 09:19:23 -0400
Branch: REL9_2_STABLE Release: REL9_2_21 [d035c1b97] 2017-05-08 09:19:42 -0400

Add security checks to selectivity estimation functions

Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables.  Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof.  If neither is satisfied, planning
will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas 
Author: Peter Eisentraut 
Author: Tom Lane 

Security: CVE-2017-7484


However, if you're not on the latest minor releases, you might
find that updating would fix this for you, because of

Author: Dean Rasheed 
Branch: master Release: REL_12_BR [a0905056f] 2019-05-06 11:54:32 +0100
Branch: REL_11_STABLE Release: REL_11_3 [98dad4cd4] 2019-05-06 11:56:37 +0100
Branch: REL_10_STABLE Release: REL_10_8 [ca74e3e0f] 2019-05-06 11:58:32 +0100
Branch: REL9_6_STABLE Release: REL9_6_13 [71185228c] 2019-05-06 12:00:00 +0100
Branch: REL9_5_STABLE Release: REL9_5_17 [01256815a] 2019-05-06 12:01:44 +0100
Branch: REL9_4_STABLE Release: REL9_4_22 [3c009] 2019-05-06 12:05:05 +0100

Use checkAsUser for selectivity estimator checks, if it's set.

In examine_variable() and examine_simple_variable(), when checking the
user's table and column privileges to determine whether to grant
access to the pg_statistic data, use checkAsUser for the privilege
checks, if it's set. This will be the case if we're accessing the
table via a view, to indicate that we should perform privilege checks
as the view owner rather than the current user.

This change makes this planner check consistent with the check in the
executor, so the planner will be able to make use of statistics if the
table is accessible via the view. This fixes a performance regression
introduced by commit e2d4ef8de8, which affects queries against
non-security barrier views in the case where the user doesn't have
privileges on the underlying table, but the view owner does.

Note that it continues to provide the same safeguards controlling
access to pg_statistic for direct table access (in which case
checkAsUser won't be set) and for security barrier views, because of
the nearby checks on rte->security_barrier and rte->securityQuals.

Back-patch to all supported branches because e2d4ef8de8 was.

Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.

regards, tom lane




Re: Postgres not using correct indices for views.

2019-08-08 Thread Thomas Rosenstein

Hi,

I'm upgraded to 10.10 from today (on the replicated instance - main db 
is still 10.5), but still have the issue.


The table is owned by the user "creamfinance", and the view is also 
owned by the same user - based on the text you quoted this should allow 
the correct access.


The planner estimates the correct row counts, but still does the wrong 
planning.


Wrong:

```
Limit  (cost=1880359.00..1880359.03 rows=9 width=1508) (actual 
time=25093.258..25093.270 rows=9 loops=1)
   ->  Sort  (cost=1880359.00..1884101.04 rows=1496816 width=1508) 
(actual time=25093.257..25093.257 rows=9 loops=1)

 Sort Key: p.customer_id DESC
 Sort Method: top-N heapsort  Memory: 33kB
 ->  Hash Join  (cost=359555.11..1849150.95 rows=1496816 
width=1508) (actual time=1081.081..24251.466 rows=543231 loops=1)

   Hash Cond: (p.customer_id = l.customer_id)
   Join Filter: ((p.date - '3 days'::interval day) <= 
l.duedate)

   Rows Removed by Join Filter: 596120
   ->  Seq Scan on payments p  (cost=0.00..393323.74 
rows=10046437 width=228) (actual time=0.013..13053.366 rows=10054069 
loops=1)
   ->  Hash  (cost=67.49..67.49 rows=153409 
width=1272) (actual time=689.835..689.835 rows=156682 loops=1)

 Buckets: 32768  Batches: 8  Memory Usage: 7737kB
 ->  Bitmap Heap Scan on loans l  
(cost=22732.48..331833.40 rows=153409 width=1272) (actual 
time=64.142..398.893 rows=156682 loops=1)

   Recheck Cond: (location_id = 46)
   Heap Blocks: exact=105938
   ->  Bitmap Index Scan on 
loans_location_id_repaid_desc_id_index  (cost=0.00..22694.12 rows=153409 
width=0) (actual time=41.324..41.324 rows=157794 loops=1)

 Index Cond: (location_id = 46)
```

Correct:

```
 Limit  (cost=0.87..52.60 rows=9 width=1471)
   ->  Nested Loop  (cost=0.87..2961441.25 rows=515233 width=1471)
 ->  Index Scan Backward using loans_customer_id_index on loans 
 (cost=0.43..2215467.63 rows=153409 width=1257)

   Filter: (location_id = 46)
 ->  Index Scan using payments_customer_id_idx on payments  
(cost=0.43..4.76 rows=10 width=206)

   Index Cond: (customer_id = loans.customer_id)
   Filter: ((date - '3 days'::interval day) <= 
loans.duedate)

```

Thanks

Thomas

On 8 Aug 2019, at 18:05, Tom Lane wrote:


"Thomas Rosenstein"  writes:

we have created restricted view for our tables, so that we can allow
access to non-gdpr relevant data but hide everything else.
For exactly those views, the Query Planner uses the wrong indices, 
when
executing exactly the same query, once it takes 0.1 s and on the 
views

it takes nearly 18 sec (it does a full table scan, or uses the wrong
indices).
Do we have to GRANT additional rights? I see it's using some indices,
just not the correct ones!


Does EXPLAIN show reasonable rowcount estimates when you query
directly, but bad ones when you query via the views?

If so, a likely guess is that you're falling foul of the restrictions
added for CVE-2017-7484:

Author: Peter Eisentraut 
Branch: master Release: REL_10_BR [e2d4ef8de] 2017-05-08 09:26:32 
-0400
Branch: REL9_6_STABLE Release: REL9_6_3 [c33c42362] 2017-05-08 
09:18:57 -0400
Branch: REL9_5_STABLE Release: REL9_5_7 [d45cd7c0e] 2017-05-08 
09:19:07 -0400
Branch: REL9_4_STABLE Release: REL9_4_12 [3e5ea1f9b] 2017-05-08 
09:19:15 -0400
Branch: REL9_3_STABLE Release: REL9_3_17 [4f1b2089a] 2017-05-08 
09:19:23 -0400
Branch: REL9_2_STABLE Release: REL9_2_21 [d035c1b97] 2017-05-08 
09:19:42 -0400


Add security checks to selectivity estimation functions

Some selectivity estimation functions run user-supplied operators 
over
data obtained from pg_statistic without security checks, which 
allows
those operators to leak pg_statistic data without having 
privileges on
the underlying tables.  Fix by checking that one of the following 
is
satisfied: (1) the user has table or column privileges on the 
table
underlying the pg_statistic data, or (2) the function implementing 
the
user-supplied operator is leak-proof.  If neither is satisfied, 
planning

will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  The 
only

situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas 
Author: Peter Eisentraut 
Author: Tom Lane 

Security: CVE-2017-7484


However, if you're not on the latest minor releases, you might
find that updating would fix this for you, because of

Author: Dean Rasheed 
Branch: master Release: REL_12_BR [a0905056f] 2019-05-06 11:54:32 
+0100
Branch: REL_11_STABLE Release: REL_11_3 [98dad4cd4] 2019-05-06 
11:56:37 +0100
Branch: REL_10_STABLE Release: REL_10_8 [ca74e3e0f] 2019-05-06 
11:5

Re: Postgres not using correct indices for views.

2019-08-08 Thread Thomas Rosenstein
To add additional info, the same behaviour is exhibited with the owner, 
and the user which only has read priviledges on the view!


On 8 Aug 2019, at 22:04, Thomas Rosenstein wrote:


Hi,

I'm upgraded to 10.10 from today (on the replicated instance - main db 
is still 10.5), but still have the issue.


The table is owned by the user "creamfinance", and the view is also 
owned by the same user - based on the text you quoted this should 
allow the correct access.


The planner estimates the correct row counts, but still does the wrong 
planning.


Wrong:

```
Limit  (cost=1880359.00..1880359.03 rows=9 width=1508) (actual 
time=25093.258..25093.270 rows=9 loops=1)
   ->  Sort  (cost=1880359.00..1884101.04 rows=1496816 width=1508) 
(actual time=25093.257..25093.257 rows=9 loops=1)

 Sort Key: p.customer_id DESC
 Sort Method: top-N heapsort  Memory: 33kB
 ->  Hash Join  (cost=359555.11..1849150.95 rows=1496816 
width=1508) (actual time=1081.081..24251.466 rows=543231 loops=1)

   Hash Cond: (p.customer_id = l.customer_id)
   Join Filter: ((p.date - '3 days'::interval day) <= 
l.duedate)

   Rows Removed by Join Filter: 596120
   ->  Seq Scan on payments p  (cost=0.00..393323.74 
rows=10046437 width=228) (actual time=0.013..13053.366 rows=10054069 
loops=1)
   ->  Hash  (cost=67.49..67.49 rows=153409 
width=1272) (actual time=689.835..689.835 rows=156682 loops=1)

 Buckets: 32768  Batches: 8  Memory Usage: 7737kB
 ->  Bitmap Heap Scan on loans l  
(cost=22732.48..331833.40 rows=153409 width=1272) (actual 
time=64.142..398.893 rows=156682 loops=1)

   Recheck Cond: (location_id = 46)
   Heap Blocks: exact=105938
   ->  Bitmap Index Scan on 
loans_location_id_repaid_desc_id_index  (cost=0.00..22694.12 
rows=153409 width=0) (actual time=41.324..41.324 rows=157794 loops=1)

 Index Cond: (location_id = 46)
```

Correct:

```
 Limit  (cost=0.87..52.60 rows=9 width=1471)
   ->  Nested Loop  (cost=0.87..2961441.25 rows=515233 width=1471)
 ->  Index Scan Backward using loans_customer_id_index on 
loans  (cost=0.43..2215467.63 rows=153409 width=1257)

   Filter: (location_id = 46)
 ->  Index Scan using payments_customer_id_idx on payments  
(cost=0.43..4.76 rows=10 width=206)

   Index Cond: (customer_id = loans.customer_id)
   Filter: ((date - '3 days'::interval day) <= 
loans.duedate)

```

Thanks

Thomas

On 8 Aug 2019, at 18:05, Tom Lane wrote:


"Thomas Rosenstein"  writes:

we have created restricted view for our tables, so that we can allow
access to non-gdpr relevant data but hide everything else.
For exactly those views, the Query Planner uses the wrong indices, 
when
executing exactly the same query, once it takes 0.1 s and on the 
views

it takes nearly 18 sec (it does a full table scan, or uses the wrong
indices).
Do we have to GRANT additional rights? I see it's using some 
indices,

just not the correct ones!


Does EXPLAIN show reasonable rowcount estimates when you query
directly, but bad ones when you query via the views?

If so, a likely guess is that you're falling foul of the restrictions
added for CVE-2017-7484:

Author: Peter Eisentraut 
Branch: master Release: REL_10_BR [e2d4ef8de] 2017-05-08 09:26:32 
-0400
Branch: REL9_6_STABLE Release: REL9_6_3 [c33c42362] 2017-05-08 
09:18:57 -0400
Branch: REL9_5_STABLE Release: REL9_5_7 [d45cd7c0e] 2017-05-08 
09:19:07 -0400
Branch: REL9_4_STABLE Release: REL9_4_12 [3e5ea1f9b] 2017-05-08 
09:19:15 -0400
Branch: REL9_3_STABLE Release: REL9_3_17 [4f1b2089a] 2017-05-08 
09:19:23 -0400
Branch: REL9_2_STABLE Release: REL9_2_21 [d035c1b97] 2017-05-08 
09:19:42 -0400


Add security checks to selectivity estimation functions

Some selectivity estimation functions run user-supplied operators 
over
data obtained from pg_statistic without security checks, which 
allows
those operators to leak pg_statistic data without having 
privileges on
the underlying tables.  Fix by checking that one of the following 
is
satisfied: (1) the user has table or column privileges on the 
table
underlying the pg_statistic data, or (2) the function 
implementing the
user-supplied operator is leak-proof.  If neither is satisfied, 
planning

will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  
The only

situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas 
Author: Peter Eisentraut 
Author: Tom Lane 

Security: CVE-2017-7484


However, if you're not on the latest minor releases, you might
find that updating would fix this for you, because of

Author: Dean Rasheed 
Branch: master Release: REL_12_BR 

Re: Postgres not using correct indices for views.

2019-08-08 Thread Tom Lane
"Thomas Rosenstein"  writes:
> The planner estimates the correct row counts, but still does the wrong 
> planning.

Hm, I'm not exactly convinced.  You show

> Wrong:
>   ->  Hash Join  (cost=359555.11..1849150.95 rows=1496816 
> width=1508) (actual time=1081.081..24251.466 rows=543231 loops=1)
> Hash Cond: (p.customer_id = l.customer_id)
> Join Filter: ((p.date - '3 days'::interval day) <= 
> l.duedate)
> Rows Removed by Join Filter: 596120

> Correct:
> ->  Nested Loop  (cost=0.87..2961441.25 rows=515233 width=1471)

The join size estimate seems a lot closer to being correct in the
second case, which could lend support to the idea that statistics
aren't being applied in the first case.

However ... it sort of looks like the planner didn't even consider
the second plan shape in the "wrong" case.  If it had, then even
if it costed it 3X more than it did in the "right" case, the second
plan would still have won out by orders of magnitude.  So there's
something else going on.

Can you show the actual query and table and view definitions?

regards, tom lane