Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

2019-06-21 Thread Simon Riggs
On Thu, 20 Jun 2019 at 16:13, Chris Wilson 
wrote:

> With the following results:
>
>
>
> testing=# explain analyze select * from brin_test where id >= 9;
>
>QUERY PLAN
>
>
> -
>
> Bitmap Heap Scan on brin_test  (cost=8.55..630.13 rows=10146 width=8)
> (actual time=0.474..1.796 rows=10001 loops=1)
>
>Recheck Cond: (id >= 9)
>
>Rows Removed by Index Recheck: 3215
>
>Heap Blocks: lossy=59
>
>->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..6.02
> rows=14286 width=0) (actual time=0.026..0.026 rows=640 loops=1)
>
>  Index Cond: (id >= 9)
>
> Planning Time: 0.155 ms
>
> Execution Time: 2.133 ms
>
> (8 rows)
>
>
>
> testing=# explain analyze select * from brin_test where id >= 9 and r
> in (1,3);
>
>QUERY PLAN
>
>
> -
>
> Bitmap Heap Scan on brin_test  (cost=6.06..556.21 rows=219 width=8)
> (actual time=6.101..23.927 rows=200 loops=1)
>
>Recheck Cond: ((id >= 9) AND (r = ANY ('{1,3}'::integer[])))
>
>Rows Removed by Index Recheck: 13016
>
>Heap Blocks: lossy=59
>
>->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..6.01 rows=7143
> width=0) (actual time=0.038..0.038 rows=1280 loops=1)
>
>  Index Cond: ((id >= 9) AND (r = ANY ('{1,3}'::integer[])))
>
> Planning Time: 0.071 ms
>
> Execution Time: 23.954 ms
>
> (8 rows)
>
>
>
> Note that introducing a disjunction (set of possible values) into the
> query doubles the number of actual rows returned, and increases the number
> removed by the index recheck. It looks to me as though perhaps the BRIN
> index does not completely support queries with a set of possible values,
> and executes the query multiple times (try adding more values of R to see
> what I mean). The execution time also increases massively.
>
>
>
> Could anyone help me to understand what’s going on here, and whether
> there’s a bug or limitation of BRIN indexes? If it’s a limitation, then the
> query planner does not seem to account for it, and chooses this plan even
> when it’s a bad one (much worse than removing result rows using a filter).
>

In both cases the index is returning a lossy bitmap of 59 heap blocks. The
second query is more restrictive, so the number removed by index recheck is
higher. The total of number rows returned plus the number of rows removed
by index recheck is the same in both cases.

The only weirdness is why the index reports it has returned 640 rows in one
query and 1280 in second query. Since a lossy bitmap is returned, that
figure can only be an estimate. The estimate differs between queries, but
is wrong in both cases.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Solutions for the Enterprise


RE: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

2019-06-21 Thread Chris Wilson
That makes perfect sense, thanks Simon!

Chris.

From: Simon Riggs 
Sent: 21 June 2019 10:17
To: Chris Wilson 
Cc: [email protected]
Subject: Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

On Thu, 20 Jun 2019 at 16:13, Chris Wilson 
mailto:[email protected]>> wrote:
With the following results:

testing=# explain analyze select * from brin_test where id >= 9;
   QUERY PLAN
-
Bitmap Heap Scan on brin_test  (cost=8.55..630.13 rows=10146 width=8) (actual 
time=0.474..1.796 rows=10001 loops=1)
   Recheck Cond: (id >= 9)
   Rows Removed by Index Recheck: 3215
   Heap Blocks: lossy=59
   ->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..6.02 rows=14286 
width=0) (actual time=0.026..0.026 rows=640 loops=1)
 Index Cond: (id >= 9)
Planning Time: 0.155 ms
Execution Time: 2.133 ms
(8 rows)

testing=# explain analyze select * from brin_test where id >= 9 and r in 
(1,3);
   QUERY PLAN
-
Bitmap Heap Scan on brin_test  (cost=6.06..556.21 rows=219 width=8) (actual 
time=6.101..23.927 rows=200 loops=1)
   Recheck Cond: ((id >= 9) AND (r = ANY ('{1,3}'::integer[])))
   Rows Removed by Index Recheck: 13016
   Heap Blocks: lossy=59
   ->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..6.01 rows=7143 
width=0) (actual time=0.038..0.038 rows=1280 loops=1)
 Index Cond: ((id >= 9) AND (r = ANY ('{1,3}'::integer[])))
Planning Time: 0.071 ms
Execution Time: 23.954 ms
(8 rows)

Note that introducing a disjunction (set of possible values) into the query 
doubles the number of actual rows returned, and increases the number removed by 
the index recheck. It looks to me as though perhaps the BRIN index does not 
completely support queries with a set of possible values, and executes the 
query multiple times (try adding more values of R to see what I mean). The 
execution time also increases massively.

Could anyone help me to understand what’s going on here, and whether there’s a 
bug or limitation of BRIN indexes? If it’s a limitation, then the query planner 
does not seem to account for it, and chooses this plan even when it’s a bad one 
(much worse than removing result rows using a filter).

In both cases the index is returning a lossy bitmap of 59 heap blocks. The 
second query is more restrictive, so the number removed by index recheck is 
higher. The total of number rows returned plus the number of rows removed by 
index recheck is the same in both cases.

The only weirdness is why the index reports it has returned 640 rows in one 
query and 1280 in second query. Since a lossy bitmap is returned, that figure 
can only be an estimate. The estimate differs between queries, but is wrong in 
both cases.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Solutions for the Enterprise


RE: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

2019-06-21 Thread Chris Wilson
One possible optimisation occurred to me (that I guess we can’t currently do). 
If we use a larger example with some correlation in r, we can see that the time 
to execute the bitmap index scan is proportional to the number of items in the 
IN/ANY disjunction:

drop table brin_test;
create table brin_test AS SELECT g as id, ((g / 100) + (g % 100)) as r
from generate_series(1,1000) as g;
create index idx_brin_test_brin on brin_test using brin (id, r) with 
(pages_per_range = 32);
vacuum analyze brin_test;
set max_parallel_workers_per_gather = 0;

/* Note that these queries return no results, so there is no time spent in 
bitmap heap scan, rechecking the conditions: */
explain analyze select * from brin_test where id >= 900 and r = 
any(array_fill(1, ARRAY[100]));
explain analyze select * from brin_test where id >= 900 and r = 
any(array_fill(1, ARRAY[1000]));

With the following results (long arrays elided):

testing=# explain analyze select * from brin_test where id >= 900 and r = 
any(array_fill(1, ARRAY[100]));
Bitmap Heap Scan on brin_test  (cost=15.27..11781.13 rows=1031 width=8) (actual 
time=23.830..23.830 rows=0 loops=1)
   Recheck Cond: ((id >= 900) AND (r = ANY ('{1,…,1}'::integer[])))
   ->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..15.01 rows=7231 
width=0) (actual time=23.829..23.829 rows=0 loops=1)
 Index Cond: ((id >= 900) AND (r = ANY ('{1,…,1}'::integer[])))
Planning Time: 0.092 ms
Execution Time: 23.853 ms
(6 rows)

testing=# explain analyze select * from brin_test where id >= 900 and r = 
any(array_fill(1, ARRAY[1000]));
Bitmap Heap Scan on brin_test  (cost=17.59..36546.51 rows=10308 width=8) 
(actual time=237.748..237.748 rows=0 loops=1)
   Recheck Cond: ((id >= 900) AND (r = ANY ('{1,…,1}'::integer[])))
   ->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..15.02 rows=14461 
width=0) (actual time=237.747..237.747 rows=0 loops=1)
 Index Cond: ((id >= 900) AND (r = ANY ('{1,…,1}'::integer[])))
Planning Time: 0.354 ms
Execution Time: 237.817 ms
(6 rows)

We can see that scanning 10x as many values takes 10x as long. It seems that we 
are checking each value in the array individually. However, since the BRIN 
index stores ranges of values in each block, all we care about (for the index 
scan) is whether the ranges overlap with the query. So we could compute the 
minimum and maximum in the array, and check whether each block contains any 
values in that range, and if so (and the other conditions are met) then emit 
the block for heap scanning. Does that make sense?

If I manually add those conditions to the query, it uses them and speeds up by 
about 1000 times:

testing=# explain analyze select * from brin_test where id >= 900 and r >= 
1 and r <= 1 and r = any(array_fill(1, ARRAY[1000]));
Bitmap Heap Scan on brin_test  (cost=15.01..19951.91 rows=1 width=8) (actual 
time=0.263..0.263 rows=0 loops=1)
   Recheck Cond: ((id >= 900) AND (r >= 1) AND (r <= 1))
   Filter: (r = ANY ('{1,…,1}'::integer[]))
   ->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..15.01 rows=7231 
width=0) (actual time=0.261..0.262 rows=0 loops=1)
 Index Cond: ((id >= 900) AND (r >= 1) AND (r <= 1))
Planning Time: 0.393 ms
Execution Time: 0.280 ms
(7 rows)

From: Chris Wilson
Sent: 21 June 2019 10:20
To: 'Simon Riggs' 
Cc: [email protected]
Subject: RE: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

That makes perfect sense, thanks Simon!

Chris.

From: Simon Riggs mailto:[email protected]>>
Sent: 21 June 2019 10:17
To: Chris Wilson 
mailto:[email protected]>>
Cc: [email protected]
Subject: Re: EXPLAIN ANALYZE of BRIN bitmap index scan with disjunction

On Thu, 20 Jun 2019 at 16:13, Chris Wilson 
mailto:[email protected]>> wrote:
With the following results:

testing=# explain analyze select * from brin_test where id >= 9;
   QUERY PLAN
-
Bitmap Heap Scan on brin_test  (cost=8.55..630.13 rows=10146 width=8) (actual 
time=0.474..1.796 rows=10001 loops=1)
   Recheck Cond: (id >= 9)
   Rows Removed by Index Recheck: 3215
   Heap Blocks: lossy=59
   ->  Bitmap Index Scan on idx_brin_test_brin  (cost=0.00..6.02 rows=14286 
width=0) (actual time=0.026..0.026 rows=640 loops=1)
 Index Cond: (id >= 9)
Planning Time: 0.155 ms
Execution Time: 2.133 ms
(8 rows)

testing=# explain analyze select * from brin_test where id >= 9 and r in 
(1,3);
   QUERY PLAN
-
Bitmap Heap Scan on brin_test  (cost=6.06..556.21 rows=219 width=8) (actual 
tim

Using indexes in RLS policies (sub)queries

2019-06-21 Thread Grégory EL MAJJOUTI
Hello everyone,

I am attempting to set up a row level security policy based on geo-location
(and the PostGIS extension). I am struggling to have it make use of column
indexes.

The following example defines a table with geography points and aims to
restrict access to it based on distance to another set of points in a
secondary table. It has been tested on 11.2.


CREATE EXTENSION postgis;

-- This is the table we want to secure with RLS
DROP TABLE IF EXISTS example1;
CREATE TABLE example1 (
id   serial  NOT NULL,
geo geographyNULL,
CONSTRAINT  example1_pk PRIMARY KEY (id)
) with ( OIDS=FALSE );

-- Seed the table with 100k random points
INSERT INTO example1(geo)
SELECT  ST_SetSRID(
ST_MakePoint(
(random()*360.0) - 180.0,
(random()*180.0) - 90.0),
4326) as geom
FROM  generate_series(1, 10);

CREATE INDEX example1_spx ON example1 USING GIST (geo);

-- This table will hold points for the row level policy
DROP TABLE IF EXISTS example_acl;
CREATE TABLE example_acl (
geo geographyNULL
) with ( OIDS=FALSE );

INSERT INTO example_acl(geo)
SELECT  ST_SetSRID(
ST_MakePoint(
(random()*360.0) - 180.0,
(random()*180.0) - 90.0),
4326) as geom
FROM  generate_series(1, 100);


-- Simple query that performs an index scan
EXPLAIN ANALYZE VERBOSE SELECT count(*) from example1
INNER JOIN example_acl on st_dwithin(example_acl.geo, example1.geo, 1000)


Aggregate  (cost=12364.11..12364.12 rows=1 width=8) (actual
time=4.802..4.802 rows=1 loops=1)
  Output: count(*)
  ->  Nested Loop  (cost=0.41..12364.00 rows=45 width=0) (actual
time=4.797..4.797 rows=0 loops=1)
->  Seq Scan on public.example_acl  (cost=0.00..23.60 rows=1360
width=32) (actual time=0.034..0.066 rows=100 loops=1)
  Output: example_acl.geo
->  Index Scan using example1_spx on public.example1
 (cost=0.41..9.06 rows=1 width=32) (actual time=0.044..0.044 rows=0
loops=100)
  Output: example1.id, example1.geo
  Index Cond: (example1.geo && _st_expand(example_acl.geo,
'1000'::double precision))
  Filter: ((example_acl.geo && _st_expand(example1.geo,
'1000'::double precision)) AND _st_dwithin(example_acl.geo, example1.geo,
'1000'::double precision, true))
Planning time: 60.690 ms
Execution time: 5.006 ms


-- Setting up the policy
CREATE ROLE example_role;
GRANT SELECT ON TABLE example1 to example_role;
GRANT SELECT ON TABLE example_acl to example_role;
ALTER TABLE example1 ENABLE ROW LEVEL SECURITY;

CREATE POLICY example_location_policy ON example1
   AS permissive
   FOR SELECT
   TO example_role
   USING (
   EXISTS (
 SELECT 1
 FROM example_acl
 WHERE (
   st_dwithin(example_acl.geo, example1.geo, 1000)
 )
   )
);


SET ROLE example_role;
EXPLAIN ANALYZE VERBOSE SELECT count(*) from example1;

Aggregate  (cost=5251959.00..5251959.01 rows=1 width=8) (actual
time=9256.606..9256.606 rows=1 loops=1)
  Output: count(*)
  ->  Seq Scan on public.example1  (cost=0.00..5251834.00 rows=5
width=0) (actual time=9256.601..9256.601 rows=0 loops=1)
Output: example1.id, example1.geo
Filter: (SubPlan 1)
Rows Removed by Filter: 10
SubPlan 1
  ->  Seq Scan on public.example_acl  (cost=0.00..52.50 rows=1
width=0) (actual time=0.089..0.089 rows=0 loops=10)
Filter: ((example_acl.geo && _st_expand(example1.geo,
'1000'::double precision)) AND (example1.geo && _st_expand(example_acl.geo,
'1000'::double precision)) AND _st_dwithin(example_acl.geo, example1.geo,
'1000'::double precision, true))
Rows Removed by Filter: 100
Planning time: 67.601 ms
Execution time: 9256.812 ms

As you can see, the policy does not use the index example1_spx on the
geography column.
Is there a way to rewrite that policy so that it would make use of the
index?

Thank you in advance.

Best regards,
Grégory El Majjouti