GIST combo index condition chosen for users queries is different from table owner's query

2022-09-19 Thread Dennis White
My project is using PostgreSQL 12.9 and has a table that gets millions of
positions of ships every day. It is partitioned by tier (the importance of
position) and sub-partitioned by time.
We also use RLS to restrict access to rows in the table.

Our problem is the query planner will not let user queries use the spatial
condition like it does for the table owner and as a result the queries
don't ever finish.
I created a minimal test case of the problem and have attached all of the
statements needed to recreate the problem

The key statements are as follows:

CREATE TABLE IF NOT EXISTS test.qtest_position (
position_id bigint NOT NULL,
tiersmallint,
toi timestamptz(0) NOT NULL,
track_idbigint,
security_tagvarchar(33),
posit   public.geometry(Point)
) PARTITION BY LIST (tier);

CREATE TABLE IF NOT EXISTS test.qtest_posit_t1 PARTITION OF
test.qtest_position
FOR VALUES IN (1) PARTITION BY RANGE (toi);

CREATE TABLE test.qtest_posit_t1_template ( LIKE test.qtest_posit_t1
INCLUDING ALL );
ALTER TABLE test.qtest_posit_t1_template ADD PRIMARY KEY (position_id);
CREATE INDEX ON test.qtest_posit_t1_template (track_id, toi);
CREATE INDEX ON test.qtest_posit_t1_template USING GIST (posit, toi)
include (security_tag);

SELECT partman.create_parent('test.qtest_posit_t1', 'toi', 'native', '1
days',
  p_premake := 20, p_start_partition := '2022-09-01', p_template_table :=
'test.qtest_posit_t1_template');

Here's a simple example of a user query and the resulting plan:

set role test_user;
SET
explain analyze SELECT
position_961.SECURITY_TAG,
position_961.TRACK_ID,
position_961.POSIT,
position_961.POSITION_ID,
position_961.TOI
FROM
test.qtest_position position_961
WHERE
(ST_Intersects( position_961.POSIT,
ST_SETSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[
67.729806,
15.734972],[49.077444,-9.989],[39.995417,-3.982083],[51.616528,17.977861],[67.729806,15.734972]]]}')
 , 4326)  )
)  AND
 position_961.TOI BETWEEN '2022-09-02T20:28:42.753Z'::TIMESTAMPTZ AND
'2022-09-03T20:28:42.753Z'::TIMESTAMPTZ
  and test.user_has_access(security_tag) = '1';



QUERY PLAN






--
 Append  (cost=0.42..495598323.78 rows=2 width=60) (actual
time=7246.524..31614.944 rows=699 loops=1)
   ->  Index Scan using
qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx on
qtest_posit_t1_p2022_09_02 position_961  (cost=0.42..72721949.89 rows=1
width=60) (actual time=7246.523..9560.695 rows=
121 loops=1)
 Index Cond: ((toi >= '2022-09-02 20:28:42.753+00'::timestamp with
time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time
zone))
 Filter: ((test.user_has_access(security_tag) = '1'::text) AND
(test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit,
'010320E610010005002C9B3924B5EE504091F3
FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
 Rows Removed by Filter: 2851335
   ->  Seq Scan on qtest_posit_t1_p2022_09_03 position_961_1
 (cost=0.00..422876373.88 rows=1 width=60) (actual time=84.624..22054.050
rows=578 loops=1)
 Filter: ((toi >= '2022-09-02 20:28:42.753+00'::timestamp with time
zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone)
AND (test.user_has_access(security_tag) = '1'::tex
t) AND (test.user_has_access(security_tag) = '1'::text) AND
st_intersects(posit,
'010320E610010005002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9
F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
 Rows Removed by Filter: 16561241
 Planning Time: 0.369 ms
 Execution Time: 31615.459 ms
(10 rows)

-- run the same query as table owner
set role test_owner;
SET
explain analyze SELECT
position_961.SECURITY_TAG,
position_961.TRACK_ID,
position_961.POSIT,
position_961.POSITION_ID,
position_961.TOI
FROM
test.qtest_position position_961
WHERE
(ST_Intersects( position_961.POSIT,
ST_SETSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[
67.729806,
15.734972],[49.077444,-9.989],[39.995417,-3.982083],[51.616528,17.977861],[67.729806,15.734972]]]}')
 , 4326)  )
)  AND
 position_961.TOI BETWEEN '

Re: GIST combo index condition chosen for users queries is different from table owner's query

2022-09-20 Thread Dennis White
Thanks.
As soon as I read your reply I recalled the leakproof issue from a
discussion with a former colleague years ago.
At the time, I was new to Postgresql and I realize now I should have
remembered that.

Disabling the RLS indeed resulted in the superior plan for the test_user.
The harder part will be baking the function call used for RLS into all
query predicates rather than relying on RLS to do it for us.
I also recall that we got around the leakproof problem in postgres 10.2 by
somehow just declaring st_intersects() to be leakproof but that would
probably not work in an AWS RDS deployment. I will research the leakproof
issue more and see what options we may have in dealing with this problem.
Perhaps sometime in the future RLS won't break such queries but I
understand that is probably not an easy task.
Thanks for replying and helping me on my way.

Dennis

On Mon, Sep 19, 2022 at 7:28 PM Tom Lane  wrote:

> Dennis White  writes:
> > Is there something I can do to allow users queries to use the index with
> a
> > condition like that used for the table owner's query?
>
> It looks like the problem in your badly-optimized query is that
> there is not an indexable condition being extracted from the
> ST_INTERSECTS() call.  In the well-optimized one, we've got
>
>->  Index Scan using
> qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx...
>  Index Cond: ((posit &&
> '010320E610010005002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry)
> AND ...
>  Filter: ((test.user_has_access(security_tag) = '1'::text) AND
> st_intersects(posit,
>
> '010320E610010005002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry))
>
> I presume what's happening there is that st_intersects() has got a support
> function that knows that "st_intersects(foo, bar)" implies "foo && bar"
> and the latter can be used with an index on foo.
>
> However, to do that in the presence of RLS we have to know that the
> extracted condition would be leakproof.  I'm not sure that the geometry &&
> operator is leakproof in the first place; and even if it is, we might not
> consider this option unless st_intersects() is also marked leakproof,
> which most likely it isn't.  You'd have to ask the PostGIS crew whether
> either of those things would be safe to consider leakproof ... but I'm
> betting they'll say that doing so would create an unreasonably large
> bug surface.
>
> By and large, the combination of RLS with complicated WHERE conditions
> is just deadly for performance, because most of the time we won't be
> able to use the WHERE conditions until after applying the RLS filter.
> Do you really need to use RLS in this application?  If you're stuck
> doing so, you could maybe ameliorate things by implementing the RLS
> check functions in the fastest way you can, like writing C code
> for them.
>
> regards, tom lane
>


Re: How to do faster DML

2024-02-04 Thread Dennis White
I'm surprised no one has mentioned perhaps it's a good idea to partition
this table while adding the pk. By your own statements the table is
difficult to work with as is. Without partitioning the table, row inserts
would need to walk the pk index and could be a factor. If this is static
table then perhaps that's ok but if not...

Anyway I don't recall what type the ID was or how it's set but i suggest
you seriously investigate using it to partition this table into manageable
smaller tables.
Best of luck.

On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro  wrote:

> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000
>> offset 20_000_000;
>>
>
> You can use min/max values grouping them by 10_000_000 records, so you
> don´t need that offset, then generate commands and run them.
>
> select format('insert into mytable2 select * from mytable1 where i between
> %s and %s;', max(i), min(i)) from t group by i/10_000_000;
>


Can a long running procedure detect when smart shutdown is pending?

2024-07-05 Thread Dennis White
My project's DB has a mutli-step stored procedure using Transaction Control
that may take 30 minutes or more to complete.
I am curious if there is a way to make it more smart shutdown friendly so
it can stop between steps?

We are using both PG 14 and PG 16 on Rhel 8.
Pardon me if it's obvious but is there a function to call or a table that
could be checked after a commit within the procedure to determine a
shutdown is pending?

Thanks,
Dennis


Re: Can a long running procedure detect when smart shutdown is pending?

2024-07-05 Thread Dennis White
Thanks for responding.
This will be a stored procedure written in plpgsql that's kicked off by
pg_cron.
I wasn't sure a normal smart shutdown would stop it.

Thanks

On Fri, Jul 5, 2024 at 4:57 PM Achilleas Mantzios <
a.mantz...@cloud.gatewaynet.com> wrote:

> Στις 5/7/24 21:12, ο/η Dennis White έγραψε:
>
> My project's DB has a mutli-step stored procedure using Transaction
> Control that may take 30 minutes or more to complete.
> I am curious if there is a way to make it more smart shutdown friendly so
> it can stop between steps?
>
> We are using both PG 14 and PG 16 on Rhel 8.
> Pardon me if it's obvious but is there a function to call or a table that
> could be checked after a commit within the procedure to determine a
> shutdown is pending?
>
> Maybe somehow checking the log for a message like :
>
> received smart shutdown request
>
> Or use tail_n_mail against the log and then implement some logic using
> NOTIFY , ideally you want your procedure to be interrupted rather than do
> polling.
>
>
> Thanks,
> Dennis
>
> --
> Achilleas Mantzios
>  IT DEV - HEAD
>  IT DEPT
>  Dynacom Tankers Mgmt (as agents only)
>
>


Re: Can a long running procedure detect when smart shutdown is pending?

2024-07-06 Thread Dennis White
Thanks everyone. A more direct way to check via a sql function would be
better but I suppose the dblink extension method will work.

Thanks again,
Dennis

On Sat, Jul 6, 2024 at 9:38 AM Tom Lane  wrote:

> Laurenz Albe  writes:
> > On Fri, 2024-07-05 at 14:12 -0400, Dennis White wrote:
> >> My project's DB has a mutli-step stored procedure using Transaction
> Control that may take 30 minutes or more to complete.
> >> I am curious if there is a way to make it more smart shutdown friendly
> so it can stop between steps?
>
> > I don't think there is a direct way to do that in SQL; that would
> require a new
> > system function that exposes canAcceptConnections() in SQL.
>
> It's worse than that: the state variables involved are local to the
> postmaster, so you wouldn't get the right answer in a backend even
> if the function were reachable.
>
> > What you could do is use the dblink extension to connect to the local
> database.
> > If you get an error "the database system is shutting down", there is a
> smart
> > shutdown in progress.
>
> This'd probably work.  Ugly, but ...
>
> regards, tom lane
>