GIST combo index condition chosen for users queries is different from table owner's query
My project is using PostgreSQL 12.9 and has a table that gets millions of positions of ships every day. It is partitioned by tier (the importance of position) and sub-partitioned by time. We also use RLS to restrict access to rows in the table. Our problem is the query planner will not let user queries use the spatial condition like it does for the table owner and as a result the queries don't ever finish. I created a minimal test case of the problem and have attached all of the statements needed to recreate the problem The key statements are as follows: CREATE TABLE IF NOT EXISTS test.qtest_position ( position_id bigint NOT NULL, tiersmallint, toi timestamptz(0) NOT NULL, track_idbigint, security_tagvarchar(33), posit public.geometry(Point) ) PARTITION BY LIST (tier); CREATE TABLE IF NOT EXISTS test.qtest_posit_t1 PARTITION OF test.qtest_position FOR VALUES IN (1) PARTITION BY RANGE (toi); CREATE TABLE test.qtest_posit_t1_template ( LIKE test.qtest_posit_t1 INCLUDING ALL ); ALTER TABLE test.qtest_posit_t1_template ADD PRIMARY KEY (position_id); CREATE INDEX ON test.qtest_posit_t1_template (track_id, toi); CREATE INDEX ON test.qtest_posit_t1_template USING GIST (posit, toi) include (security_tag); SELECT partman.create_parent('test.qtest_posit_t1', 'toi', 'native', '1 days', p_premake := 20, p_start_partition := '2022-09-01', p_template_table := 'test.qtest_posit_t1_template'); Here's a simple example of a user query and the resulting plan: set role test_user; SET explain analyze SELECT position_961.SECURITY_TAG, position_961.TRACK_ID, position_961.POSIT, position_961.POSITION_ID, position_961.TOI FROM test.qtest_position position_961 WHERE (ST_Intersects( position_961.POSIT, ST_SETSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[ 67.729806, 15.734972],[49.077444,-9.989],[39.995417,-3.982083],[51.616528,17.977861],[67.729806,15.734972]]]}') , 4326) ) ) AND position_961.TOI BETWEEN '2022-09-02T20:28:42.753Z'::TIMESTAMPTZ AND '2022-09-03T20:28:42.753Z'::TIMESTAMPTZ and test.user_has_access(security_tag) = '1'; QUERY PLAN -- Append (cost=0.42..495598323.78 rows=2 width=60) (actual time=7246.524..31614.944 rows=699 loops=1) -> Index Scan using qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx on qtest_posit_t1_p2022_09_02 position_961 (cost=0.42..72721949.89 rows=1 width=60) (actual time=7246.523..9560.695 rows= 121 loops=1) Index Cond: ((toi >= '2022-09-02 20:28:42.753+00'::timestamp with time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone)) Filter: ((test.user_has_access(security_tag) = '1'::text) AND (test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit, '010320E610010005002C9B3924B5EE504091F3 FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry)) Rows Removed by Filter: 2851335 -> Seq Scan on qtest_posit_t1_p2022_09_03 position_961_1 (cost=0.00..422876373.88 rows=1 width=60) (actual time=84.624..22054.050 rows=578 loops=1) Filter: ((toi >= '2022-09-02 20:28:42.753+00'::timestamp with time zone) AND (toi <= '2022-09-03 20:28:42.753+00'::timestamp with time zone) AND (test.user_has_access(security_tag) = '1'::tex t) AND (test.user_has_access(security_tag) = '1'::text) AND st_intersects(posit, '010320E610010005002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9 F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry)) Rows Removed by Filter: 16561241 Planning Time: 0.369 ms Execution Time: 31615.459 ms (10 rows) -- run the same query as table owner set role test_owner; SET explain analyze SELECT position_961.SECURITY_TAG, position_961.TRACK_ID, position_961.POSIT, position_961.POSITION_ID, position_961.TOI FROM test.qtest_position position_961 WHERE (ST_Intersects( position_961.POSIT, ST_SETSRID(ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[ 67.729806, 15.734972],[49.077444,-9.989],[39.995417,-3.982083],[51.616528,17.977861],[67.729806,15.734972]]]}') , 4326) ) ) AND position_961.TOI BETWEEN '
Re: GIST combo index condition chosen for users queries is different from table owner's query
Thanks. As soon as I read your reply I recalled the leakproof issue from a discussion with a former colleague years ago. At the time, I was new to Postgresql and I realize now I should have remembered that. Disabling the RLS indeed resulted in the superior plan for the test_user. The harder part will be baking the function call used for RLS into all query predicates rather than relying on RLS to do it for us. I also recall that we got around the leakproof problem in postgres 10.2 by somehow just declaring st_intersects() to be leakproof but that would probably not work in an AWS RDS deployment. I will research the leakproof issue more and see what options we may have in dealing with this problem. Perhaps sometime in the future RLS won't break such queries but I understand that is probably not an easy task. Thanks for replying and helping me on my way. Dennis On Mon, Sep 19, 2022 at 7:28 PM Tom Lane wrote: > Dennis White writes: > > Is there something I can do to allow users queries to use the index with > a > > condition like that used for the table owner's query? > > It looks like the problem in your badly-optimized query is that > there is not an indexable condition being extracted from the > ST_INTERSECTS() call. In the well-optimized one, we've got > >-> Index Scan using > qtest_posit_t1_p2022_09_02_posit_toi_security_tag_idx... > Index Cond: ((posit && > '010320E610010005002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry) > AND ... > Filter: ((test.user_has_access(security_tag) = '1'::text) AND > st_intersects(posit, > > '010320E610010005002C9B3924B5EE504091F3FE3F4E782F40BEA25BAFE9894840EE7C3F355EFA23C0F47002D369FF434019A9F7544EDB0FC0BE88B663EACE4940AB08371955FA31402C9B3924B5EE504091F3FE3F4E782F40'::geometry)) > > I presume what's happening there is that st_intersects() has got a support > function that knows that "st_intersects(foo, bar)" implies "foo && bar" > and the latter can be used with an index on foo. > > However, to do that in the presence of RLS we have to know that the > extracted condition would be leakproof. I'm not sure that the geometry && > operator is leakproof in the first place; and even if it is, we might not > consider this option unless st_intersects() is also marked leakproof, > which most likely it isn't. You'd have to ask the PostGIS crew whether > either of those things would be safe to consider leakproof ... but I'm > betting they'll say that doing so would create an unreasonably large > bug surface. > > By and large, the combination of RLS with complicated WHERE conditions > is just deadly for performance, because most of the time we won't be > able to use the WHERE conditions until after applying the RLS filter. > Do you really need to use RLS in this application? If you're stuck > doing so, you could maybe ameliorate things by implementing the RLS > check functions in the fastest way you can, like writing C code > for them. > > regards, tom lane >
Re: How to do faster DML
I'm surprised no one has mentioned perhaps it's a good idea to partition this table while adding the pk. By your own statements the table is difficult to work with as is. Without partitioning the table, row inserts would need to walk the pk index and could be a factor. If this is static table then perhaps that's ok but if not... Anyway I don't recall what type the ID was or how it's set but i suggest you seriously investigate using it to partition this table into manageable smaller tables. Best of luck. On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro wrote: > insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 >> offset 20_000_000; >> > > You can use min/max values grouping them by 10_000_000 records, so you > don´t need that offset, then generate commands and run them. > > select format('insert into mytable2 select * from mytable1 where i between > %s and %s;', max(i), min(i)) from t group by i/10_000_000; >
Can a long running procedure detect when smart shutdown is pending?
My project's DB has a mutli-step stored procedure using Transaction Control that may take 30 minutes or more to complete. I am curious if there is a way to make it more smart shutdown friendly so it can stop between steps? We are using both PG 14 and PG 16 on Rhel 8. Pardon me if it's obvious but is there a function to call or a table that could be checked after a commit within the procedure to determine a shutdown is pending? Thanks, Dennis
Re: Can a long running procedure detect when smart shutdown is pending?
Thanks for responding. This will be a stored procedure written in plpgsql that's kicked off by pg_cron. I wasn't sure a normal smart shutdown would stop it. Thanks On Fri, Jul 5, 2024 at 4:57 PM Achilleas Mantzios < a.mantz...@cloud.gatewaynet.com> wrote: > Στις 5/7/24 21:12, ο/η Dennis White έγραψε: > > My project's DB has a mutli-step stored procedure using Transaction > Control that may take 30 minutes or more to complete. > I am curious if there is a way to make it more smart shutdown friendly so > it can stop between steps? > > We are using both PG 14 and PG 16 on Rhel 8. > Pardon me if it's obvious but is there a function to call or a table that > could be checked after a commit within the procedure to determine a > shutdown is pending? > > Maybe somehow checking the log for a message like : > > received smart shutdown request > > Or use tail_n_mail against the log and then implement some logic using > NOTIFY , ideally you want your procedure to be interrupted rather than do > polling. > > > Thanks, > Dennis > > -- > Achilleas Mantzios > IT DEV - HEAD > IT DEPT > Dynacom Tankers Mgmt (as agents only) > >
Re: Can a long running procedure detect when smart shutdown is pending?
Thanks everyone. A more direct way to check via a sql function would be better but I suppose the dblink extension method will work. Thanks again, Dennis On Sat, Jul 6, 2024 at 9:38 AM Tom Lane wrote: > Laurenz Albe writes: > > On Fri, 2024-07-05 at 14:12 -0400, Dennis White wrote: > >> My project's DB has a mutli-step stored procedure using Transaction > Control that may take 30 minutes or more to complete. > >> I am curious if there is a way to make it more smart shutdown friendly > so it can stop between steps? > > > I don't think there is a direct way to do that in SQL; that would > require a new > > system function that exposes canAcceptConnections() in SQL. > > It's worse than that: the state variables involved are local to the > postmaster, so you wouldn't get the right answer in a backend even > if the function were reachable. > > > What you could do is use the dblink extension to connect to the local > database. > > If you get an error "the database system is shutting down", there is a > smart > > shutdown in progress. > > This'd probably work. Ugly, but ... > > regards, tom lane >