Re: Catching up with performance & PostgreSQL 15
On 2022-11-29 Tu 16:06, David Rowley wrote:
> On Wed, 30 Nov 2022 at 03:31, Tom Lane wrote:
>> Alvaro Herrera writes:
>>> IMO it was a mistake to turn JIT on in the default config, so that's one
>>> thing you'll likely want to change.
>> I wouldn't necessarily go quite that far, but I do think that the
>> default cost thresholds for invoking it are enormously too low,
>> or else there are serious bugs in the cost-estimation algorithms
>> for deciding when to use it. A nearby example[1] of a sub-1-sec
>> partitioned query that took 30sec after JIT was enabled makes me
>> wonder if we're accounting correctly for per-partition JIT costs.
> I'm very grateful for JIT. However, I do agree that the costs need to work.
>
> The problem is that the threshold to turn JIT on does not consider how
> many expressions need to be compiled. It's quite different to JIT
> compile a simple one-node plan with a total cost of 10 than to JIT
> compile a plan that costs the same but queries 1000 partitions. I
> think we should be compiling expressions based on the cost of the
> individual node rather than the total cost of the plan. We need to
> make some changes so we can more easily determine the number of times
> a given node will be executed before we can determine how worthwhile
> JITting an expression in a node will be.

I think Alvaro's point is that it would have been better to work out
these wrinkles before turning on JIT by default. Based on anecdotal
reports from the field I'm inclined to agree.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
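The per-partition effect David describes is straightforward to observe.
A minimal sketch, assuming a server built with LLVM JIT support; the
table name, partition count, and the jit_above_cost = 0 setting are
illustrative choices to force JIT, not recommendations:

-- Build a hash-partitioned table with 100 partitions.
create table t (id int, v int) partition by hash (id);
do $$
begin
  for i in 0..99 loop
    execute format(
      'create table t_%s partition of t for values with (modulus 100, remainder %s)',
      i, i);
  end loop;
end $$;
insert into t select g, g from generate_series(1, 100000) g;
analyze t;

-- Force JIT regardless of plan cost, purely for demonstration.
set jit = on;
set jit_above_cost = 0;

explain (analyze) select sum(v) from t;
-- The plan footer gains a "JIT:" section (Functions, Options, Timing);
-- with 100 partitioned scans the function count and generation time
-- multiply, which is exactly the per-partition cost being discussed.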
Re: Catching up with performance & PostgreSQL 15
Hi,

On November 30, 2022 3:47:32 AM PST, Andrew Dunstan wrote:
>
> On 2022-11-29 Tu 16:06, David Rowley wrote:
>> On Wed, 30 Nov 2022 at 03:31, Tom Lane wrote:
>>> Alvaro Herrera writes:
>>>> IMO it was a mistake to turn JIT on in the default config, so that's
>>>> one thing you'll likely want to change.
>>> I wouldn't necessarily go quite that far, but I do think that the
>>> default cost thresholds for invoking it are enormously too low,
>>> or else there are serious bugs in the cost-estimation algorithms
>>> for deciding when to use it. A nearby example[1] of a sub-1-sec
>>> partitioned query that took 30sec after JIT was enabled makes me
>>> wonder if we're accounting correctly for per-partition JIT costs.
>> I'm very grateful for JIT. However, I do agree that the costs need to work.
>>
>> The problem is that the threshold to turn JIT on does not consider how
>> many expressions need to be compiled. It's quite different to JIT
>> compile a simple one-node plan with a total cost of 10 than to JIT
>> compile a plan that costs the same but queries 1000 partitions. I
>> think we should be compiling expressions based on the cost of the
>> individual node rather than the total cost of the plan. We need to
>> make some changes so we can more easily determine the number of times
>> a given node will be executed before we can determine how worthwhile
>> JITting an expression in a node will be.
>
> I think Alvaro's point is that it would have been better to work out
> these wrinkles before turning on JIT by default. Based on anecdotal
> reports from the field I'm inclined to agree.

The problem is that back when it was introduced these problems didn't
exist to a significant degree. JIT was developed when partitioning was
very minimal, and the problems we're seeing are almost exclusively with
queries over many partitions. The problems really only started much more
recently. It also wasn't enabled in the first release.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Catching up with performance & PostgreSQL 15
Andres Freund writes:
> On November 30, 2022 3:47:32 AM PST, Andrew Dunstan wrote:
>> I think Alvaro's point is that it would have been better to work out
>> these wrinkles before turning on JIT by default. Based on anecdotal
>> reports from the field I'm inclined to agree.

> The problem is that back when it was introduced these problems didn't
> exist to a significant degree. JIT was developed when partitioning was
> very minimal, and the problems we're seeing are almost exclusively with
> queries over many partitions. The problems really only started much more
> recently. It also wasn't enabled in the first release.

Well, wherever you want to pin the blame, it seems clear that we have a
problem now. And I don't think flipping back to off-by-default is the
answer -- surely there is some population of users who will not be happy
with that. We really need to prioritize fixing the cost-estimation
problems, and/or tweaking the default thresholds.

			regards, tom lane
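For anyone bitten by this in the meantime, the thresholds under
discussion are plain GUCs, so they can be raised, or JIT switched off,
per session or cluster-wide without rebuilding anything. A minimal
sketch using the stock parameter names; the values shown are
illustrative, not recommendations:

-- Inspect the current JIT settings (defaults shown as of PostgreSQL 15).
show jit;                     -- on
show jit_above_cost;          -- 100000
show jit_inline_above_cost;   -- 500000
show jit_optimize_above_cost; -- 500000

-- Raise the trigger threshold for one session...
set jit_above_cost = 1000000;

-- ...or disable JIT entirely while investigating a regression.
set jit = off;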
Geometric types row estimation
Hello everyone,

I'm having a problem regarding the point type/gist indexes. Here's a
minimal reproduction of it:

create table test(p point);
insert into test(p) values (point(0, 0));
insert into test(p) values (point(0, 1));
insert into test(p) values (point(1, 0));
insert into test(p) values (point(1, 1));
insert into test(p) values (point(50, 0));
analyze test;
explain analyze select * from test where p <@ box '(0,0),(1,1)';
explain analyze select * from test where p <@ box '(50,0),(51,1)';

The two queries get the same cost/row estimation, of 1 row. This is the
EXPLAIN ANALYZE of the first query:

 Seq Scan on test  (cost=0.00..1.07 rows=1 width=16) (actual time=0.022..0.026 rows=4 loops=1)
   Filter: ((p[0] >= '0'::double precision) AND (p[0] <= '1'::double precision))
   Rows Removed by Filter: 1
 Planning Time: 0.115 ms
 Execution Time: 0.055 ms
(5 rows)

What I was expecting is the first query to estimate 4 rows and the second
to estimate 1, like what I get if I try the same thing using integers.

create table test(x integer, y integer);
insert into test(x, y) values (0, 0);
insert into test(x, y) values (0, 1);
insert into test(x, y) values (1, 0);
insert into test(x, y) values (1, 1);
insert into test(x, y) values (50, 0);
analyze test;
explain analyze select * from test where x between 0 and 1 and y between 0 and 1;
explain analyze select * from test where x between 50 and 51 and y between 0 and 1;

My question is: is this expected behaviour? I actually have a much larger
table with a gist index where I found this occurring, and this causes the
planner to make bad decisions: every query that I do will have the same
estimation, and whenever this estimation is very wrong, the planner does
not take the optimal decision.

I'm using the official docker image, PostgreSQL 15.1 (Debian
15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit, running everything in psql (PostgreSQL) 15.1
(Ubuntu 15.1-1.pgdg22.04+1).

Best regards,
Igor
Re: Geometric types row estimation
I'm sorry, I sent the wrong EXPLAIN ANALYZE for the first query, this is
the correct one:

 Seq Scan on test  (cost=0.00..1.06 rows=1 width=16) (actual time=0.018..0.022 rows=4 loops=1)
   Filter: (p <@ '(1,1),(0,0)'::box)
   Rows Removed by Filter: 1
 Planning Time: 0.211 ms
 Execution Time: 0.051 ms
(5 rows)

On Wed, 30 Nov 2022 at 17:44, Igor ALBUQUERQUE SILVA <
[email protected]> wrote:

> Hello everyone,
>
> I'm having a problem regarding the point type/gist indexes. Here's a
> minimal reproduction of it:
>
> create table test(p point);
> insert into test(p) values (point(0, 0));
> insert into test(p) values (point(0, 1));
> insert into test(p) values (point(1, 0));
> insert into test(p) values (point(1, 1));
> insert into test(p) values (point(50, 0));
> analyze test;
> explain analyze select * from test where p <@ box '(0,0),(1,1)';
> explain analyze select * from test where p <@ box '(50,0),(51,1)';
>
> The two queries get the same cost/row estimation, of 1 row. This is the
> EXPLAIN ANALYZE of the first query:
>
>  Seq Scan on test  (cost=0.00..1.07 rows=1 width=16) (actual time=0.022..0.026 rows=4 loops=1)
>    Filter: ((p[0] >= '0'::double precision) AND (p[0] <= '1'::double precision))
>    Rows Removed by Filter: 1
>  Planning Time: 0.115 ms
>  Execution Time: 0.055 ms
> (5 rows)
>
> What I was expecting is the first query to estimate 4 rows and the second
> to estimate 1, like what I get if I try the same thing using integers.
>
> create table test(x integer, y integer);
> insert into test(x, y) values (0, 0);
> insert into test(x, y) values (0, 1);
> insert into test(x, y) values (1, 0);
> insert into test(x, y) values (1, 1);
> insert into test(x, y) values (50, 0);
> analyze test;
> explain analyze select * from test where x between 0 and 1 and y between 0 and 1;
> explain analyze select * from test where x between 50 and 51 and y between 0 and 1;
>
> My question is: is this expected behaviour? I actually have a much larger
> table with a gist index where I found this occurring, and this causes the
> planner to make bad decisions: every query that I do will have the same
> estimation, and whenever this estimation is very wrong, the planner does
> not take the optimal decision.
>
> I'm using the official docker image, PostgreSQL 15.1 (Debian
> 15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
> 10.2.1 20210110, 64-bit, running everything in psql (PostgreSQL) 15.1
> (Ubuntu 15.1-1.pgdg22.04+1).
>
> Best regards,
> Igor
>
Re: Geometric types row estimation
Igor ALBUQUERQUE SILVA writes:
> I'm having a problem regarding the point type/gist indexes. Here's a
> minimal reproduction of it:
> ...
> What I was expecting is the first query to estimate 4 rows and the second
> to estimate 1, like what I get If I try the same thing using integers.
Unfortunately, the selectivity estimation functions for PG's geometric
types are mostly just stubs. The estimation function for point <@ box
in particular is contsel [1]:
/*
 * contsel -- How likely is a box to contain (be contained by) a given box?
 *
 * This is a tighter constraint than "overlap", so produce a smaller
 * estimate than areasel does.
 */
Datum
contsel(PG_FUNCTION_ARGS)
{
    PG_RETURN_FLOAT8(0.001);
}
It's been like that (excepting notational changes) since Berkeley days,
because nobody has bothered to make it better.
In general, PG's built-in geometric types have never gotten much
beyond their origins as an academic proof-of-concept. I think people
who are doing serious work that requires such operations mostly use
PostGIS, and I'd suggest looking into that.
Or, if you feel like doing a lot of work to make these estimators
better, have at it.
regards, tom lane
[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/utils/adt/geo_selfuncs.c;hb=HEAD
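The same conclusion can be reached from SQL alone, since the restriction
estimator attached to an operator is recorded in pg_operator. A minimal
sketch against the five-row table from the original report:

-- Which restriction estimator does point <@ box use?
select oprname, oprrest::regproc
from pg_operator
where oprname = '<@'
  and oprleft = 'point'::regtype
  and oprright = 'box'::regtype;
-- oprrest comes back as contsel, the constant-0.001 stub quoted above.

-- With reltuples = 5, the estimate is 0.001 * 5 = 0.005 rows, which the
-- planner clamps up to 1 -- hence rows=1 for both boxes, whatever they
-- actually contain.
select reltuples from pg_class where relname = 'test';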
Re: Geometric types row estimation
Hi Tom,
Thanks a lot for the explanation, I thought the built-in types were more
standard, so I didn't mention that I was having the same thing using
PostGIS. Here's the example (I changed the values a little bit to avoid
rounding errors):
create table test(p geometry(point));
insert into test(p) values (st_makepoint(0,0));
insert into test(p) values (st_makepoint(0,1));
insert into test(p) values (st_makepoint(1,0));
insert into test(p) values (st_makepoint(1,1));
insert into test(p) values (st_makepoint(50,0));
analyze test;
explain analyze select * from test where
ST_Contains(ST_GeomFromText('POLYGON((-1 -1,2 -1,2 2,-1 2,-1 -1))'), p);
explain analyze select * from test where
ST_Contains(ST_GeomFromText('POLYGON((49 -1,51 -1,51 1,49 1,49 -1))'), p);
EXPLAIN ANALYZE:

 Seq Scan on test  (cost=0.00..126.05 rows=1 width=32) (actual time=0.015..0.022 rows=4 loops=1)
   Filter: st_contains('01030001000500F0BFF0BF0040F0BF00400040F0BF0040F0BFF0BF'::geometry, p)
   Rows Removed by Filter: 1
 Planning Time: 0.072 ms
 Execution Time: 0.035 ms
(5 rows)
Do you know if the functions in PostGIS are also stubbed? Or maybe I'm
doing something wrong with the syntax?
This time I'm using the postgis docker image, PostgreSQL 15.1 (Debian
15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit
Best regards,
Igor
On Wed, 30 Nov 2022 at 18:18, Tom Lane wrote:
> Igor ALBUQUERQUE SILVA writes:
> > I'm having a problem regarding the point type/gist indexes. Here's a
> > minimal reproduction of it:
> > ...
> > What I was expecting is the first query to estimate 4 rows and the second
> > to estimate 1, like what I get If I try the same thing using integers.
>
> Unfortunately, the selectivity estimation functions for PG's geometric
> types are mostly just stubs. The estimation function for point <@ box
> in particular is contsel [1]:
>
> /*
> * contsel -- How likely is a box to contain (be contained by) a
> given box?
> *
> * This is a tighter constraint than "overlap", so produce a smaller
> * estimate than areasel does.
> */
> Datum
> contsel(PG_FUNCTION_ARGS)
> {
> PG_RETURN_FLOAT8(0.001);
> }
>
> It's been like that (excepting notational changes) since Berkeley days,
> because nobody has bothered to make it better.
>
> In general, PG's built-in geometric types have never gotten much
> beyond their origins as an academic proof-of-concept. I think people
> who are doing serious work that requires such operations mostly use
> PostGIS, and I'd suggest looking into that.
>
> Or, if you feel like doing a lot of work to make these estimators
> better, have at it.
>
> regards, tom lane
>
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/utils/adt/geo_selfuncs.c;hb=HEAD
>
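One version-independent way to check whether an operator has a real
estimator or a stub is to ask the catalogs directly; the same query
works for PostGIS's geometry operators as for the built-in geometric
types. A minimal sketch -- the output depends on the PostGIS version
installed, and note that ST_Contains() is an ordinary function rather
than an operator, so any selectivity it benefits from has to arrive
indirectly:

-- List the restriction estimators registered for geometry-vs-geometry
-- operators; a constant stub here would explain a fixed rows=1 estimate.
select oprname, oprrest::regproc as restriction_estimator
from pg_operator
where oprleft = 'geometry'::regtype
  and oprright = 'geometry'::regtype
order by oprname;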
Re: Geometric types row estimation
Igor ALBUQUERQUE SILVA writes:
> Thanks a lot for the explanation, I thought the built-in types were more
> standard, so I didn't mention that I was having the same thing using
> PostGIS.

Hm --- you'd have to take that up with the PostGIS people. But they at
least would be likely to have motivation to improve things.

			regards, tom lane
Re: Geometric types row estimation
Ok, I'll do that, thanks a lot!

On Wed, 30 Nov 2022 at 18:45, Tom Lane wrote:

> Igor ALBUQUERQUE SILVA writes:
> > Thanks a lot for the explanation, I thought the built-in types were more
> > standard, so I didn't mention that I was having the same thing using
> > PostGIS.
>
> Hm --- you'd have to take that up with the PostGIS people. But they
> at least would be likely to have motivation to improve things.
>
>                         regards, tom lane
>
