Re: Catching up with performance & PostgreSQL 15

2022-11-30 Thread Andrew Dunstan


On 2022-11-29 Tu 16:06, David Rowley wrote:
> On Wed, 30 Nov 2022 at 03:31, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> IMO it was a mistake to turn JIT on in the default config, so that's one
>>> thing you'll likely want to change.
>> I wouldn't necessarily go quite that far, but I do think that the
>> default cost thresholds for invoking it are enormously too low,
>> or else there are serious bugs in the cost-estimation algorithms
>> for deciding when to use it.  A nearby example[1] of a sub-1-sec
>> partitioned query that took 30sec after JIT was enabled makes me
>> wonder if we're accounting correctly for per-partition JIT costs.
> I'm very grateful for JIT. However, I do agree that the costs need to work.
>
> The problem is that the threshold to turn JIT on does not consider how
> many expressions need to be compiled. It's quite different to JIT
> compile a simple one-node plan with a total cost of 10 than to JIT
> compile a plan that costs the same but queries 1000 partitions. I
> think we should be compiling expressions based on the cost of the
> individual node rather than the total cost of the plan. We need to
> make some changes so we can more easily determine the number of times
> a given node will be executed before we can determine how worthwhile
> JITting an expression in a node will be.
>

I think Alvaro's point is that it would have been better to work out
these wrinkles before turning on JIT by default. Based on anecdotal
reports from the field I'm inclined to agree.
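A rough sketch of the costing issue being discussed, with illustrative numbers only (this is not the planner's actual code): the decision is currently made once for the whole plan, so a plan made of many cheap per-partition nodes trips the threshold just like one genuinely expensive node does.

```python
# Simplified sketch: PostgreSQL compares the *total* plan cost against
# jit_above_cost, so cheap expressions repeated across many partitions
# still trigger JIT. Numbers are illustrative, not the real planner's.

JIT_ABOVE_COST = 100_000  # default jit_above_cost in recent releases

def jit_triggered_per_plan(node_costs):
    """Current behaviour: one decision for the whole plan."""
    return sum(node_costs) >= JIT_ABOVE_COST

def jit_triggered_per_node(node_costs):
    """Proposed behaviour: decide for each node separately."""
    return [c >= JIT_ABOVE_COST for c in node_costs]

# One expensive node: JIT is plausibly worthwhile.
single = [150_000]
# 1000 partitions, each cheap: same total cost, but JIT compiling
# 1000 sets of expressions is mostly overhead.
partitioned = [150] * 1000

print(jit_triggered_per_plan(single))            # True
print(jit_triggered_per_plan(partitioned))       # True -- same total cost
print(any(jit_triggered_per_node(partitioned)))  # False -- no single node is expensive
```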


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Catching up with performance & PostgreSQL 15

2022-11-30 Thread Andres Freund
Hi, 

On November 30, 2022 3:47:32 AM PST, Andrew Dunstan  wrote:
>
>On 2022-11-29 Tu 16:06, David Rowley wrote:
>> On Wed, 30 Nov 2022 at 03:31, Tom Lane  wrote:
>>> Alvaro Herrera  writes:
 IMO it was a mistake to turn JIT on in the default config, so that's one
 thing you'll likely want to change.
>>> I wouldn't necessarily go quite that far, but I do think that the
>>> default cost thresholds for invoking it are enormously too low,
>>> or else there are serious bugs in the cost-estimation algorithms
>>> for deciding when to use it.  A nearby example[1] of a sub-1-sec
>>> partitioned query that took 30sec after JIT was enabled makes me
>>> wonder if we're accounting correctly for per-partition JIT costs.
>> I'm very grateful for JIT. However, I do agree that the costs need to work.
>>
>> The problem is that the threshold to turn JIT on does not consider how
>> many expressions need to be compiled. It's quite different to JIT
>> compile a simple one-node plan with a total cost of 10 than to JIT
>> compile a plan that costs the same but queries 1000 partitions. I
>> think we should be compiling expressions based on the cost of the
>> individual node rather than the total cost of the plan. We need to
>> make some changes so we can more easily determine the number of times
>> a given node will be executed before we can determine how worthwhile
>> JITting an expression in a node will be.
>>
>
>I think Alvaro's point is that it would have been better to work out
>these wrinkles before turning on JIT by default. Based on anecdotal
>reports from the field I'm inclined to agree.

The problem is that back when it was introduced these problems didn't exist to 
a significant degree. JIT was developed when partitioning was very minimal, and 
the problems we're seeing are almost exclusively with queries with many 
partitions. The problems really only started much more recently. It also wasn't 
enabled in the first release.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Catching up with performance & PostgreSQL 15

2022-11-30 Thread Tom Lane
Andres Freund  writes:
> On November 30, 2022 3:47:32 AM PST, Andrew Dunstan  
> wrote:
>> I think Alvaro's point is that it would have been better to work out
>> these wrinkles before turning on JIT by default. Based on anecdotal
>> reports from the field I'm inclined to agree.

> The problem is that back when it was introduced these problems didn't exist 
> to a significant degree. JIT was developed when partitioning was very 
> minimal, and the problems we're seeing are almost exclusively with queries 
> with many partitions. The problems really only started much more recently. It 
> also wasn't enabled in the first release.

Well, wherever you want to pin the blame, it seems clear that we
have a problem now.  And I don't think flipping back to off-by-default
is the answer -- surely there is some population of users who will
not be happy with that.  We really need to prioritize fixing the
cost-estimation problems, and/or tweaking the default thresholds.
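For anyone bitten by this in the meantime, the relevant thresholds are exposed as GUCs and can be adjusted per-session, per-role, or per-database (the new values below are illustrative; the stated defaults are those shipped in PostgreSQL 15):

```sql
-- Raise the thresholds so JIT only kicks in for genuinely expensive
-- plans, or disable it outright until the costing is reworked.
SET jit_above_cost = 1000000;          -- default 100000
SET jit_inline_above_cost = 5000000;   -- default 500000
SET jit_optimize_above_cost = 5000000; -- default 500000
-- Or simply:
SET jit = off;
```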

regards, tom lane




Geometric types row estimation

2022-11-30 Thread Igor ALBUQUERQUE SILVA
Hello everyone,

I'm having a problem regarding the point type/gist indexes. Here's a
minimal reproduction of it:

create table test(p point);
insert into test(p) values (point(0, 0));
insert into test(p) values (point(0, 1));
insert into test(p) values (point(1, 0));
insert into test(p) values (point(1, 1));
insert into test(p) values (point(50, 0));
analyze test;
explain analyze select * from test where p <@ box '(0,0),(1,1)';
explain analyze select * from test where p <@ box '(50,0),(51,1)';

The two queries get the same cost/row estimate of 1 row. This is the
EXPLAIN ANALYZE of the first query:

Seq Scan on test  (cost=0.00..1.07 rows=1 width=16) (actual
time=0.022..0.026 rows=4 loops=1)
   Filter: ((p[0] >= '0'::double precision) AND (p[0] <= '1'::double
precision))
   Rows Removed by Filter: 1
 Planning Time: 0.115 ms
 Execution Time: 0.055 ms
(5 rows)

What I was expecting is the first query to estimate 4 rows and the second
to estimate 1, which is what I get if I try the same thing using integers.

create table test(x integer, y integer);
insert into test(x, y) values (0, 0);
insert into test(x, y) values (0, 1);
insert into test(x, y) values (1, 0);
insert into test(x, y) values (1, 1);
insert into test(x, y) values (50, 0);
analyze test;
explain analyze select * from test where x between 0 and 1 and y between 0
and 1;
explain analyze select * from test where x between 50 and 51 and y between
0 and 1;

My question is: is this expected behaviour? I actually have a much larger
table with a gist index where I found this occurring, and it causes the
planner to make bad decisions: every query I run gets the same
estimate, and whenever that estimate is very wrong, the planner does
not make the optimal choice.

I'm using the official docker image, PostgreSQL 15.1 (Debian
15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit, running everything in psql (PostgreSQL) 15.1
(Ubuntu 15.1-1.pgdg22.04+1).

Best regards,
Igor


Re: Geometric types row estimation

2022-11-30 Thread Igor ALBUQUERQUE SILVA
I'm sorry, I sent the wrong EXPLAIN ANALYZE for the first query, this is
the correct one:

Seq Scan on test  (cost=0.00..1.06 rows=1 width=16) (actual
time=0.018..0.022 rows=4 loops=1)
   Filter: (p <@ '(1,1),(0,0)'::box)
   Rows Removed by Filter: 1
 Planning Time: 0.211 ms
 Execution Time: 0.051 ms
(5 rows)

On Wed, 30 Nov 2022 at 17:44, Igor ALBUQUERQUE SILVA <
[email protected]> wrote:

> Hello everyone,
>
> I'm having a problem regarding the point type/gist indexes. Here's a
> minimal reproduction of it:
>
> create table test(p point);
> insert into test(p) values (point(0, 0));
> insert into test(p) values (point(0, 1));
> insert into test(p) values (point(1, 0));
> insert into test(p) values (point(1, 1));
> insert into test(p) values (point(50, 0));
> analyze test;
> explain analyze select * from test where p <@ box '(0,0),(1,1)';
> explain analyze select * from test where p <@ box '(50,0),(51,1)';
>
> The two queries get the same cost/row estimation, of 1 row. This is the
> EXPLAIN ANALYZE of the first query:
>
> Seq Scan on test  (cost=0.00..1.07 rows=1 width=16) (actual
> time=0.022..0.026 rows=4 loops=1)
>Filter: ((p[0] >= '0'::double precision) AND (p[0] <= '1'::double
> precision))
>Rows Removed by Filter: 1
>  Planning Time: 0.115 ms
>  Execution Time: 0.055 ms
> (5 rows)
>
> What I was expecting is the first query to estimate 4 rows and the second
> to estimate 1, which is what I get if I try the same thing using integers.
>
> create table test(x integer, y integer);
> insert into test(x, y) values (0, 0);
> insert into test(x, y) values (0, 1);
> insert into test(x, y) values (1, 0);
> insert into test(x, y) values (1, 1);
> insert into test(x, y) values (50, 0);
> analyze test;
> explain analyze select * from test where x between 0 and 1 and y between 0
> and 1;
> explain analyze select * from test where x between 50 and 51 and y between
> 0 and 1;
>
> My question is: is this expected behaviour? I actually have a much larger
> table with a gist index where I found this occurring, and it causes the
> planner to make bad decisions: every query I run gets the same
> estimate, and whenever that estimate is very wrong, the planner does
> not make the optimal choice.
>
> I'm using the official docker image, PostgreSQL 15.1 (Debian
> 15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
> 10.2.1 20210110, 64-bit, running everything in psql (PostgreSQL) 15.1
> (Ubuntu 15.1-1.pgdg22.04+1).
>
> Best regards,
> Igor
>


Re: Geometric types row estimation

2022-11-30 Thread Tom Lane
Igor ALBUQUERQUE SILVA  writes:
> I'm having a problem regarding the point type/gist indexes. Here's a
> minimal reproduction of it:
> ...
> What I was expecting is the first query to estimate 4 rows and the second
> to estimate 1, like what I get If I try the same thing using integers.

Unfortunately, the selectivity estimation functions for PG's geometric
types are mostly just stubs.  The estimation function for point <@ box
in particular is contsel [1]:

/*
 *  contsel -- How likely is a box to contain (be contained by) a given box?
 *
 * This is a tighter constraint than "overlap", so produce a smaller
 * estimate than areasel does.
 */
Datum
contsel(PG_FUNCTION_ARGS)
{
PG_RETURN_FLOAT8(0.001);
}

It's been like that (excepting notational changes) since Berkeley days,
because nobody has bothered to make it better.
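To see why both queries get the identical estimate: the planner multiplies that constant selectivity by the table's row count and clamps the result to at least one row. A simplified sketch (illustrative, not the actual planner code):

```python
# Sketch of how a flat selectivity stub turns into a row estimate.
# The clamping mirrors what the planner does, but this is not the
# actual PostgreSQL code.

CONTSEL = 0.001  # the constant returned by contsel()

def estimated_rows(reltuples):
    # Selectivity times table size, clamped to at least one row.
    return max(1, round(CONTSEL * reltuples))

# With only 5 rows, every <@ query gets the same estimate of 1 row,
# no matter which box is tested -- matching the EXPLAIN output in
# the original report.
print(estimated_rows(5))        # 1
print(estimated_rows(100_000))  # 100
```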

In general, PG's built-in geometric types have never gotten much
beyond their origins as an academic proof-of-concept.  I think people
who are doing serious work that requires such operations mostly use
PostGIS, and I'd suggest looking into that.

Or, if you feel like doing a lot of work to make these estimators
better, have at it.

regards, tom lane

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/utils/adt/geo_selfuncs.c;hb=HEAD




Re: Geometric types row estimation

2022-11-30 Thread Igor ALBUQUERQUE SILVA
Hi Tom,

Thanks a lot for the explanation. I thought the built-in types were more
standard, so I didn't mention that I was seeing the same thing with
PostGIS. Here's the example (I changed the values a little bit to avoid
rounding errors):

create table test(p geometry(point));
insert into test(p) values (st_makepoint(0,0));
insert into test(p) values (st_makepoint(0,1));
insert into test(p) values (st_makepoint(1,0));
insert into test(p) values (st_makepoint(1,1));
insert into test(p) values (st_makepoint(50,0));
analyze test;
explain analyze select * from test where
ST_Contains(ST_GeomFromText('POLYGON((-1 -1,2 -1,2 2,-1 2,-1 -1))'), p);
explain analyze select * from test where
ST_Contains(ST_GeomFromText('POLYGON((49 -1,51 -1,51 1,49 1,49 -1))'), p);

EXPLAIN ANALYZE:

 Seq Scan on test  (cost=0.00..126.05 rows=1 width=32) (actual
time=0.015..0.022 rows=4 loops=1)
   Filter:
st_contains('01030001000500F0BFF0BF0040F0BF00400040F0BF0040F0BFF0BF'::geometry,
p)
   Rows Removed by Filter: 1
 Planning Time: 0.072 ms
 Execution Time: 0.035 ms
(5 rows)

Do you know if the functions in PostGIS are also stubbed? Or maybe I'm
doing something wrong with the syntax?

This time I'm using the postgis docker image, PostgreSQL 15.1 (Debian
15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit

Best regards,
Igor

On Wed, 30 Nov 2022 at 18:18, Tom Lane  wrote:

> Igor ALBUQUERQUE SILVA  writes:
> > I'm having a problem regarding the point type/gist indexes. Here's a
> > minimal reproduction of it:
> > ...
> > What I was expecting is the first query to estimate 4 rows and the second
> > to estimate 1, like what I get If I try the same thing using integers.
>
> Unfortunately, the selectivity estimation functions for PG's geometric
> types are mostly just stubs.  The estimation function for point <@ box
> in particular is contsel [1]:
>
> /*
>  *  contsel -- How likely is a box to contain (be contained by) a
> given box?
>  *
>  * This is a tighter constraint than "overlap", so produce a smaller
>  * estimate than areasel does.
>  */
> Datum
> contsel(PG_FUNCTION_ARGS)
> {
> PG_RETURN_FLOAT8(0.001);
> }
>
> It's been like that (excepting notational changes) since Berkeley days,
> because nobody has bothered to make it better.
>
> In general, PG's built-in geometric types have never gotten much
> beyond their origins as an academic proof-of-concept.  I think people
> who are doing serious work that requires such operations mostly use
> PostGIS, and I'd suggest looking into that.
>
> Or, if you feel like doing a lot of work to make these estimators
> better, have at it.
>
> regards, tom lane
>
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/utils/adt/geo_selfuncs.c;hb=HEAD
>


Re: Geometric types row estimation

2022-11-30 Thread Tom Lane
Igor ALBUQUERQUE SILVA  writes:
> Thanks a lot for the explanation, I thought the built-in types were more
> standard, so I didn't mention that I was having the same thing using
> postgis.

Hm --- you'd have to take that up with the PostGIS people.  But they
at least would be likely to have motivation to improve things.

regards, tom lane




Re: Geometric types row estimation

2022-11-30 Thread Igor ALBUQUERQUE SILVA
Ok I'll do that, thanks a lot!

On Wed, 30 Nov 2022 at 18:45, Tom Lane  wrote:

> Igor ALBUQUERQUE SILVA  writes:
> > Thanks a lot for the explanation, I thought the built-in types were more
> > standard, so I didn't mention that I was having the same thing using
> > postgis.
>
> Hm --- you'd have to take that up with the PostGIS people.  But they
> at least would be likely to have motivation to improve things.
>
> regards, tom lane
>