Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Alex

Shouldn't this process be automatic based on some heuristics?
Saving 10ms planning but costing 14s execution is catastrophic.
For example, using some statistics to limit planner time to some percent of
previous executions.
This way, if query is fast, planning is fast, but if query is slow, more
planning can save huge execution time.
This is a better general usage option and should be enabled by default, and
users who want fast planning should set the variable to use the generic
plan.
Justin Pryzby wrote:
On Sun, May 02, 2021 at 07:45:26PM +, Alex wrote:
PreparedStatement: 15s
Raw query with embedded params: 1s
See issue on github with query and explain analyze:
https://github.com/pgjdbc/pgjdbc/issues/2145

| ..PostgreSQL Version? 12
|Prepared statement
|...
|Planning Time: 11.596 ms
|Execution Time: 14799.266 ms
|
|Raw statement
|Planning Time: 22.685 ms
|Execution Time: 1012.992 ms
The prepared statement has 2x faster planning time, which is what it's meant
to improve.
The execution time is slower, and I think you can improve it with this.
https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE


|plan_cache_mode (enum)
| Prepared statements (either explicitly prepared or implicitly generated,
for example by PL/pgSQL) can be executed using custom or generic plans.
Custom plans are made afresh for each execution using its specific set of
parameter values, while generic plans do not rely on the parameter values
and can be re-used across executions. Thus, use of a generic plan saves
planning time, but if the ideal plan depends strongly on the parameter
values then a generic plan may be inefficient. The choice between these
options is normally made automatically, but it can be overridden with
plan_cache_mode. The allowed values are auto (the default),
force_custom_plan and force_generic_plan. This setting is considered when a
cached plan is to be executed, not when it is prepared. For more
information see PREPARE.
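
A minimal session sketch of how that GUC could be applied here (the statement
and parameter names are illustrative, not taken from the original report):

```sql
-- Replan with the actual parameter values on every execution,
-- instead of possibly switching to a cached generic plan.
SET plan_cache_mode = force_custom_plan;

-- Hypothetical prepared statement standing in for the one in the report:
PREPARE search(tsvector) AS
  SELECT * FROM category_property_name WHERE lexeme = $1;

-- Each EXECUTE is now planned afresh using the supplied value.
EXECUTE search('''greutate'''::tsvector);
```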
--
Justin


Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Rick Otten
On Tue, May 4, 2021 at 6:05 AM Alex  wrote:

> Shouldn't this process be automatic based on some heuristics?
>
> Saving 10ms planning but costing 14s execution is catastrophic.
>
> For example, using some statistics to limit planner time to some percent
> of previous executions.
> This way, if query is fast, planning is fast, but if query is slow, more
> planning can save huge execution time.
> This is a better general usage option and should be enabled by default,
> and users who want fast planning should set the variable to use the generic
> plan.
>
>
>
"fast" and "slow" are relative things.  There are many queries that I would
be overjoyed with if they completed in 5 _minutes_.  And others where they
have to complete within 100ms or something is really wrong.  We don't
really know what the execution time is until the query actually executes.
Planning is a guess for the best approach.

Another factor is whether the data is in cache or out on disk.  Sometimes
you don't really know until you try to go get it.  That can significantly
change query performance and plans - especially if some of the tables in a
query with a lot of joins are in cache and some aren't and maybe some have
to be swapped out to pick up others.

If you are running the same dozen queries with different but similarly
scoped parameters over and over, one would hope that the system would
slowly tune itself to be highly optimized for those dozen queries.  That is
a pretty narrow use case for a powerful general purpose relational database
though.


Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Alex

"Powerful general purpose relational database" but not smart...
I propose a feature to use information from previously executed queries to
adjust the query plan time accordingly.
Reusing the same generic plan can, and will, lead to very long execution
times.
Rick Otten wrote:
On Tue, May 4, 2021 at 6:05 AM Alex wrote:
Shouldn't this process be automatic based on some heuristics?
Saving 10ms planning but costing 14s execution is catastrophic.
For example, using some statistics to limit planner time to some percent of
previous executions.
This way, if query is fast, planning is fast, but if query is slow, more
planning can save huge execution time.
This is a better general usage option and should be enabled by default, and
users who want fast planning should set the variable to use the generic
plan.
"fast" and "slow" are relative things. There are many queries that I would
be overjoyed with if they completed in 5 _minutes_. And others where they
have to complete within 100ms or something is really wrong. We don't really
know what the execution time is until the query actually executes. Planning
is a guess for the best approach.
Another factor is whether the data is in cache or out on disk. Sometimes
you don't really know until you try to go get it. That can significantly
change query performance and plans - especially if some of the tables in a
query with a lot of joins are in cache and some aren't and maybe some have
to be swapped out to pick up others.
If you are running the same dozen queries with different but similarly
scoped parameters over and over, one would hope that the system would
slowly tune itself to be highly optimized for those dozen queries. That is
a pretty narrow use case for a powerful general purpose relational database
though.


Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Laurenz Albe
On Tue, 2021-05-04 at 13:59 +, Alex wrote:
> "Powerful general purpose relational database" but not smart... 

Too smart can easily become slow...

> I propose a feature to use information from previously executed queries to 
> adjust the query plan time accordingly.
> Reusing the same generic plan may and will lead to very long execution times.

AI can go wrong too, and I personally would be worried that such cases
are very hard to debug...

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Vijaykumar Jain
I am not an expert on this, but I would like to take a shot :)

Could you share your prepared statement and parameter types? I mean
something like this:

PREPARE usrrptplan (int) AS
SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
AND l.date = $2;
EXECUTE usrrptplan(1, current_date);


It's just that sometimes the datatypes of the prepared statement params are
not the same as the datatype of the field in the join, and as a result this
may add some overhead.
PostgreSQL - general - bpchar, text and indexes (postgresql-archive.org)

There was one more thread where a person had similar issues, which were
sorted out by using the relevant field type in the prepared statement.
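
A sketch of that idea with made-up table and column names: if the column is
`text` but the parameter is declared as a different type, the filter has to
cast on every comparison, so declaring the parameter with the column's own
type avoids the overhead:

```sql
-- Hypothetical table whose join column "name" is of type text.
-- Mismatched: the parameter is declared bpchar, so each comparison
-- goes through a cast and may not use the index on "name".
PREPARE p_mismatch(bpchar) AS SELECT * FROM items WHERE name = $1;

-- Matched: declaring the parameter as text lets the planner compare
-- the indexed column directly.
PREPARE p_match(text) AS SELECT * FROM items WHERE name = $1;
```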


bPMA | explain.depesz.com -> slow (prepared), row 8
TsNn | explain.depesz.com -> fast (direct), row 8

It seems the join filters in the prepared version are doing a lot of work
massaging the fields, which may add the cost overhead.

Also, if the above does not work, could you try the plan GUC below to check
whether you see any improvements?

Tech preview: How PostgreSQL 12 handles prepared plans - CYBERTEC
(cybertec-postgresql.com)


Thanks,
Vijay

On Tue, 4 May 2021 at 20:52, Laurenz Albe  wrote:

> On Tue, 2021-05-04 at 13:59 +, Alex wrote:
> > "Powerful general purpose relational database" but not smart...
>
> Too smart can easily become slow...
>
> > I propose a feature to use information from previously executed queries
> to adjust the query plan time accordingly.
> > Reusing the same generic plan may and will lead to very long execution
> times.
>
> AI can go wrong too, and I personally would be worried that such cases
> are very hard to debug...
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
>
>

-- 
Thanks,
Vijay
Mumbai, India


Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread David Rowley
On Tue, 4 May 2021 at 22:05, Alex  wrote:
> Shouldn't this process be automatic based on some heuristics?

When plan_cache_mode is set to "auto", then the decision to use a
generic or custom plan is cost-based. See [1]. There's a fairly crude
method there for estimating the effort required to replan the query.
The remainder is based on the average cost of the previous custom
plans + estimated planning effort vs cost of the generic plan.  The
cheaper one wins.

Certainly, what's there is far from perfect.  There are various
problems with it.  The estimated planning cost is pretty crude and
could do with an overhaul.   There are also issues with the plan costs
not being true to the cost of the query.  One problem there is that
run-time partition pruning is not costed into the plan.  This might
cause choose_custom_plan() to pick a custom plan when a generic one
with run-time pruning might have been better.

In order to get a better idea of where things are going wrong for you,
we'd need to see the EXPLAIN ANALYZE output for both the custom and
the generic plan.
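
Assuming PostgreSQL 12 or later, both plans can be captured from the same
prepared statement by flipping plan_cache_mode (the statement below is a
placeholder; substitute the real one from the report):

```sql
-- Placeholder prepared statement, reusing the names from the
-- PREPARE example earlier in the thread.
PREPARE q(int) AS SELECT count(*) FROM logs WHERE usrid = $1;

SET plan_cache_mode = force_custom_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(1);   -- custom plan

SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(1);   -- generic plan ($1 stays symbolic)
```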

David

[1] 
https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1019




Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Justin Pryzby
On Mon, May 03, 2021 at 03:18:11PM -0500, Justin Pryzby wrote:
> On Sun, May 02, 2021 at 07:45:26PM +, Alex wrote:
> > PreparedStatement: 15s
> > Raw query with embedded params: 1s
> > See issue on github with query and explain analyze:
> > https://github.com/pgjdbc/pgjdbc/issues/2145
> 
> | ..PostgreSQL Version?  12
> |Prepared statement
> |...
> |Planning Time: 11.596 ms
> |Execution Time: 14799.266 ms
> |
> |Raw statement
> |Planning Time: 22.685 ms
> |Execution Time: 1012.992 ms
> 
> The prepared statement has 2x faster planning time, which is what it's meant to
> improve.
> 
> The execution time is slower, and I think you can improve it with this.
> https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE

Also, the rowcount estimates are way off starting with the scan nodes.

  ->  Bitmap Heap Scan on category_property_name cpn_limits  (cost=32.13..53.55 rows=14 width=29) (actual time=0.665..8.822 rows=2650 loops=1)
        Recheck Cond: ((lexeme = ANY ('{''rata'',""''polling'' ''rata'' ''ratez''"",""''polling'' ''rata''"",""''rata'' ''ratez'' ''semnal'' ''usb-ul''""}'::tsvector[])) OR (lexeme = '''frecventa'' ''frecventez'''::tsvector) OR (lexeme = '''raportare'' ''rata'' ''ratez'''::tsvector) OR (lexeme = ANY ('{''latime'',""''latime'' ''placi''"",""''compatibila'' ''latime'' ''telefon''""}'::tsvector[])) OR (lexeme = '''lungime'''::tsvector) OR (lexeme = '''cablu'' ''lungime'''::tsvector) OR (lexeme = '''inaltime'''::tsvector) OR (lexeme = '''rezolutie'''::tsvector) OR (lexeme = '''greutate'''::tsvector))
        Heap Blocks: exact=85
        ->  BitmapOr  (cost=32.13..32.13 rows=14 width=0) (actual time=0.574..0.577 rows=0 loops=1)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..9.17 rows=4 width=0) (actual time=0.088..0.089 rows=10 loops=1)
                    Index Cond: (lexeme = ANY ('{''rata'',""''polling'' ''rata'' ''ratez''"",""''polling'' ''rata''"",""''rata'' ''ratez'' ''semnal'' ''usb-ul''""}'::tsvector[]))
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.047..0.047 rows=171 loops=1)
                    Index Cond: (lexeme = '''frecventa'' ''frecventez'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=1)
                    Index Cond: (lexeme = '''raportare'' ''rata'' ''ratez'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..6.88 rows=3 width=0) (actual time=0.097..0.097 rows=547 loops=1)
                    Index Cond: (lexeme = ANY ('{''latime'',""''latime'' ''placi''"",""''compatibila'' ''latime'' ''telefon''""}'::tsvector[]))
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.107..0.107 rows=604 loops=1)
                    Index Cond: (lexeme = '''lungime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.030..0.030 rows=137 loops=1)
                    Index Cond: (lexeme = '''cablu'' ''lungime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.079..0.079 rows=479 loops=1)
                    Index Cond: (lexeme = '''inaltime'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.020..0.020 rows=40 loops=1)
                    Index Cond: (lexeme = '''rezolutie'''::tsvector)
              ->  Bitmap Index Scan on category_property_name_lexeme_idx  (cost=0.00..2.29 rows=1 width=0) (actual time=0.088..0.088 rows=661 loops=1)
                    Index Cond: (lexeme = '''greutate'''::tsvector)