15x slower PreparedStatement vs raw query

2021-05-03 Thread Alex

PreparedStatement: 15s
Raw query with embedded params: 1s
See issue on github with query and explain analyze:
https://github.com/pgjdbc/pgjdbc/issues/2145



Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Alex

Shouldn't this process be automatic based on some heuristics?
Saving 10ms planning but costing 14s execution is catastrophic.
For example, using some statistics to limit planner time to some percent of
previous executions.
This way, if the query is fast, planning is fast, but if the query is slow, more
planning can save huge execution time.
This is a better general usage option and should be enabled by default, and
users who want fast planning should set the variable to use the generic
plan.
Justin Pryzby wrote:
On Sun, May 02, 2021 at 07:45:26PM +0000, Alex wrote:
PreparedStatement: 15s
Raw query with embedded params: 1s
See issue on github with query and explain analyze:
https://github.com/pgjdbc/pgjdbc/issues/2145
| ..PostgreSQL Version? 12
|Prepared statement
|...
|Planning Time: 11.596 ms
|Execution Time: 14799.266 ms
|
|Raw statement
|Planning Time: 22.685 ms
|Execution Time: 1012.992 ms
The prepared statement has 2x faster planning time, which is what it's meant to
improve.
The execution time is slower, and I think you can improve it with this.
https://www.postgresql.org/docs/12/runtime-config-query.html#GUC-PLAN-CACHE_MODE

|plan_cache_mode (enum)
| Prepared statements (either explicitly prepared or implicitly generated,
for example by PL/pgSQL) can be executed using custom or generic plans.
Custom plans are made afresh for each execution using its specific set of
parameter values, while generic plans do not rely on the parameter values
and can be re-used across executions. Thus, use of a generic plan saves
planning time, but if the ideal plan depends strongly on the parameter
values then a generic plan may be inefficient. The choice between these
options is normally made automatically, but it can be overridden with
plan_cache_mode. The allowed values are auto (the default),
force_custom_plan and force_generic_plan. This setting is considered when a
cached plan is to be executed, not when it is prepared. For more
information see PREPARE.
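
For example, if the generic plan is the problem, a session-level experiment is
as simple as this (a sketch; "app_user" is a placeholder role name, not anything
from the report):

SET plan_cache_mode = force_custom_plan;  -- re-plan with the real parameter values on every execution
-- or make it the default for one application role:
ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;
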
--
Justin


Re: 15x slower PreparedStatement vs raw query

2021-05-04 Thread Alex

"Powerful general purpose relational database" but not smart...
I propose a feature to use information from previously executed queries to
adjust the query plan time accordingly.
Reusing the same generic plan can, and will, lead to very long execution
times.
Rick Otten wrote:
On Tue, May 4, 2021 at 6:05 AM Alex <[email protected]> wrote:
Shouldn't this process be automatic based on some heuristics?
Saving 10ms planning but costing 14s execution is catastrophic.
For example, using some statistics to limit planner time to some percent of
previous executions.
This way, if query is fast, planning is fast, but if query is slow, more
planning can save huge execution time.
This is a better general usage option and should be enabled by default, and
users who want fast planning should set the variable to use the generic
plan.
"fast" and "slow" are relative things. There are many queries that I would
be overjoyed with if they completed in 5 _minutes_. And others where they
have to complete within 100ms or something is really wrong. We don't really
know what the execution time is until the query actually executes. Planning
is a guess for the best approach.
Another factor is whether the data is in cache or out on disk. Sometimes
you don't really know until you try to go get it. That can significantly
change query performance and plans - especially if some of the tables in a
query with a lot of joins are in cache and some aren't and maybe some have
to be swapped out to pick up others.
If you are running the same dozen queries with different but similarly
scoped parameters over and over, one would hope that the system would
slowly tune itself to be highly optimized for those dozen queries. That is
a pretty narrow use case for a powerful general purpose relational database
though.


Re: 15x slower PreparedStatement vs raw query

2021-05-05 Thread Alex
 This is exactly my issue.
Using the raw query, planning takes 22ms (custom plan), but using PreparedStatement 
planning takes 11ms (generic plan). It chooses the faster generic plan (11ms), 
winning 11ms over the custom plan, but losing 14 seconds(!) at execution time...
The automatic choice algorithm should be changed to include execution time in the 
decision.
On Wednesday, May 5, 2021, 9:57:20 AM GMT+3, David Rowley 
 wrote:  
 
 On Tue, 4 May 2021 at 22:05, Alex  wrote:
> Shouldn't this process be automatic based on some heuristics?

When plan_cache_mode is set to "auto", then the decision to use a
generic or custom plan is cost-based. See [1]. There's a fairly crude
method there for estimating the effort required to replan the query.
The remainder is based on the average cost of the previous custom
plans + estimated planning effort vs cost of the generic plan.  The
cheaper one wins.

Certainly, what's there is far from perfect.  There are various
problems with it.  The estimated planning cost is pretty crude and
could do with an overhaul.  There are also issues with the plan costs
not being true to the cost of the query.  One problem there is that
run-time partition pruning is not costed into the plan.  This might
cause choose_custom_plan() to pick a custom plan when a generic one
with run-time pruning might have been better.

In order to get a better idea of where things are going wrong for you,
we'd need to see the EXPLAIN ANALYZE output for both the custom and
the generic plan.

David

[1] 
https://github.com/postgres/postgres/blob/master/src/backend/utils/cache/plancache.c#L1019
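
For reference, both outputs can be captured from psql along these lines (a
sketch; "some_table" and "some_col" are placeholders, not the table from the
github issue):

PREPARE q(int) AS SELECT * FROM some_table WHERE some_col = $1;

SET plan_cache_mode = force_custom_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(42);   -- custom plan, built for this parameter value

SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(42);   -- generic plan, the case the prepared statement hit

DEALLOCATE q;
RESET plan_cache_mode;
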
  

Slow query because lexeme index not used

2021-08-07 Thread Alex
Table "product" has a GIN index on "lexeme" column (tsvector) that is not used.

Query that doesn't use the lexeme idx: https://explain.dalibo.com/plan/BlB#plan, 
~8s, ~60,000 blocks needed

Query forced to use the lexeme idx: https://explain.dalibo.com/plan/i52, ~800ms 
(10x faster), ~15,000 blocks needed (4x fewer)
Table metadata:
          relname          | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
---------------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 product_property_default  |     8992 |    622969 |          8992 | r       |       16 | f              |            |      73719808
 product                   |    49686 |    413840 |         49686 | r       |       14 | f              |            |     493314048
Table stats:
  frac_mcv   |         tablename         | attname | inherited | null_frac |  n_distinct | n_mcv | n_hist | correlation
-------------+---------------------------+---------+-----------+-----------+-------------+-------+--------+-------------
             | product                   | lexeme  | f         |         0 |          -1 |       |        |
  0.99773335 | product_property_default  | meaning | f         |         0 |          63 |    39 |     24 |  0.19444875
   0.6416333 | product_property_default  | first   | f         |         0 |        2193 |   100 |    101 | -0.09763639
   0.0002334 | product_property_default  | product | f         |         0 | -0.15221785 |     1 |    101 |  0.08643274


Using Windows Docker with WSL2. Both cases are run with a cold cache. All database 
memory is limited to 1GB by using a .wslconfig file with memory=1GB; the Docker 
container is also limited to 1GB.
My requirement is to optimize disk access with this limited memory.


Postgres 12.4










Re: Slow query because lexeme index not used

2021-08-09 Thread Alex
  > Could you show the table stats for product.id ?  In particular its
"correlation".
 frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation
----------+-----------+---------+-----------+-----------+------------+-------+--------+-------------
          | product   | id      | f         |         0 |         -1 |       |    101 |   0.3857521

> How large is shared_buffers ?
256MB

> Does the query plan improve if you increase work_mem ?
No, same plan.

> Maybe you could encourage scanning in order of product_property.product...
Clustering "product_property_default" on "product_property_default_product_idx" 
followed by ANALYZE does not change the plan.

> Or you could write the query with a temp table:
Creating the temp table "product_ids" changes the plan to use lexeme_idx: 
https://explain.dalibo.com/plan/19h
But I prefer not to use an extra step for my query.

> Maybe it would help to create stats on "first" and "meaning"...
I've played around with statistics; increasing column stats together with the 
extended statistics improves the planner estimation (nested loop 69x to 12x), 
but the same ineffective plan is issued, without lexeme_idx:
https://explain.dalibo.com/plan/B7d#plan (has the query with stats)
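
For reference, raising the per-column statistics target (part of what was tried
above) looks roughly like this; the value 1000 is only an example, not the
target actually used:

ALTER TABLE product_property_default ALTER COLUMN "first" SET STATISTICS 1000;
ALTER TABLE product_property_default ALTER COLUMN "meaning" SET STATISTICS 1000;
ANALYZE product_property_default;
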




On Sunday, August 8, 2021, 3:35:31 AM GMT+3, Justin Pryzby 
 wrote:  
 
 On Sat, Aug 07, 2021 at 07:35:25PM +, Alex wrote:
> Table "product" has a GIN index on "lexeme" column (tsvector) that is not 
> used.
> 
> Query that doesn't use the lexeme idx: https://explain.dalibo.com/plan/BlB#plan, 
> ~8s, ~60,000 blocks needed
> 
> Query forced to use the lexeme idx: https://explain.dalibo.com/plan/i52, ~800ms 
> (10x faster), ~15,000 blocks needed (4x fewer)

Could you show the table stats for product.id ?  In particular its
"correlation".

I guess the correlation is ~1, and the 10,659 index scans on product.id are
considered to be cheaper than scanning the lexeme index - since there are no
correlation stats for tsvector.
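
For reference, those per-column numbers come from the standard pg_stats view,
e.g.:

SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'product' AND attname = 'id';
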

How large is shared_buffers ?

Does the query plan improve if you increase work_mem ?

Maybe you could encourage scanning in order of product_property.product.
You could CLUSTER product_property_default on an index on "product" and then
ANALYZE.  Or you could write the query with a temp table:

CREATE TEMP TABLE product_ids AS
SELECT product
FROM product_property
WHERE "meaning" = 'B' AND "first" = 1.7179869184E10
GROUP BY 1 -- or DISTINCT, because the table is only used for EXISTS
ORDER BY 1; -- to scan product in order of id
ANALYZE product_ids;

The index scans on product.id should be faster when you use
EXISTS(SELECT 1 FROM product_ids ...), even though it didn't use the lexeme 
index.

Maybe it would help to create stats on "first" and "meaning"; the rowcount is
underestimated by 3x, which means it did several times more index scans into
"product" than planned.
| Bitmap Heap Scan on product_property_default product_property_default 
(cost=2,748.6..8,823.4 rows=6,318 width=4) (actual time=43.945..211.621 
rows=21,061 loops=1) 

CREATE STATISTICS first_meaning ON first,meaning FROM product_property;
ANALYZE product_property;
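
To confirm the statistics object exists and which kinds of statistics it stores,
something like this works (assuming the table name used above):

SELECT stxname, stxkind
FROM pg_statistic_ext
WHERE stxrelid = 'product_property'::regclass;
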

> Table metadata:
>           relname          | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
> ---------------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
>  product_property_default  |     8992 |    622969 |          8992 | r       |       16 | f              |            |      73719808
>  product                   |    49686 |    413840 |         49686 | r       |       14 | f              |            |     493314048
>
> Table stats:
>   frac_mcv   |         tablename         | attname | inherited | null_frac |  n_distinct | n_mcv | n_hist | correlation
> -------------+---------------------------+---------+-----------+-----------+-------------+-------+--------+-------------
>              | product                   | lexeme  | f         |         0 |          -1 |       |        |
>   0.99773335 | product_property_default  | meaning | f         |         0 |          63 |    39 |     24 |  0.19444875
>    0.6416333 | product_property_default  | first   | f         |         0 |        2193 |   100 |    101 | -0.09763639
>    0.0002334 | product_property_default  | product | f         |         0 | -0.15221785 |     1 |    101 |  0.08643274
> 
> 
> Using Windows Docker with WSL2. Both cases are run with a cold cache. All
> database memory is limited to 1GB by using a .wslconfig file with memory=1GB;
> the Docker container is also limited to 1GB.
> My requirement is to optimize disk access with this limited memory
  

RE: Query became very slow after 9.6 -> 10 upgrade

2017-11-22 Thread Alex Ignatov
Hello!

What about :

select name,setting from pg_settings where name like '%_cost';

 

--

Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company

 

 

From: Dmitry Shalashov [mailto:[email protected]] 
Sent: Wednesday, November 22, 2017 5:14 PM
To: [email protected]
Subject: Query became very slow after 9.6 -> 10 upgrade

 

Hi!

 

I've seen a few letters like this on the mailing list and for some reason thought 
that it probably won't happen to us, but here I am, lol.

 

It's "nestloop hits again" situation.

 

I'll try to provide plan from 9.6 later, but right now I have only plan from 
10.1.

 

Query: https://pastebin.com/9b953tT7

It used to run in under 3 seconds (our default timeout) and now it runs for 
12 minutes.

 

\d adroom: https://pastebin.com/vBrPGtxT (3800 rows)

\d adroom_stat: https://pastebin.com/CkBArCC9 (47mln rows, 1.5mln satisfy 
condition on day column)

\d domains: https://pastebin.com/65hk7YCm (73000 rows)

 

All three tables are analyzed.

 

EXPLAIN ANALYZE: https://pastebin.com/PenHEgf0

EXPLAIN ANALYZE with nestloop off: https://pastebin.com/zX35CPCV (0.8s)

 

Regarding server parameters - it's a mighty beast with 2x E5-2630 v3, 192Gb of 
RAM and two very, very fast NVME server class SSD's in RAID1.

 

What can I do with it?

 

 

Also maybe this will be useful:

 

1st query, runs under 1ms

select title, id, groups->0->>'provider' provider, domain_ids from adroom where 
groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and 
current_timestamp between start_ts and stop_ts

 

2nd query that uses 1st one, runs under 3 ms

select distinct unnest(domain_ids) FROM (select title, id, 
groups->0->>'provider' provider, domain_ids from adroom where 
groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and 
current_timestamp between start_ts and stop_ts) t1

 

3rd query which returns 1.5mln rows, runs in about 0.6s

SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day between 
date_trunc('day', current_timestamp - interval '1 week') and date_trunc('day', 
current_timestamp)

 

BUT if I add one additional condition to the 3rd query, which is basically the 2nd 
query, it will run for the same 12 minutes:

SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day between 
date_trunc('day', current_timestamp - interval '1 week') and date_trunc('day', 
current_timestamp) AND domain_id IN (select distinct unnest(domain_ids) FROM 
(select title, id, groups->0->>'provider' provider, domain_ids from adroom 
where groups->0->>'provider' ~ '^target_mail_ru' and not is_paused and 
current_timestamp between start_ts and stop_ts) t1)

 

Plan of last query:

 Nested Loop  (cost=88.63..25617.31 rows=491 width=16) (actual 
time=3.512..733248.271 rows=1442797 loops=1)

   ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4) (actual 
time=3.380..13.561 rows=3043 loops=1)

 Group Key: (unnest(adroom.domain_ids))

 ->  HashAggregate  (cost=88.03..88.04 rows=1 width=4) (actual 
time=2.199..2.607 rows=3043 loops=1)

   Group Key: unnest(adroom.domain_ids)

   ->  ProjectSet  (cost=0.28..87.78 rows=100 width=4) (actual 
time=0.701..1.339 rows=3173 loops=1)

 ->  Index Scan using adroom_active_idx on adroom  
(cost=0.28..87.27 rows=1 width=167) (actual time=0.688..1.040 rows=4 loops=1)

   Index Cond: ((CURRENT_TIMESTAMP >= start_ts) AND 
(CURRENT_TIMESTAMP <= stop_ts))

   Filter: (((groups -> 0) ->> 'provider'::text) ~ 
'^target_mail_ru'::text)

   Rows Removed by Filter: 41

   ->  Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx 
on adroom_stat  (cost=0.58..25524.33 rows=491 width=16) (actual 
time=104.847..240.846 rows=474 loops=3043)

 Index Cond: ((day >= date_trunc('day'::text, (CURRENT_TIMESTAMP - '7 
days'::interval))) AND (day <= date_trunc('day'::text, CURRENT_TIMESTAMP)) AND 
(domain_id = (unnest(adroom.domain_ids

 Planning time: 1.580 ms

 Execution time: 71.740 ms
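
For reference, the "nestloop off" plan linked earlier was presumably produced
with a session-local toggle along these lines (a sketch re-using the last query
above; enable_nestloop is a standard planner setting):

SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat
WHERE day BETWEEN date_trunc('day', current_timestamp - interval '1 week')
              AND date_trunc('day', current_timestamp)
  AND domain_id IN (SELECT DISTINCT unnest(domain_ids)
                    FROM adroom
                    WHERE groups->0->>'provider' ~ '^target_mail_ru'
                      AND NOT is_paused
                      AND current_timestamp BETWEEN start_ts AND stop_ts);
RESET enable_nestloop;
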

 

Dmitry Shalashov, relap.io & surfingbird.ru



RE: Query became very slow after 9.6 -> 10 upgrade

2017-11-22 Thread Alex Ignatov
Here is my select right after initdb:

 

postgres=# select name,setting from pg_settings where name like '%_cost';

 name | setting

--+-

cpu_index_tuple_cost | 0.005

cpu_operator_cost| 0.0025

cpu_tuple_cost   | 0.01

parallel_setup_cost  | 1000

parallel_tuple_cost  | 0.1

random_page_cost | 4

seq_page_cost| 1

 

 

Can you generate plan with random_page_cost = 4?
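
That can be tried session-locally, e.g. (just the toggle; the query itself is
the one from the pastebin above):

SET random_page_cost = 4;   -- back to the default, for this session only
SHOW random_page_cost;
-- re-run EXPLAIN ANALYZE of the query here
RESET random_page_cost;
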

 

 

--

Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company

 

From: Dmitry Shalashov [mailto:[email protected]] 
Sent: Wednesday, November 22, 2017 5:29 PM
To: Alex Ignatov 
Cc: [email protected]
Subject: Re: Query became very slow after 9.6 -> 10 upgrade

 

Sure, here it goes:

 

 name | setting

--+-

 cpu_index_tuple_cost | 0.005

 cpu_operator_cost| 0.0025

 cpu_tuple_cost   | 0.01

 parallel_setup_cost  | 1000

 parallel_tuple_cost  | 0.1

 random_page_cost | 1

 seq_page_cost| 1




 

Dmitry Shalashov, relap.io & surfingbird.ru

 


RE: Bad estimates

2017-11-22 Thread Alex Ignatov
It doesn’t help in this case.

 

--

Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company

 

From: Don Seiler [mailto:[email protected]] 
Sent: Wednesday, November 22, 2017 5:49 PM
To: Artur Zając 
Cc: [email protected]
Subject: Re: Bad estimates

 

I'm assuming you never analyzed the table after creation & data load? What does 
this show you:

 

select * from pg_stat_all_tables where relname='xyz';
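
The columns of interest there are presumably the analyze/vacuum timestamps and
the row counts, e.g.:

SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_all_tables
WHERE relname = 'xyz';
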

 

Don.

 

-- 

Don Seiler
www.seiler.us



Bad plan chosen for union all

2017-11-28 Thread Alex Reece
 AS contribution_id,
coalesce(cm.yield, im.yield) AS yield,
coalesce(cm.term, im.term) AS term
FROM contributions contrib
JOIN investment_metrics_view im ON im.investment_id = contrib.investment_id
FULL OUTER JOIN contribution_metrics_view cm ON cm.contribution_id =
contrib.id

Best,
~Alex Reece


Re: Bad plan chosen for union all

2017-11-28 Thread Alex Reece
I managed to reduce my test case: the following query does not take
advantage of the index on contribution metrics.

explain select cim.yield
from earnings
JOIN contributions on contributions.id = earnings.note_id
JOIN
(
SELECT contribution_id,
max(CASE metrics.name WHEN 'Yield'::text THEN projected ELSE
NULL::double precision END) AS yield
from contribution_metrics
JOIN  metrics ON metrics.id = metric
group by contribution_id
) cim ON cim.contribution_id = contributions.id
WHERE earnings.id = '\x595400456c1f1400116b3843';

I got this:

  Hash Join  (cost=125.02..147.03 rows=1 width=8) (actual time=4.781..4.906
rows=1 loops=1)
Hash Cond: (contribution_metrics.contribution_id = contributions.id)
->  HashAggregate  (cost=116.86..126.64 rows=3261 width=21) (actual
time=4.157..4.600 rows=3261 loops=1)
  Group Key: contribution_metrics.contribution_id
  ->  Hash Join  (cost=1.11..108.18 rows=5788 width=33) (actual
time=0.021..2.425 rows=5788 loops=1)
Hash Cond: (contribution_metrics.metric = metrics.id)
->  Seq Scan on contribution_metrics  (cost=0.00..85.36
rows=5788 width=34) (actual time=0.006..0.695 rows=5788 loops=1)
->  Hash  (cost=1.05..1.05 rows=17 width=25) (actual
time=0.009..0.009 rows=17 loops=1)
  ->  Seq Scan on metrics  (cost=0.00..1.05 rows=17
width=25) (actual time=0.002..0.005 rows=17 loops=1)
->  Hash  (cost=8.15..8.15 rows=1 width=26) (actual time=0.022..0.022
rows=1 loops=1)
  ->  Nested Loop  (cost=0.14..8.15 rows=1 width=26) (actual
time=0.019..0.020 rows=1 loops=1)
->  Index Scan using earnings_pkey on earnings
(cost=0.06..4.06 rows=1 width=13) (actual time=0.009..0.009 rows=1 loops=1)
  Index Cond: (id = '\x595400456c1f1400116b3843'::bytea)
->  Index Only Scan using contributions_pkey on
contributions  (cost=0.08..4.09 rows=1 width=13) (actual time=0.008..0.009
rows=1 loops=1)
  Index Cond: (id = earnings.note_id)
  Planning time: 0.487 ms
  Execution time: 4.975 ms

But I expected it to be equivalent to the plan from this query:

select cim.yield from (
select contribution_id,
max(CASE metrics.name WHEN 'Yield'::text THEN projected ELSE NULL::double
precision END) AS yield
from contribution_metrics JOIN  metrics ON metrics.id = metric group by
contribution_id
) cim where cim.contribution_id = (
select contributions.id from contributions
join earnings on earnings.note_id = contributions.id
where earnings.id = '\x595400456c1f1400116b3843')

Which gives me _this_ plan, that correctly uses the index on
contribution_metrics.

  Subquery Scan on cim  (cost=9.32..14.23 rows=2 width=8) (actual
time=0.108..0.108 rows=1 loops=1)
InitPlan 1 (returns $1)
  ->  Nested Loop  (cost=0.14..8.15 rows=1 width=13) (actual
time=0.054..0.055 rows=1 loops=1)
->  Index Scan using earnings_pkey on earnings
(cost=0.06..4.06 rows=1 width=13) (actual time=0.025..0.026 rows=1 loops=1)
  Index Cond: (id = '\x595400456c1f1400116b3843'::bytea)
->  Index Only Scan using contributions_pkey on contributions
(cost=0.08..4.09 rows=1 width=13) (actual time=0.026..0.026 rows=1 loops=1)
  Index Cond: (id = earnings.note_id)
->  GroupAggregate  (cost=1.17..6.07 rows=2 width=21) (actual
time=0.108..0.108 rows=1 loops=1)
  Group Key: contribution_metrics.contribution_id
  ->  Hash Join  (cost=1.17..6.07 rows=2 width=33) (actual
time=0.100..0.101 rows=2 loops=1)
Hash Cond: (contribution_metrics.metric = metrics.id)
->  Index Scan using
contribution_metrics_contribution_id_metric_last_update_on_idx1 on
contribution_metrics  ( cost=0.06..4.95 rows=2 width=34) (actual time
  Index Cond: (contribution_id = $1)
->  Hash  (cost=1.05..1.05 rows=17 width=25) (actual
time=0.012..0.012 rows=17 loops=1)
  ->  Seq Scan on metrics  (cost=0.00..1.05 rows=17
width=25) (actual time=0.004..0.006 rows=17 loops=1)
  Planning time: 0.396 ms
  Execution time: 0.165 ms

schema here:
https://gist.github.com/awreece/aeacbc818277c7c6d99477645e7fcd03

Best,
~Alex



On Tue, Nov 28, 2017 at 2:13 AM Alex Reece  wrote:

> I'm on PostgreSQL 9.6.5 and getting an awkwardly bad plan chosen for my
> query.
>
> I want to do:
>
> select investments.id, cim.yield
> FROM contributions
> JOIN investments ON contributions.investment_id = investments.id
> JOIN contribution_investment_metrics_view cim ON cim.investment_id =
> investments.id
> WHERE contributions.id IN ('\x58c9c0d3ee944c48b32f814d', '\x11')
> Where contribution_investment_metrics_view is morally
>
> select investment_id, first(val) from (select * from contribution_metrics
> 

Re: Bad plan chosen for union all

2017-11-28 Thread Alex Reece
One more thing. Given this:


> The difference here is that, from the perspective of the outer query,
> the WHERE condition is a restriction clause on the "cim" relation,
> not a join clause.  So it will get pushed down into the subquery
> without creating any join order constraints on the outer query.


I expected the lateral form of the query to properly use the indexes. Sure
enough, this correctly uses the index:

explain select cim.yield
from earnings
JOIN contributions on contributions.id = earnings.note_id
JOIN LATERAL
(
SELECT contribution_id,
   max(CASE metrics.name WHEN 'Yield'::text THEN projected ELSE
NULL::double precision END) AS yield
from contribution_metrics
JOIN  metrics ON metrics.id = metric WHERE contributions.id =
contribution_id
group by contribution_id
) cim ON true
WHERE earnings.id = '\x595400456c1f1400116b3843'

However, when I try to wrap that subquery again (e.g. as I would need
to if it were a view), it doesn't restrict:

select cim.yield
from earnings

JOIN contributions on contributions.id = earnings.note_id
JOIN LATERAL
(
select * from
(
SELECT contribution_id,
   max(CASE metrics.name WHEN 'Yield'::text THEN projected ELSE
NULL::double precision END) AS yield
from contribution_metrics
JOIN  metrics ON metrics.id = metric

group by contribution_id
) my_view WHERE contribution_id = contributions.id
) cim ON true
WHERE earnings.id = '\x595400456c1f1400116b3843'

Is there a way I can get the restriction to be pushed down into my subquery
in this lateral form?

Best,
~Alex


Different plan chosen when in lateral subquery

2017-12-05 Thread Alex Reece
note_id,
allocated_on DESC)
"portfolio_allocations_pnsa" btree (entity_id, note_id, series_id,
allocated_on DESC)

Indexes:
"contributions_pkey" PRIMARY KEY, btree (id)
"contributions_id_accrue_from_idx" btree (id,
events_earnings_accrue_from)

I have a few questions here:
  - Why doesn't it use the primary key index in either case?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it
would prevent it from having to sort?

Best,
~Alex


Re: Different plan chosen when in lateral subquery

2017-12-05 Thread Alex Reece
Weird, when I deleted an erroneous index it started picking a reasonable
plan. This now works as expected, for posterity here is the bad plan:

 Nested Loop  (cost=21281.50..21323812.82 rows=5621000 width=47) (actual
time=171.648..7233.298 rows=85615 loops=1)

   ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000
width=8) (actual time=0.031..0.252 rows=267 loops=1)

   ->  Unique  (cost=21281.50..21290.08 rows=5621 width=39) (actual
time=25.730..27.050 rows=321 loops=267)

 ->  Sort  (cost=21281.50..21284.36 rows=5724 width=39) (actual
time=25.728..26.242 rows=6713 loops=267)

   Sort Key: alloc.note_id, alloc.series_id

   Sort Method: quicksort  Memory: 2220kB

   ->  Nested Loop  (cost=10775.92..21210.05 rows=5724
width=39) (actual time=1.663..21.938 rows=6713 loops=267)

 ->  Hash Join  (cost=10775.83..20355.61 rows=5724
width=52) (actual time=1.657..5.980 rows=6713 loops=267)

   Hash Cond: (alloc.note_id = contrib.id)

   ->  Bitmap Heap Scan on portfolio_allocations
alloc  (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278
rows=6713 loops=267)

 Recheck Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
date(dates.dates)))

 Heap Blocks: exact=118074

 ->  Bitmap Index Scan on
portfolio_allocations_entity_id_allocated_on_idx  (cost=0.00..69.53
rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo

   Index Cond: ((entity_id =
'\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
date(dates.dates)))

   ->  Hash  (cost=9464.85..9464.85 rows=354617
width=26) (actual time=169.792..169.792 rows=354617 loops=1)

 Buckets: 524288  Batches: 1  Memory Usage:
24296kB

 ->  Seq Scan on contributions contrib
 (cost=0.00..9464.85
rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1)

 ->  Index Only Scan using investments_pkey on
investments inv  (cost=0.08..0.15 rows=1 width=13) (actual
time=0.002..0.002 rows=1 loops=1792457)

   Index Cond: (id = contrib.investment_id)

   Heap Fetches: 1792457

 Planning time: 0.721 ms

 Execution time: 7236.507 ms


On Tue, Dec 5, 2017 at 10:04 AM Alex Reece  wrote:

> I get very different plan chosen when my query is in a lateral subquery vs
> standalone -- it doesn't use a key when joining on a table, instead opting
> to do a hash join. Here is the query:
>
> select distinct on (sub.entity_id, sub.note_id, sub.series_id)
>entity_id, note_id, series_id
> from
> (
> select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount,
> inv.name
> from public.portfolio_allocations alloc
> JOIN contributions contrib on contrib.id = alloc.note_id
> JOIN investments inv on inv.id = contrib.investment_id
> where entity_id = '\x5787f132f50f7b03002cf835' and
> alloc.allocated_on <= dates.date
> ) sub
>
> And wrapped inside the lateral:
>
> explain analyze
> select *
> from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,
>   current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
> LATERAL (
>  ...  ...
> ) lat
>
> Run by itself injecting a hard coded value for dates.date, I get the
> expected plan which uses a key index on contributions:
>
>   Unique  (cost=14.54..14.54 rows=2 width=39) (actual
> time=0.052..0.053 rows=2 loops=1)
>  ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual
> time=0.052..0.052 rows=2 loops=1)
>Sort Key: alloc.note_id, alloc.series_id
>Sort Method: quicksort  Memory: 25kB
>->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual
> time=0.030..0.042 rows=2  loops=1)
>  ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52)
> (actual time=0.022..0.028   rows=2 loops=1)
>->  Index Scan using
> portfolio_allocations_entity_id_allocated_on_idx on
> portfolio_allocations alloc  (cost=0.09..6.05 rows=2 width=39) (actual
>  time=0.012..0.014
>  Index Cond: ((entity_id =
> '\x5787f132f50f7b03002cf835'::bytea) AND  (allocated_on <= '2017-03-14
> 20:59:59.999+00'::timestamp with time   zone))
>->  Index Scan using
> contributions_id_accrue_from_idx on contributions contrib
> (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005   

Re: Different plan chosen when in lateral subquery

2017-12-05 Thread Alex Reece
Argh, so sorry for repeated posts; I'll be very careful to review them
before posting. The "good plan" was the result of me hard coding '2017-03-14
20:59:59.999+00'::timestamp instead of using dates.date inside the lateral
subquery. When I correctly use dates.date, it takes 7000ms instead of
0.3ms. My questions still remain:

I have a few questions here:
  - Why doesn't it use the primary key on contributions in either case,
preferring contributions_id_accrue_from_idx or none at all?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it
would prevent it from having to sort?
  - What information can I gather to answer these questions on my own?

~Alex


Table with large number of int columns, very slow COPY FROM

2017-12-07 Thread Alex Tokarev
Hi,

I have a set of tables with fairly large number of columns, mostly int with
a few bigints and short char/varchar columns. I've noticed that Postgres is
pretty slow at inserting data in such a table. I tried to tune every
possible setting: using unlogged tables, increased shared_buffers, etc; even
placed the db cluster on ramfs and turned fsync off. The results are pretty
much the same with the exception of using unlogged tables that improves
performance just a little bit.

I have made a minimally reproducible test case consisting of a table with
848 columns, inserting a partial dataset of 100,000 rows with 240 columns. On
my dev VM the COPY FROM operation takes just shy of 3 seconds to complete,
which is entirely unexpected for such a small dataset.

Here's a tarball with test schema and data:
http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
~50MB. Here's the result of a profiling session with perf:
https://pastebin.com/pjv7JqxD


-- 
Regards,
Alex.





RE: pg_xlog unbounded growth

2018-02-07 Thread Alex Ignatov
Have you tried 
archive_command='/bin/true' 
as Andreas wrote?

-Original Message-
From: Stefan Petrea [mailto:[email protected]] 
Sent: Wednesday, January 24, 2018 2:48 PM
To: [email protected]
Subject: pg_xlog unbounded growth

Hello,

This email is structured in sections as follows:

1 - Estimating the size of pg_xlog depending on postgresql.conf parameters
2 - Cleaning up pg_xlog using a watchdog script
3 - Mailing list survey of related bugs
4 - Thoughts

We're using PostgreSQL 9.6.6 on a Ubuntu 16.04.3 LTS.
During some database imports (using pg_restore), we're noticing fast and
unbounded growth of pg_xlog up to the point where the partition (280G in size
for us) that stores it fills up and PostgreSQL shuts down. The error seen in
the logs:

2018-01-17 01:46:23.035 CST [41671] LOG:  database system was shut down
at 2018-01-16 15:49:26 CST
2018-01-17 01:46:23.038 CST [41671] FATAL:  could not write to file
"pg_xlog/xlogtemp.41671": No space left on device
2018-01-17 01:46:23.039 CST [41662] LOG:  startup process (PID 41671)
exited with exit code 1
2018-01-17 01:46:23.039 CST [41662] LOG:  aborting startup due to
startup process failure
2018-01-17 01:46:23.078 CST [41662] LOG:  database system is shut down

The config settings I thought were relevant are these ones (but I'm also
attaching the entire postgresql.conf if there are other ones that I missed):

wal_level=replica
archive_command='exit 0;'
min_wal_size=2GB
max_wal_size=500MB
checkpoint_completion_target = 0.7
wal_keep_segments = 8

So currently the pg_xlog is growing a lot, and there doesn't seem to be any
way to stop it.

There are some formulas I came across that allow one to compute the maximum
number of WAL allowed in pg_xlog as a function of the PostgreSQL config
parameters.

1.1) Method from 2012 found in [2]

The formula for the upper bound for WAL files in pg_xlog is 

(2 + checkpoint_completion_target) * checkpoint_segments + 1 which is ( (2 +
0.7) * (2048/16 * 1/3 ) ) + 1 ~ 116 WAL files

I used the 1/3 because of [6] the shift from checkpoint_segments to
max_wal_size in 9.5 , the relevant quote from the release notes being:

If you previously adjusted checkpoint_segments, the following formula
will give you an approximately equivalent setting:
max_wal_size = (3 * checkpoint_segments) * 16MB

Another way of computing it, also according to [2] is the following
2 * checkpoint_segments + wal_keep_segments + 1 which is (2048/16) + 8 + 1 =
137  WAL files

So far we have two answers, in practice none of them check out, since
pg_xlog grows indefinitely.

1.2) Method from the PostgreSQL internals book 

The book [4] says the following:

it could temporarily become up to "3 * checkpoint_segments + 1"

Ok, let's compute this too, it's 3 * (128/3) + 1 = 129 WAL files

This doesn't check out either.

1.3) On the mailing list [3] , I found similar formulas that were seen
previously.

1.4) The post at [5] says max_wal_size is a soft limit and also sets
wal_keep_segments = 0 in order to enforce keeping as little WAL as possible
around.  Would this work?

Does wal_keep_segments = 0 turn off WAL recycling? Frankly, I would rather
have WAL not be recycled/reused, and just deleted to keep pg_xlog below
expected size.

Another question is, does wal_level = replica affect the size of pg_xlog in
any way?  We have an archive_command that just exits with exit code 0, so I
don't see any reason for the pg_xlog files to not be cleaned up.
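
One way to see what the archiver is actually reporting (for reference; this view
has been available since 9.4):

SELECT archived_count, last_archived_wal, failed_count, last_failed_wal
FROM pg_stat_archiver;
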

2) Cleaning up pg_xlog using a watchdog script

To get the import done I wrote a script that's actually inspired from a blog
post where the pg_xlog out of disk space problem is addressed [1].  It
periodically reads the last checkpoint's REDO WAL file, and deletes all WAL
in pg_xlog before that one. 
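
For reference, on 9.6 the checkpoint's REDO segment that such a script needs can
also be read from SQL (a sketch of the lookup only, not the script itself):

-- WAL segments that sort before this file name are no longer needed for crash
-- recovery of the latest checkpoint (archiving and replication needs aside).
SELECT redo_wal_file FROM pg_control_checkpoint();
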

The intended usage is for this script to run alongside the imports in order
for pg_xlog to be cleaned up gradually and prevent the disk from filling up.

Unlike the blog post, and probably slightly wrong, I used lexicographic
ordering and not ordering by date. But I guess it worked
because the checks were frequent enough that no WAL ever got recycled. In
retrospect I should've used the date ordering.

Does this script have the same effect as checkpoint_completion_target=0 ?

At the end of the day, this script seems to have allowed the import we
needed to get done, but I acknowledge it was a stop-gap measure and not a
long-term solution, hence me posting on the mailing list to find a better
solution.

3) Mailing list survey of related bugs

On the mailing lists, in the past, there have been bugs around pg_xlog
growing out of control:

BUG 7902 [7] - Discusses a situation where WAL is produced faster than
checkpoints can be completed (written to disk), and therefore the WALs in
pg_xlog cannot be recycled/deleted.  The status of this bug report is
unclear. I have a feeling it's still open. Is that the case?

BUG 14340 [9] - A user(Sonu Gu

Getting an index scan to be a parallel index scan

2023-01-31 Thread Alex Kaiser
Hello,

I'm trying to get the following query to use a plan with parallelism, but I
haven't been successful and would like some advice.

The schema and table that I'm using is this:

CREATE TABLE testing(
   id INT,
   info INT,
   data_one TEXT,
   data_two TEXT,
   primary key(id, info)
);

INSERT INTO testing(id, info, data_one, data_two)
SELECT idx, idx, md5(random()::text), md5(random()::text)
FROM generate_series(1,1000) idx;

Then the query that I'm trying to run is this (I'll include the full query
at the very end of the email because it is long):

select * from testing where id in (1608377,5449811, ... <1000 random ids>
,4654284,3558460);

Essentially I have a list of 1000 ids and I would like the rows for all of
those ids.

This seems like it would be pretty easy to parallelize: if you have X
threads then you would split the list of IDs into 1000/X sub-lists and give
one to each thread to go find the rows for the ids in the given list.  Even
when I use the following configs I don't get a query plan that actually
uses any parallelism:

psql (15.1 (Debian 15.1-1.pgdg110+1))
Type "help" for help.

postgres=# show max_parallel_workers;
 max_parallel_workers
--
 8
(1 row)

postgres=# set max_parallel_workers_per_gather = 8;
SET
postgres=# set parallel_setup_cost = 0;
SET
postgres=# set parallel_tuple_cost = 0;
SET
postgres=# set force_parallel_mode = on;
SET
postgres=# explain select * from testing where id in (1608377,5449811, ...
 ... ,4654284,3558460);

 QUERY PLAN
---
 Gather  (cost=0.43..6138.81 rows=1000 width=74)
   Workers Planned: 1
   Single Copy: true
   ->  Index Scan using testing_pkey on testing  (cost=0.43..6138.81
rows=1000 width=74)
 Index Cond: (id = ANY ('{1608377,5449811 ... 
... 4654284,3558460}'::integer[]))
(5 rows)

postgres=# explain (analyze, buffers) select * from testing where id in
(1608377,5449811, ...  ... ,4654284,3558460);

 QUERY PLAN
---
 Gather  (cost=0.43..6138.81 rows=1000 width=74) (actual
time=22.388..59.860 rows=1000 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   Buffers: shared hit=4003
   ->  Index Scan using testing_pkey on testing  (cost=0.43..6138.81
rows=1000 width=74) (actual time=0.443..43.660 rows=1000 loops=1)
 Index Cond: (id = ANY ('{1608377,5449811 ... 
... 4654284,3558460}'::integer[]))
 Buffers: shared hit=4003
 Planning Time: 3.101 ms
 Execution Time: 60.211 ms
(10 rows)

postgres=# explain select * from testing;
QUERY PLAN
--
 Gather  (cost=0.00..153334.10 rows=1050 width=74)
   Workers Planned: 5
   ->  Parallel Seq Scan on testing  (cost=0.00..153334.10 rows=210
width=74)
(3 rows)


That last query is just to show that I can get parallel plans, so they
aren't completely turned off.

Is there a particular reason why this query can't be parallelized? Or is
there some other way I could structure the query to get it to use
parallelism?

I've tried this both on PG 15.1 (in Docker, which seems to be Debian-based)
and PG 14.5 (on CentOS 7) and got the same results.

Thanks,
Alex Kaiser

Full query:
select * from testing where id in
(1608377,5449811,5334677,5458230,2053195,3572313,1949724,3559988,5061560,8479775,6604845,1531946,8578236,1767138,1887562,9224796,801839,1389073,2070024,3378634,5935175,253322,6503217,492190,1646143,6073879,6344884,3120926,6077454,7988246,2359088,2758185,2277417,6144637,7869743,450645,2675170,307844,2752378,9765759,7604173,4702773,9447882,6403407,1020813,2421819,2246889,6118484,5675269,38400,989987,5226654,2910389,9741575,5909526,8752890,1429931,3598345,9541469,6728532,2454806,6470370,6338418,2525642,2286146,9319587,5821710,4138188,8677346,2188096,3242293,9711468,8308979,6505437,5620847,5870305,5177061,7519783,1441852,8264516,7637571,1994901,3979976,8828452,6327321,4377585,608,2620337,9944860,7822890,664424,8832299,8564521,4978015,5910646,8527205,3573524,996558,1270265,7774940,1747145,104339,6867262,9128122,1303267,3810412,2694329,7145818,6719318,3789062,9870348,986684,5603862,1698361,7732472,2816324,1337682,5012390,2309943,1691809,3480539,49005,6857269,913,2599309,2515895,4568931,641192,781186,4762944,13013,4987725,8990541,5654081,193138,4012985,2884209,5352762,9816619,1363209,3019900,8276055,2129378,1121730,7607112,5210575,3288097,1489630,1163497,7136711,9799

Re: Getting an index scan to be a parallel index scan

2023-02-01 Thread Alex Kaiser
Rainier,

I tried using the ANY syntax (had to modify your query slightly) and it
didn't result in any change in the query plan.

postgres=# explain select * from testing where id =
ANY(array[1608377,5449811, ...  ...
,4654284,3558460]::integer[]);
QUERY PLAN
--
 Gather  (cost=0.43..6138.81 rows=1000 width=74)
   Workers Planned: 1
   Single Copy: true
   ->  Index Scan using testing_pkey on testing  (cost=0.43..6138.81
rows=1000 width=74)
 Index Cond: (id = ANY ('{1608377,5449811, ...  ... ,4654284,3558460}'::integer[]))

I've never messed around with extended statistics, but I'm not sure how
they would help here. From what I've read they seem to help when your query
is restricting over multiple columns. Since this query is only on one
column I'm not sure what a good "CREATE STATISTICS ..." command to run
would be to improve the query plan. Any suggestions?


David,

As for how I found 'force_parallel_mode', I think I found it first here:
https://postgrespro.com/list/thread-id/2574997 and then I also saw it when
I was searching for 'parallel' on https://postgresqlco.nf .

It's not that I think the parameter would help my query; it was really a
last resort to try and force the query to be parallel. Without that
parameter, it just does a normal index scan (see the result below). My
thinking with using that parameter was to see if I could force a parallel
query plan just to see if maybe the planner just thought the parallel plan
would be more expensive. So I was surprised to see that even with that
parameter turned on it doesn't actually do anything in parallel.  Here is
the plan with that parameter turned off:

postgres=# set force_parallel_mode = off;
SET
postgres=# explain select * from testing where id =
ANY(array[1608377,5449811, ...  ...
,4654284,3558460]::integer[]);
QUERY PLAN

 Index Scan using testing_pkey on testing  (cost=0.43..6138.81 rows=1000
width=74)
   Index Cond: (id = ANY ('{1608377,5449811, ... < removed for brevity >
... 4654284,3558460}'::integer[]))
(2 rows)


Thanks,
Alex Kaiser

On Wed, Feb 1, 2023 at 3:30 AM David Rowley  wrote:

> On Wed, 1 Feb 2023 at 18:39, Alex Kaiser  wrote:
> > postgres=# set force_parallel_mode = on;
>
> There's been a bit of debate about that GUC and I'm wondering how you
> came to the conclusion that it might help you. Can you share details
> of how you found out about it and what made you choose to set it to
> "on"?
>
> David
>


Re: Getting an index scan to be a parallel index scan

2023-02-01 Thread Alex Kaiser
Justin,

I did try changing min_parallel_index_scan_size /
min_parallel_table_scan_size and didn't see any change (the below is with
force_parallel_mode = off):

postgres=# set min_parallel_index_scan_size = 0;
SET
postgres=# set min_parallel_table_scan_size = 0;
SET
postgres=# explain select * from testing where id =
ANY(array[1608377,5449811, ...  ...
,4654284,3558460]::integer[]);
QUERY PLAN

 Index Scan using testing_pkey on testing  (cost=0.43..6138.81 rows=1000
width=74)
   Index Cond: (id = ANY ('{1608377,5449811, ... < removed for brevity >
... 4654284,3558460}'::integer[]))
(2 rows)


As for 'force_parallel_mode', while this isn't "debugging PG", it isn't
something that I would actually turn on in production, just something I was
playing with to see the cost of parallel queries when the planner might not
think they are the most efficient.


Thomas,

Thanks for the explanation. Yes, that is the query plan I was imagining. I
do see how chopping it up could result in an unfair distribution. But my
counter to that would be: wouldn't chopping it up still be better than not?
If things do happen to work out to be fair, now it's X times as fast;
if things are very unfair, then you haven't really lost much (besides the
parallel overhead) compared to the non-parallel query. Or maybe it should
be possible to do the parallel query if there were some statistics (either
normal ones or extended ones) that told the planner that the result would
probably be fair?

Though I do agree that the "work stealing" option would be the most
efficient, but would be a lot more complicated to code up.

I tried out inserting into a separate table, and as you guessed that
worked. For my production scenario that isn't really feasible, but still
cool to see it work.


postgres=# create table ids(
  probe_id int PRIMARY KEY
);

insert into ids(probe_id) values (774494);
insert into ids(probe_id) values (9141914);
...

postgres=# select count(*) from ids;
 count
---
  1000
(1 row)

postgres=# explain select * from testing where id in (select * from ids);
   QUERY PLAN
-
 Gather  (cost=0.43..3504.67 rows=1000 width=74)
   Workers Planned: 2
   ->  Nested Loop  (cost=0.43..3504.67 rows=417 width=74)
 ->  Parallel Seq Scan on ids  (cost=0.00..9.17 rows=417 width=4)
 ->  Index Scan using testing_pkey on testing  (cost=0.43..8.37
rows=1 width=74)
   Index Cond: (id = ids.probe_id)
(6 rows)

Thanks,
Alex Kaiser

On Wed, Feb 1, 2023 at 1:52 PM Thomas Munro  wrote:

> On Wed, Feb 1, 2023 at 6:39 PM Alex Kaiser  wrote:
> > select * from testing where id in (1608377,5449811, ... <1000 random
> ids> ,4654284,3558460);
> >
> > Essentially I have a list of 1000 ids and I would like the rows for all
> of those ids.
> >
> > This seems like it would be pretty easy to parallelize, if you have X
> threads then you would split the list of IDs into 1000/X sub lists and give
> one to each thread to go find the rows for ids in the given list.  Even
> when I use the following configs I don't get a query plan that actually
> uses any parallelism:
>
> It sounds like the plan you are imagining is something like:
>
> Gather
>   Nested Loop Join
> Outer side: 
> Inner side: Index scan of your big table
>
> Such a plan would only give the right answer if each process has a
> non-overlapping subset of the constant values to probe the index with,
> and together they have the whole set.  Hypothetically, a planner could
> chop that set up beforehand and and give a different subset to each
> process (just as you could do that yourself using N connections and
> separate queries), but that might be unfair: one process might find
> lots of matches, and the others might find none, because of the
> distribution of data.  So you'd ideally want some kind of "work
> stealing" scheme, where each worker can take more values to probe from
> whenever it needs more, so that they all keep working until the values
> run out.  We don't have a thing that can do that.  You might imagine
> that a CTE could do it, so WITH keys_to_look_up AS (VALUES (1), (2),
> ...) SELECT ... JOIN ON ..., but that also doesn't work because we
> don't have a way to do "partial" scans of CTEs either (though someone
> could invent that).  Likewise for temporary tables: they are invisible
> to parallel workers, so they can't help us.  I have contemplated
> "partial function scans" for set-returning functions, wher

Re: Getting an index scan to be a parallel index scan

2023-02-01 Thread Alex Kaiser
Okay after reading
http://rhaas.blogspot.com/2018/06/using-forceparallelmode-correctly.html I
do see that I was using force_parallel_mode incorrectly and wouldn't have
gotten what I wanted even if the original query was possible to parallelize.

> Maybe, but unfairness multiplies if it's part of a larger plan

Ah, I didn't think of that, and it's a good point.

> Ok I hacked my copy of PostgreSQL to let me set parallel_setup_costs
> to negative numbers ...

Thanks for taking the time to do that and look into that.  I don't actually
think it's worth the confusion to allow this in general, but I was thinking
that setting "force_parallel_mode = on" would essentially be doing
something equivalent to this (though I now see that is wrong).

> But it's probing every index for every one of the values in the big
> list, not just the ones that have a non-zero chance of finding a
> match, which is a waste of cycles.

In my case, this would actually be quite helpful because the real
bottleneck when I run this in production is time spent waiting for IO.  I
was hoping to spread that IO wait time over multiple threads, and wouldn't
really care about the few extra wasted CPU cycles.  But I can't actually do
this as I can't set parallel_setup_cost to be negative, so I wouldn't be
able to get PG to choose the parallel plan even if I did partition the
table.

> If I had more timerons myself ...

If only we all had more timerons ... :)

Thanks,
Alex Kaiser

On Wed, Feb 1, 2023 at 6:12 PM David Rowley  wrote:

> On Thu, 2 Feb 2023 at 14:49, Thomas Munro  wrote:
> > If I had more timerons myself, I'd like to try to make parallel
> > function scans, or parallel CTE scans, work...
>
> I've not really looked in detail but I thought parallel VALUES scan
> might be easier than those two.
>
> David
>