update non-indexed value is slow if some non-related index/fk are enabled

2021-09-05 Thread Philippe Doussot

Hi all,

I'm looking for some hints about this behaviour:


One UPDATE of a single non-indexed boolean value takes quite some time (for many
tuples: about 3 million).

But if we disable all indexes/FKs, the UPDATE is 10x faster and does 10x fewer
shared hits.

I don't understand why disabling all the indexes on the table speeds up the
update, because the boolean column is not indexed.


For example, disabling the indexes was done like this:

UPDATE pg_index SET indisready=false WHERE indexrelid in 
(620809,620837,620839,620841,620854,618764,620855,620790,620790,620840);


With indexes enabled or disabled, the UPDATE plan is the same.


Postgres 9.5:
Explain with indexes enabled: slow
Update on public.t_acte_entite (cost=0.00..52924.83 rows=2758149 width=45) 
(actual time=120908.338..120908.338 rows=0 loops=1)
Buffers: shared hit=101546408 read=164106 dirtied=171996 written=9529
-> Seq Scan on public.t_acte_entite (cost=0.00..52924.83 rows=2758149 width=45) 
(actual time=0.028..1783.946 rows=3033768 loops=1)
Output: id_acte_entite, fk_acte, fk_rubrique, fk_client, fk_salarie, 
fk_cycldet, fk_planning, act_user_modified, fk_ssiad_visite, 
fk_ssiad_visite_planning, true, acte_realise, acte_realise_modifie, ctid
Filter: ((NOT t_acte_entite.acte_prevu) AND 
((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR 
(t_acte_entite.fk_ssiad_visite IS NOT NULL)))
Rows Removed by Filter: 16115
Buffers: shared hit=10 read=22416 written=9441
Planning time: 0.133 ms
Execution time: 120908.387 ms

Explain with indexes disabled:
Update on public.t_acte_entite (cost=0.00..52924.83 rows=2757021 width=45) 
(actual time=9007.353..9007.353 rows=0 loops=1)
Buffers: shared hit=9145912 read=44740 dirtied=44620 written=60
-> Seq Scan on public.t_acte_entite (cost=0.00..52924.83 rows=2757021 width=45) 
(actual time=2.608..1121.776 rows=3033768 loops=1)
Output: id_acte_entite, fk_acte, fk_rubrique, fk_client, fk_salarie, 
fk_cycldet, fk_planning, act_user_modified, fk_ssiad_visite, 
fk_ssiad_visite_planning, true, acte_realise, acte_realise_modifie, ctid
Filter: ((NOT t_acte_entite.acte_prevu) AND 
((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR 
(t_acte_entite.fk_ssiad_visite IS NOT NULL)))
Rows Removed by Filter: 16115
Buffers: shared hit=1 read=22425 written=60
Planning time: 5.835 ms
Execution time: 9007.400 ms

You can see that the "Buffers: shared hit" count drops from 101 million to 10 million
without the indexes.
But since the query plan only does a Seq Scan and doesn't access an index, I'm a
little surprised.


Same on Postgres 13:

// without EXPLAIN, to see how many rows are updated:
test=# update t_acte_entite set acte_prevu = true where 
(fk_ssiad_visite_planning is not null or fk_ssiad_visite is not null);
UPDATE 3049406

// First explain (with indexes enabled)
test=# explain (analyse, COSTS, buffers, verbose, FORMAT JSON) update 
t_acte_entite set acte_prevu = true where (fk_ssiad_visite_planning is not null 
or fk_ssiad_visite is not null);
[
  {
    "Plan": {
  "Node Type": "ModifyTable",
  "Operation": "Update",
  "Parallel Aware": false,
  "Relation Name": "t_acte_entite",
  "Schema": "public",
  "Alias": "t_acte_entite",
  "Startup Cost": 0.00,
  "Total Cost": 87603.92,
  "Plan Rows": 2748711,
  "Plan Width": 46,
  "Actual Startup Time": 74072.048,
  "Actual Total Time": 74072.049,
  "Actual Rows": 0,
  "Actual Loops": 1,
  "Shared Hit Blocks": 89724572,
  "Shared Read Blocks": 115370,
  "Shared Dirtied Blocks": 101855,
  "Shared Written Blocks": 54351,
  "Local Hit Blocks": 0,
  "Local Read Blocks": 0,
  "Local Dirtied Blocks": 0,
  "Local Written Blocks": 0,
  "Temp Read Blocks": 0,
  "Temp Written Blocks": 0,
  "Plans": [
    {
  "Node Type": "Seq Scan",
  "Parent Relationship": "Member",
  "Parallel Aware": false,
  "Relation Name": "t_acte_entite",
  "Schema": "public",
  "Alias": "t_acte_entite",
  "Startup Cost": 0.00,
  "Total Cost": 87603.92,
  "Plan Rows": 2748711,
  "Plan Width": 46,
  "Actual Startup Time": 0.047,
  "Actual Total Time": 1040.544,
  "Actual Rows": 3049406,
  "Actual Loops": 1,
  "Output": ["id_acte_entite", "fk_acte", "fk_rubrique", "fk_client", "fk_salarie", "fk_cycldet", "fk_planning", 
"act_user_modified", "fk_ssiad_visite", "fk_ssiad_visite_planning", "true", "acte_realise", "acte_realise_modifie", "ctid"],
  "Filter": "((t_acte_entite.fk_ssiad_visite_planning IS NOT NULL) OR 
(t_acte_entite.fk_ssiad_visite IS NOT NULL))",
  "Rows Removed by Filter": 477,
  "Shared Hit Blocks": 4334,
  "Shared Read Blocks": 53032,
  "Shared Dirtied Blocks": 0,
  "Shared Written Blocks": 14709,
  "Local Hit Blocks": 0,
  "Local Read Blocks": 0,
  "Local Dirtied Blocks": 0,
  "Local Written Blocks": 0,
  "Temp Rea

How to observe plan_cache_mode transition from custom to generic plan?

2021-09-05 Thread Richard Michael
TL;DR --

1/ My basic prepared statement doesn't seem to start out with a custom plan
(i.e., no parameter symbols in query text).

2/ EXPLAIN EXECUTE of prepared statement doesn't show query text, as the
documentation seems to imply.  Should it?

3/ How can I observe the effect of plan_cache_mode?
e.g., Possible plan change from custom to generic

Apologies if this belongs on -novice.


Details --

I am trying to understand `plan_cache_mode`, related to a report of a
performance problem (resolved by changing `auto` to
`force_custom_plan`).  I do not have access to the server or data in the
report, so I am trying to reproduce a similar situation myself.  I am
using PostgreSQL 12.

My first step is to observe the change in the plan, from custom to
generic, after the first five queries.  I have unsuccessfully tried to
do this with both `EXPLAIN` and the `auto_explain` extension.

I would appreciate help trying to understand what's happening and what
I've misunderstood.

(After I see the plan changing, I'll use more complicated data and
queries to investigate different performance scenarios. Eventually, I'm
interested in queries using the extended protocol. I'm unsure if
parse/bind/execute
will go through the same server code path as `PREPARE ... ; EXECUTE ..`.)

The `PREPARE` documentation
(https://www.postgresql.org/docs/12/sql-prepare.html) indicates:

 1/ Regarding `plan_cache_mode`, "... the first five executions are
 done with custom plans ..."

 2/ Regarding `EXPLAIN EXECUTE ...`, "To examine the query plan
 PostgreSQL is using for a prepared statement, use EXPLAIN (...).  If a
 generic plan is in use, it will contain parameter symbols $n, while a
 custom plan will have the supplied parameter values substituted into
 it."


Using psql, I tried preparing and explaining a very basic query: `SELECT
$1 AS data`.

  Note: Wireshark revealed psql uses simple queries (`PREPARE ...`,
  `EXPLAIN EXECUTE ...` are `Q` messages), not the extended protocol
  (i.e., parse, bind, execute).  I mention this because previous list
  posts mention some libpq prepared statement functions do not result in
  the planning heuristic used by `plan_cache_mode`.


Based on the documentation, I expected the first planned query text to
be: `SELECT 10 AS data`, since it should be a custom plan with
substituted values.  However, the query text always contains a parameter
symbol: `SELECT $1 AS data`.


My questions:


1/ Slightly related, the `EXPLAIN EXECUTE(...)` output does not show the
query text, as the documentation seems to suggest it should (so one may
look for parameter symbols).  Why not?

(Although, none of the documented EXPLAIN options mentions query text
display.  So perhaps it never does?)

  ```
  ~# PREPARE one_param AS SELECT $1 AS data;
  ~# EXPLAIN EXECUTE one_param(10);
QUERY PLAN
---
 Result  (cost=0.00..0.01 rows=1 width=32)
(1 row)
  ```

2/ The query text was logged by `auto_explain`.  However, it contains a
parameter symbol; why?  Also, why is the logged query a `PREPARE` statement
(as if a new prepared statement is being created), instead of only the
`SELECT ..` which was executed?

  ```
  LOG:  statement: EXPLAIN EXECUTE one_param(10);
  LOG:  duration: 0.000 ms  plan:
Query Text: PREPARE one_param AS SELECT $1 AS data;
Result  (cost=0.00..0.01 rows=1 width=32)
  Output: '10'::text
  ```


Let me know if I should post my postgresql.conf and `auto_explain` settings.

Thanks for any advice,
Richard


Re: update non-indexed value is slow if some non-related index/fk are enabled

2021-09-05 Thread David G. Johnston
On Friday, September 3, 2021, Philippe Doussot wrote:
>
> One UPDATE of one boolean value which is non-indexed take some time ( for
> many tuples .. 3 Millions ).
>
> But if we disable all indexes/fk the UPDATE is 10x faster and do 10x less
> shared hit .
>
> I don't understand why disabling all index from the table speed up the
> update because the boolean column is not indexed
>

Index entries point to physical records.  You just deleted one physical
record and added another.  The indexes need to be updated with that
information.
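
To make that concrete (the table name is taken from the thread), every index
returned here normally gets a new entry for each updated row, even though none
of the indexed columns changed:

SELECT indexname FROM pg_indexes WHERE tablename = 't_acte_entite';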

David J.


Re: update non-indexed value is slow if some non-related index/fk are enabled

2021-09-05 Thread Tom Lane
"David G. Johnston"  writes:
> On Friday, September 3, 2021, Philippe Doussot wrote:
>> I don't understand why disabling all index from the table speed up the
>> update because the boolean column is not indexed

> Index entries point to physical records.  You just deleted one physical
> record and added another.  The indexes need to be updated with that
> information.

Yeah.  The OP's mental model is apparently update-in-place, but that's
not how Postgres does things.

The index-update overhead is avoided if the update is "HOT", which
requires that (a) no indexed column changes and (b) there is room
on the same page for the new copy of the row.  Ensuring (b) requires
running with a fairly low fill-factor, which bloats your table and
thereby creates its own costs.  Still, that might be worth doing
depending on your particular circumstances.
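
As a rough sketch of what that could look like (the table name comes from the
thread; the fillfactor value is only an illustration, not a recommendation):

-- leave ~30% free space on each heap page so new row versions can stay
-- on the same page, which is a precondition for HOT updates
ALTER TABLE t_acte_entite SET (fillfactor = 70);
-- the setting only affects newly written pages; a rewrite is needed for
-- existing data
VACUUM FULL t_acte_entite;

-- afterwards, pg_stat_user_tables shows how many updates were HOT
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 't_acte_entite';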

regards, tom lane




Re: How to observe plan_cache_mode transition from custom to generic plan?

2021-09-05 Thread Mladen Gogala



On 9/5/21 9:35 AM, Richard Michael wrote:

[Richard Michael's original message quoted in full -- snipped]

Well, some other databases that shall remain unnamed have a thing called 
"bind variable peeking". Essentially, the database instance uses the 
first set of bind variables to create the execution plan. That is far 
from optimal, because the first set of bind variables encountered by 
the instance may not be representative of what the vast majority of the 
workload looks like. That particular database also has tricks like 
dynamic statistics sampling, cardinality feedback and adaptive plans (if 
the database finds a cheaper plan, it can substitute it on the fly). All 
of that mess results in a high degree of unpredictability. With that 
particular database, plan stability is like El Dorado: everybody is 
looking for it but nobody can find it. There are outlines, baselines and 
SQL patches, none of which is quite satisfactory.


Postgres has a different method: it executes the same statement with 5 
different sets of bind variables and if it finds a plan that is cheaper 
than the generic plan, it caches it and uses it in the future. Caching 
is pertinent for one session only. There is no cursor sharing between 
sessions. If it doesn't find such a plan, it uses a generic plan with 
hard coded values instead of the bind values and table statistics. 
People switching from Oracle, me being one of those, frequently make the
mistake of using bind variables in Postgres.

Re: How to observe plan_cache_mode transition from custom to generic plan?

2021-09-05 Thread David G. Johnston
On Sunday, September 5, 2021, Richard Michael wrote:

>
> Based on the documentation, I expected the first planned query text to
> be: `SELECT 10 AS data`, since it should be a custom plan with
> substituted values.  However, the query text always contains a parameter
> symbol: `SELECT $1 AS data`.
>
>
A query plan is not the same as the query text.  While the executed plan
can be generic or custom, the query text is constant.

If you want to see the difference between a generic and a custom plan you
need to compose a query that would actually have different custom and
generic plans.  Queries that don't involve tables, indexes, or joins don't
have any choices to make with respect to how they are executed.
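
A minimal sketch of such a setup (hypothetical table and data, just to give the
planner a real choice between plans):

CREATE TABLE skewed (id int PRIMARY KEY, flag boolean);
INSERT INTO skewed
SELECT g, (g % 1000 = 0) FROM generate_series(1, 100000) g;
CREATE INDEX ON skewed (flag);
ANALYZE skewed;

PREPARE q(boolean) AS SELECT count(*) FROM skewed WHERE flag = $1;
-- with the rare value, a custom plan can use the index; a generic plan has
-- to assume an "average" selectivity for $1 instead
EXPLAIN EXECUTE q(true);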

David J.


RE: vacuum full

2021-09-05 Thread Ian Dauncey
Morning.

Thanks for all the replies.

What I did to remove these files was to back up the DB, drop the DB, and then
restore it.

Regards
Ian


From: Vijaykumar Jain 
Sent: Monday, 30 August 2021 20:06
To: pgsql-general 
Subject: Re: vacuum full

On Mon, 30 Aug 2021 at 23:12, Vijaykumar Jain wrote:
Just keeping it in a separate email, in case this gets trashed.
VACUUM FULL has a lot of problem stories, not just because the db gets locked,
but also because it is mostly (mis)used when there are space issues.


ok ignore.
I think querying the disk for available space would require shell access from the
client, which is a security issue.
Also, a 10GB table with all dead tuples is 100% bloat, and would not really
need additional space for table rebuilds.
I think we can just put out some queries, like bloat estimation for relations,
to get an idea of wasted vs. used space,
and estimate from that how much disk would be taken up before being
cleaned up.

ref queries: PostgreSQL Automating VACUUM FULL for bloated tables - Stack 
Overflow
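
For a rough first look (not a real bloat estimator), the dead-tuple counters
and relation sizes already give some idea, e.g.:

SELECT relname,
       n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_total_relation_size(relid::regclass)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;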



--
Thanks,
Vijay
Mumbai, India



Re: How to observe plan_cache_mode transition from custom to generic plan?

2021-09-05 Thread Laurenz Albe
On Sun, 2021-09-05 at 14:20 -0400, Mladen Gogala wrote:
> On 9/5/21 9:35 AM, Richard Michael wrote:
> > TL;DR --
> > 
> > 1/ My basic prepared statement doesn't seem to start out with a custom 
> > plan
> > (i.e., no parameter symbols in query text).

That is normal for the first five executions.  PostgreSQL will consider a
generic plan only from the sixth execution on.

> > 2/ EXPLAIN EXECUTE of prepared statement doesn't show query text, as the
> > documentation seems to imply.  Should it?

Yes, and it does for me - see below.

> > 3/ How can I observe the effect of plan_cache_mode?
> >     e.g., Possible plan change from custom to generic

By checking for the presence of $1, $2 and the like in the EXPLAIN output.


Here is a sample session that shows all that in action:


PREPARE stmt(integer) AS SELECT * FROM jobs WHERE job_id = $1;

EXPLAIN EXECUTE stmt(1);

 QUERY PLAN  
═
 Seq Scan on jobs  (cost=0.00..1.04 rows=1 width=10)
   Filter: (job_id = 1)
(2 rows)

EXPLAIN EXECUTE stmt(2);

 QUERY PLAN  
═
 Seq Scan on jobs  (cost=0.00..1.04 rows=1 width=10)
   Filter: (job_id = 2)
(2 rows)

EXPLAIN EXECUTE stmt(3);

 QUERY PLAN  
═
 Seq Scan on jobs  (cost=0.00..1.04 rows=1 width=10)
   Filter: (job_id = 3)
(2 rows)

EXPLAIN EXECUTE stmt(4);

 QUERY PLAN  
═
 Seq Scan on jobs  (cost=0.00..1.04 rows=1 width=10)
   Filter: (job_id = 4)
(2 rows)

EXPLAIN EXECUTE stmt(5);

 QUERY PLAN  
═
 Seq Scan on jobs  (cost=0.00..1.04 rows=1 width=10)
   Filter: (job_id = 5)
(2 rows)

EXPLAIN EXECUTE stmt(6);

 QUERY PLAN  
═
 Seq Scan on jobs  (cost=0.00..1.04 rows=1 width=10)
   Filter: (job_id = $1)
(2 rows)

As you see, the sixth execution uses a generic plan.
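
If you want to compare the two behaviours directly, plan_cache_mode can also be
forced per session, for example:

SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE stmt(1);   -- shows "Filter: (job_id = $1)" right away

SET plan_cache_mode = force_custom_plan;
EXPLAIN EXECUTE stmt(1);   -- keeps substituting the value on every execution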

> > 2/ The query text was logged by `auto_explain`.  However, it contains a
> > parameter symbol; why?  Also, why is the logged query a `PREPARE` 
> > statement
> > (as if a new prepared statement is being created), instead of only the
> > `SELECT ..` which was executed?

You saw the log entry for the PREPARE statement. Since your statement
contains $1 in its query text, you'll get that logged.

Look for the log entries for the EXECUTEs.
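
For reference, a minimal auto_explain setup that logs the plan of every EXECUTE
could look like this (a superuser session is assumed; the values are only an
example):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log every statement's plan
SET auto_explain.log_analyze = on;       -- include actual rows and timings
EXECUTE one_param(10);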

By the way, here is one of the shortcomings of using prepared statements
in my opinion: if EXECUTE is slow, you will get the EXECUTE logged, but
not the statement text of the prepared statements.

> Well, some other databases that shall remain unnamed have a thing called 
> "bind variable peeking". [more off-topic things skipped]

Most of your explanations about PostgreSQL are correct, but:

> Postgres has a different method: it executes the same statement with 5 
> different sets of bind variables and if it finds a plan that is cheaper 
> than the generic plan, it caches it and uses it in the future.

That is wrong: If the estimate for the generic plan is more expensive
than the plans for the first five executions, it will keep generating a
custom plan for subsequent executions of the prepared statement.
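
As an aside: from PostgreSQL 14 on, if I remember correctly, you can also
watch this directly in pg_prepared_statements, which gained counters for
the two plan types:

SELECT name, generic_plans, custom_plans FROM pg_prepared_statements;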

> People switching from Oracle, me being one of those, frequently make the
> mistake of using bind variables in Postgres.

"Bind variables" just being an Oraclism for parameters, it is *not* a
mistake to use them in PostgreSQL.

Yours,
Laurenz Albe