received immediate shutdown request caused cluster failover

2020-11-17 Thread Yi Sun
Hi all

There are 3 nodes in our production Patroni cluster: vm01 is the leader, vm02
and vm03 are standbys. vm01 received an immediate shutdown request, which
caused a failover to vm02; after that, vm03 received a fast shutdown request.

Since vm03 was then no longer part of the cluster, we had to reinit it.

What could be the possible root cause of vm01 receiving the immediate shutdown
request?

And what could be the possible root cause of vm03 receiving the fast shutdown
request?

Detailed information is below. Please take a look; if any other information is
needed, I will provide it. Thanks.

PostgreSQL version: 11.4
OS: CentOS 7
HA: Patroni, recently upgraded from 1.6.3 to 2.0.1

--vm01 pg log

[2020-11-16 12:03:56.592 UTC] p=31485:3@ c=
pgwatch2@127.0.0.1/eu4_baas_bckp_epmgr:pgwatch2 LOG:  disconnection:
session time: 0:00:00.037 user=pgwatch2 database=eu4_baas_bckp_epmgr
host=127.0.0.1 port=36916
[2020-11-16 12:03:57.005 UTC] p=32103:14@ c=@/: LOG:  received immediate
shutdown request
[2020-11-16 12:03:57.017 UTC] p=31189:3@314/0 c=
atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] WARNING:
 terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31189:4@314/0 c=
atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] DETAIL:
 The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31189:5@314/0 c=
atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
[2020-11-16 12:03:57.017 UTC] p=31183:3@313/0 c=
atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] WARNING:  terminating
connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31183:4@313/0 c=
atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31183:5@313/0 c=
atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] HINT:  In a moment you should
be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.017 UTC] p=31182:3@310/281059 c=
bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] WARNING:
 terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31182:4@310/281059 c=
bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31182:5@310/281059 c=
bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] HINT:  In a moment
you should be able to reconnect to the database and repeat your command.

--vm01 patroni log
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:56,922", "name": "patroni.dcs.consul", "process": 32085,
"thread": 140350381303616, "level": "WARNING", "message": "Could not
register service: unknown role type promoted"}
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085,
"thread": 140350381303616, "level": "INFO", "message": "Lock owner:
eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085,
"thread": 140350381303616, "level": "INFO", "message": "does not have lock"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:59,158", "name": "patroni.dcs.consul", "process": 32085,
"thread": 140350381303616, "level": "INFO", "message": "Deregister service
pgcluster11/eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:59,180", "name": "patroni", "process": 32085, "thread":
140350381303616, "level": "INFO", "message": "demoting self because i do
not have the lock and i was a leader"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:59,181", "name": "patroni", "process": 32085, "thread":
140350381303616, "level": "WARNING", "message": "Loop time exceeded,
rescheduling immediately."}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:59,184", "name": "patroni.ha", "process": 32085,
"thread": 140350381303616, "level": "INFO", "message": "Lock owner:
eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time":
"2020-11-16 12:03:59,185", "name": "patroni", "process": 32085, "thread":
140350381303616, "level": "INFO", "message": "starting after demotion in
progress"}
Nov 16 12:03:59 eu4-b

Re: received immediate shutdown request caused cluster failover

2020-11-19 Thread Yi Sun
Hi guys,

Besides someone running a command (like pg_ctl) against it, what other reasons
could cause a "received immediate shutdown request"?

For this production DB, the support colleague said they did not run such a
command.


Re: received immediate shutdown request caused cluster failover

2020-11-20 Thread Yi Sun
Hello,

Thank you for your reply.

The Patroni developers replied:

"It seems your system is under so much stress that there was no resources
for Patroni to execute HA loop for 35 seconds.
This interval exceeds ttl=30s, therefore the leader key expired, Patroni
noticed it and demoted Postgres.

You need to figure out what is going on with your system, and what is the
reason for cpu/memory pressure. Ideally fix these issues."

Since hundreds of clusters in our company are deployed with Ansible using the
same parameters, changing the parameters for just one cluster is difficult.

I am thinking we could instead pull the top SQL from pg_stat_statements, as
below, and then analyse and tune it.

Is this the right direction? Any suggestions are welcome, thanks.

Top 5 by per-call I/O time
select userid::regrole, dbid, query from pg_stat_statements order by
(blk_read_time+blk_write_time)/calls desc limit 5;

Top 5 by total I/O time
select userid::regrole, dbid, query from pg_stat_statements order by
(blk_read_time+blk_write_time) desc limit 5;

Top 5 by mean execution time (mean_time)
select userid::regrole, dbid, query from pg_stat_statements order by
mean_time desc limit 5;

Top 5 by total execution time
select userid::regrole, dbid, query from pg_stat_statements order by
total_time desc limit 5;

Top 5 by average execution time (total_time/calls)
select calls, total_time/calls AS avg_time, left(query,80) from
pg_stat_statements order by 2 desc limit 5;

Top 5 by execution-time stddev
select userid::regrole, dbid, query from pg_stat_statements order by
stddev_time desc limit 5;

Top 5 by shared blocks touched
select userid::regrole, dbid, query from pg_stat_statements order by
(shared_blks_hit+shared_blks_dirtied) desc limit 5;

Top 5 by temp blocks written
select userid::regrole, dbid, query from pg_stat_statements order by
temp_blks_written desc limit 5;
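
One more angle that may be worth trying, assuming pg_stat_statements is already
installed as the queries above imply: ranking statements by their share of total
execution time often surfaces the statements behind CPU pressure more directly
than per-call figures.

-- Share of total execution time per statement (PG 11 pg_stat_statements columns)
select left(query, 80) as query,
       calls,
       round(total_time::numeric, 1) as total_ms,
       round((100 * total_time / sum(total_time) over ())::numeric, 2) as pct_of_total
from pg_stat_statements
order by total_time desc
limit 5;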


On Fri, Nov 20, 2020 at 2:17 PM, Tom Lane wrote:

> Yi Sun  writes:
> > Besides command run(like pg_ctl) can cause "received immediate shutdown
> > request"  any other reason can cause this please?
>
> That message indicates that something sent the postmaster process a
> SIGQUIT signal (which is all that "pg_ctl stop -m immediate" does).
> There's no speculation to that: a look at postmaster.c will convince
> you that there is no other way to reach that message.  So you need
> to be looking for things that would be sending SIGQUIT unexpectedly.
>
> I don't know much about Patroni, but maybe something in that
> environment thinks that SIGQUIT'ing random processes is a good
> thing to do.
>
> regards, tom lane
>


Batch update million records in prd DB

2021-02-24 Thread Yi Sun
Hello,

We need to update several million records in a table in a production DB. If we
batch-update 1000 records at a time and commit after each batch, as in the
sample script below, will it affect the production application?

Sample script:

DO $MAIN$
DECLARE
  affect_count integer;
  chunk_size CONSTANT integer := 1000;
  sleep_sec CONSTANT numeric := 0.1;
BEGIN
  LOOP
    EXIT WHEN affect_count = 0;

    UPDATE tbl a
    SET name = ''
    WHERE a.id IN (SELECT id
                   FROM tbl b
                   WHERE name IS NULL
                   LIMIT chunk_size);

    GET DIAGNOSTICS affect_count = ROW_COUNT;

    COMMIT;

    PERFORM pg_sleep(sleep_sec);
  END LOOP;
END;
$MAIN$;

Thanks and best regards


Re: Batch update million records in prd DB

2021-02-25 Thread Yi Sun
Hi Michael,

Thank you for your reply

We found that each loop takes a different amount of time and becomes slower and
slower. Our table is large and joins another table, so even using an index the
last 1000 records took around 15 seconds. Will that be a problem? Will other
concurrent updates have to wait 15 seconds until the lock is released?

Thanks and best regards

On Wed, Feb 24, 2021 at 11:47 PM, Michael Lewis wrote:

> Of course it will impact a system using that table, but not significant I
> expect and the production system should handle it. If you are committing
> like this, then you can kill the script at any time and not lose any work.
> The query to find the next IDs to update is probably the slowest part of
> this depending on what indexes you have.
>


Re: Batch update million records in prd DB

2021-03-01 Thread Yi Sun
Hi Michael

Here are the script and the explain plan details; please take a look. It seems
that the more rows the Filter removes, the longer it takes.

DO $MAIN$
DECLARE
  affect_count integer := 1000;
  processed_row_count integer := 0;
BEGIN
  LOOP
    EXIT WHEN affect_count = 0;

    UPDATE app gaa
    SET deleted_at = (
          SELECT CAST(extract(epoch FROM now() AT TIME ZONE 'utc') * 10 AS bigint))
    WHERE gaa.id IN (
          SELECT gab.id
          FROM app gab
          LEFT JOIN pol gp ON gab.policy_id = gp.id
          WHERE gab.policy_type = 'policy.protection.total'
            AND gp.name LIKE 'Mobile backup%'
            AND gab.deleted_at IS NULL
            AND gp.deleted_at IS NOT NULL
          LIMIT 1000);

    GET DIAGNOSTICS affect_count = ROW_COUNT;

    COMMIT;

    processed_row_count := processed_row_count + affect_count;
  END LOOP;

  RAISE NOTICE 'total processed rows %', processed_row_count;
END;
$MAIN$;

--earlier explain plan: updating 1000 records takes 156.488 ms

 Update on app gaa  (cost=3307.57..6085.41 rows=1000 width=3943) (actual
time=156.347..156.347 rows=0 loops=1)
   InitPlan 1 (returns $0)
 ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.008
rows=1 loops=1)
   ->  Nested Loop  (cost=3307.54..6085.39 rows=1000 width=3943) (actual
time=18.599..33.987 rows=1000 loops=1)
 ->  HashAggregate  (cost=3306.99..3316.99 rows=1000 width=98)
(actual time=18.554..19.085 rows=1000 loops=1)
   Group Key: ("ANY_subquery".id)::text
   ->  Subquery Scan on "ANY_subquery"  (cost=2.17..3304.49
rows=1000 width=98) (actual time=0.041..18.052 rows=1000 loops=1)
 ->  Limit  (cost=2.17..3294.49 rows=1000 width=37)
(actual time=0.030..17.827 rows=1000 loops=1)
   ->  Merge Join  (cost=2.17..877396.03
rows=266497 width=37) (actual time=0.029..17.764 rows=1000 loops=1)
 Merge Cond: ((gab.policy_id)::text = (gp.id
)::text)
 ->  Index Scan using
tmp_uq_policy_id_context2 on app gab  (cost=0.56..487631.06 rows=3151167
width=74) (actual time=0.018..9.192 rows=3542 loops=1)
   Filter: ((policy_type)::text =
'policy.protection.total'::text)
   Rows Removed by Filter: 2064
 ->  Index Scan using pol_pkey on pol gp
 (cost=0.56..378322.78 rows=361105 width=37) (actual time=0.008..7.380
rows=1006 loops=1)
   Filter: ((deleted_at IS NOT NULL)
AND (name ~~ 'Mobile backup%'::text))
   Rows Removed by Filter: 3502
 ->  Index Scan using app2_pkey on app gaa  (cost=0.56..2.77 rows=1
width=3874) (actual time=0.014..0.014 rows=1 loops=1000)
   Index Cond: ((id)::text = ("ANY_subquery".id)::text)
 Planning Time: 0.852 ms
 Execution Time: 156.488 ms

--later explain plan: updating 1000 records takes 13301.600 ms
 Update on app gaa  (cost=3789.35..6567.19 rows=1000 width=3980) (actual
time=13301.466..13301.466 rows=0 loops=1)
   InitPlan 1 (returns $0)
 ->  Result  (cost=0.00..0.02 rows=1 width=8) (actual time=0.007..0.007
rows=1 loops=1)
   ->  Nested Loop  (cost=3789.32..6567.17 rows=1000 width=3980) (actual
time=12881.004..12896.440 rows=1000 loops=1)
 ->  HashAggregate  (cost=3788.77..3798.77 rows=1000 width=98)
(actual time=12880.958..12881.378 rows=1000 loops=1)
   Group Key: ("ANY_subquery".id)::text
   ->  Subquery Scan on "ANY_subquery"  (cost=2.17..3786.27
rows=1000 width=98) (actual time=12850.663..12880.505 rows=1000 loops=1)
 ->  Limit  (cost=2.17..3776.27 rows=1000 width=37)
(actual time=12850.656..12880.233 rows=1000 loops=1)
   ->  Merge Join  (cost=2.17..862421.74
rows=228510 width=37) (actual time=12850.655..12880.162 rows=1000 loops=1)
 Merge Cond: ((gab.policy_id)::text = (gp.id
)::text)
 ->  Index Scan using
tmp_uq_policy_id_context2 on app gab  (cost=0.56..474159.31 rows=2701994
width=74) (actual time=0.017..6054.269 rows=2302988 loops=1)
   Filter: ((policy_type)::text =
'policy.protection.total'::text)
   Rows Removed by Filter: 1822946
 ->  Index Scan using pol_pkey on pol gp
 (cost=0.56..378322.78 rows=361105 width=37) (actual time=0.007..5976.346
rows=936686 loops=1)
   Filter: ((deleted_at IS NOT NULL)
AND (name ~~ 'Mobile backup%'

Re: Batch update million records in prd DB

2021-03-02 Thread Yi Sun
Hi Michael,

Thank you. After creating an index on the temp table column, the time cost
became much smaller.

On Tue, Mar 2, 2021 at 12:08 AM, Michael Lewis wrote:

> 1) Don't pretend it is a left join when your where clause will turn it
> into an INNER join.
> LEFT JOIN pol gp ON gab.policy_id = gp.id
> WHERE
>
> AND gp.name LIKE 'Mobile backup%'
> AND gp.deleted_at IS NOT NULL;
>
> 2) It is interesting to me that the row estimates are stable, but the
> number of rows filtered out and that are found by those two index
> scans changes so dramatically. Is your underlying data changing
> significantly during this run? Maybe I am not seeing something that should
> be obvious.
>
> 3) What is the execution plan for the update based on the temp table? It
> is hard to believe it takes 2 seconds to update 1000 rows. By the way, that
> temp table needs to be analyzed after it is created & populated with data,
> or the planner won't know how many rows it contains or any other stats
> about it. One advantage of the temp table should be that you have already
> found all the candidate rows and so the time that locks are held to update
> the 1000 target rows is smaller. Given you are doing a order by & limit in
> the use of the temp table, I might actually create an index on the id
> column to help the later runs. The temp table should likely remain in
> memory (temp_buffers) but still, btree is nice for ordered use.
>
>>


Re: Batch update million records in prd DB

2021-03-02 Thread Yi Sun
Hi Kristjan,

Thank you for this information.

"postgres the memory is slowly eaten away when doing updates within plsql
loop" for this memory issue, I want to check if it exists in our current
postgresql version. And let developer change to use python for loop also
need to show them the proof, how to reproduce and check the memory slowly
eaten away please? OS level cmd or psql cmd to verify? thanks

On Tue, Mar 2, 2021 at 5:26 PM, Kristjan Mustkivi wrote:

> Hi Yi,
>
> I found that in postgres the memory is slowly eaten away when doing
> updates within plsql loop. It only gets released once the whole block
> completes. While it is ok for small tables you will eventually run out
> of memory for really big ones. The working approach was to do the loop
> in e.g a python script that called the DML statements and also called
> commit. Several million rows is fortunately relatively small number to
> update but once you get to billions this approach would not likely
> work. Note that after each batch you also should call VACUUM before
> starting a new one to avoid significant table bloat.
>
> BR,
>
> Kristjan
>


How to check if checkpoint is finished in sql script?

2022-09-05 Thread Yi Sun
Hello all,

We want to restart a 3-node PostgreSQL cluster (1 leader, 2 replicas) with
Ansible, using the steps below:
1. Restart the 2 replica nodes one by one
2. Run CHECKPOINT on the leader node
3. Once the checkpoint has finished, restart the leader node

How can we check from a SQL script whether the checkpoint has finished? We know
the log file shows it, but we want to check it in SQL so that it can easily be
used by Ansible. Thanks
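
For what it's worth, one possible approach (a sketch, assuming the Ansible task
can run psql against the leader as a superuser): the CHECKPOINT command itself
is synchronous and only returns once the checkpoint has completed, so the task
that issues it doubles as the "wait until finished" step.

-- Blocks until the checkpoint is done; when this returns, step 3 can proceed
CHECKPOINT;

-- Optional confirmation that the checkpoint position advanced (available since 9.6)
SELECT checkpoint_lsn, checkpoint_time FROM pg_control_checkpoint();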

Best regards
Oliver Sun


How to know how much CPU, RAM is used by existing 1 database

2022-10-25 Thread Yi Sun
Hi,

There are many databases in our production Patroni cluster and it seems to be
overloaded, so we have decided to migrate the busiest database to a new Patroni
cluster.

pgwatch2 is already in place. How can we find out how much CPU and RAM is used
by that one database? We would then use the numbers to size the hardware for
the new Patroni cluster. Thank you

Best regards
Dennis


Re: How to know how much CPU, RAM is used by existing 1 database

2022-10-25 Thread Yi Sun
Hi Guys,

Can anyone help me with this, please? I have researched but found no result
yet. Thank you



Re: How to know how much CPU, RAM is used by existing 1 database

2022-10-26 Thread Yi Sun
On Wed, 26 Oct 2022 at 18:10, jian he  wrote:

> manual:
> https://www.postgresql.org/docs/current/runtime-config-resource.html
> except min_dynamic_shared_memory (integer)
> all other parameters are used to cap the memory. almost all parameters
> mentioned "database server" which means it's on cluster level.
>
>
Hi Jian he,

Thank you for your reply

The parameters are at the cluster level, so we still cannot tell how much
memory is used by a specific database. For example, with 64 GB of total memory:
1. How can we find out how much memory is used at the cluster level? For
example, 40 GB.
2. How can we find out how much memory is used by a specific database? For
example, if it is 30 GB, then 32 GB would be enough for the new Patroni cluster.
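
One hedged option for the memory part of the question: the pg_buffercache
extension can show how much of shared_buffers each database currently occupies.
It does not account for work_mem, maintenance_work_mem or the OS page cache, so
treat it as a rough sizing input only. CPU cannot be attributed per database
from inside PostgreSQL; summing execution time per dbid in pg_stat_statements
is a common rough proxy.

-- Requires: CREATE EXTENSION pg_buffercache; run as superuser or a monitoring role
SELECT coalesce(d.datname, '<unused buffers>') AS database,
       count(*) * current_setting('block_size')::int / 1024 / 1024 AS shared_buffers_mb
FROM pg_buffercache b
LEFT JOIN pg_database d ON d.oid = b.reldatabase
GROUP BY d.datname
ORDER BY shared_buffers_mb DESC;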

Thank you
Dennis


pg_wal directory max size

2022-12-21 Thread Yi Sun
Hello guys,

We are planning server disk space. Is the maximum size of the pg_wal directory
simply WAL file size * wal_keep_segments, or is it also determined by other
parameters? We tried to search for this but could not find the answer.

For example, with the PostgreSQL 9.6 parameter values below, is the maximum
pg_wal directory size 320 * 16 MB? Thank you
wal file size: 16MB
wal_keep_segments = 320
min_wal_size = 1GB
max_wal_size = 2GB
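
As a hedged aside: wal_keep_segments * 16 MB is more of a planning floor than a
hard ceiling, since max_wal_size, replication slots and failed archiving can all
keep extra segments on disk. A quick way to watch actual usage on 9.6 (where the
directory is still pg_xlog; it became pg_wal in v10):

-- pg_ls_dir() requires superuser on 9.6; counts the 16 MB segment files currently on disk
SELECT count(*)      AS segments,
       count(*) * 16 AS approx_mb
FROM pg_ls_dir('pg_xlog') AS f
WHERE f ~ '^[0-9A-F]{24}$';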

Best Regards
Dennis


How to grant read only functions execute permission to read only user

2023-07-17 Thread Yi Sun
Hello guys,

Our read-only user has already been granted read-only permission on tables.

Is there a simple way to grant the read-only user EXECUTE permission on just
the read-only functions? If not, how can we get a list of the read-only
functions, so that we can grant them one by one based on that list? Thanks
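
A hedged sketch of one way to build such a list: functions declared STABLE or
IMMUTABLE promise not to modify data, so they can serve as a first approximation
of "read only". The role name readonly_user is a placeholder, and volatility is
only a declaration, so review the generated statements before running them
(e.g. with \gexec in psql).

SELECT format('GRANT EXECUTE ON FUNCTION %s(%s) TO readonly_user;',
              p.oid::regproc,
              pg_get_function_identity_arguments(p.oid)) AS grant_sql
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND p.provolatile IN ('i', 's');   -- immutable or stable, i.e. declared not to write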

Best regards
Dennis Sun


postgresql custom variable in pg_settings table

2024-01-08 Thread Yi Sun
Hello,

We set a custom configuration variable by adding the following to
postgresql.conf:

patroni.nodes_count = 2

postgres=# show patroni.nodes_count;
 patroni.nodes_count
-
 2
(1 row)

postgres=# select current_setting('patroni.nodes_count');
 current_setting
-
 2
(1 row)

But we cannot select it from pg_settings. Since we use pgwatch2 for monitoring
and prefer not to change the pgwatch2 scripts, we would rather read it from
pg_settings than via the current_setting() function. Is it possible to get a
custom variable from pg_settings? Thanks

Best Regards
SY


"index contains unexpected zero page" problem

2021-04-08 Thread Yi Sun
Hi guys,

We face "index contains unexpected zero page" problem in prd postgresql 11
environment

This url shows that need to backup database firstly, is it necessary?
https://cloudblue.freshdesk.com/support/solutions/articles/44001889599-error-index-tablename-contains-unexpected-zero-page

If "create index concurrently" create a new index and drop the old broken
index is a solution to avoid table lock please? Thanks
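
For reference, a minimal sketch of that rebuild on PG 11 (REINDEX CONCURRENTLY
only arrived in PG 12). The index and table names are placeholders, and this
does not apply directly if the broken index backs a primary key or unique
constraint.

CREATE INDEX CONCURRENTLY my_index_new ON my_table (my_column);
DROP INDEX CONCURRENTLY my_index_broken;
ALTER INDEX my_index_new RENAME TO my_index_broken;  -- optional; takes a brief lock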

Best Regards
Dennis


postgresql version 13 repo question

2021-06-29 Thread Yi Sun
Hello,

Our environment has mixed OS versions: some hosts are CentOS 7.4, some 7.5,
some 7.6, and so on, but there is only one company repository. Comparing the
package sizes and dates, they seem identical. Can we simply use the 7.6
packages for all of them?

https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7.4-x86_64/
https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7.5-x86_64/
https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-7.6-x86_64/

Thank you


Re: postgresql version 13 repo question

2021-07-04 Thread Yi Sun

Hi guys

They seem to be the same, so if the packages are identical across the RHEL 7
minor versions, can we use any one of them?


pg_upgrade problem as locale difference in data centers

2021-09-16 Thread Yi Sun
Hello,

We want to upgrade some PostgreSQL 9.6 databases to PostgreSQL 13. The database
locales differ between data centers: some are C, some are ru_RU.UTF-8, and so
on, as below.
postgres=# select datname,datcollate,datctype from pg_database;
datname| datcollate  |  datctype
---+-+-
 postgres  | C   | C
 template1 | C   | C
 template0 | C   | C
 aaa_service  | C   | C

postgres=# select datname,datcollate,datctype from pg_database;
datname | datcollate  |  datctype
+-+-
 postgres   | ru_RU.UTF-8 | ru_RU.UTF-8
 template1  | ru_RU.UTF-8 | ru_RU.UTF-8
 template0  | ru_RU.UTF-8 | ru_RU.UTF-8
 bbb_service   | ru_RU.UTF-8 | ru_RU.UTF-8

We use a single Ansible script to perform the upgrade in all data centers.
pg_upgrade only works when the locales of the postgres, template1, and
template0 databases match between PG 9.6 and PG 13, so in testing we update the
locale of those three databases to 'en_US.UTF-8' on both PG 9.6 and PG 13
before pg_upgrade, with the script below:

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname in ('postgres','template1','template0') and
(datcollate<>'en_US.UTF-8' or datctype<>'en_US.UTF-8')

The test pg_upgrade then completed with no errors. However, the PostgreSQL
documentation says that collation cannot be changed after database creation,
only dropped and recreated. So the question is: if we update the catalogs and
upgrade this way, will it cause problems?
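
One hedged sanity check before relying on the catalog update: confirm that
postgres and template1 really contain no user relations whose on-disk ordering
could depend on the collation being changed (template0 is not normally
connectable).

-- Run in postgres and template1; an empty result supports the "no user objects" assumption
SELECT n.nspname, c.relname, c.relkind
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'i', 'm')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast');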

Thanks and best regards
Sun Yi


Re: pg_upgrade problem as locale difference in data centers

2021-09-19 Thread Yi Sun
Since we use Ansible to deploy the upgrade, I mentioned the data-center
situation. Each PostgreSQL instance is a single node, and the upgrade takes
place in the same data center on the same Linux server (CentOS 7); we simply
run the Ansible playbook to upgrade PG in all data centers.

For example, in our aaa data center

postgres=# select datname,datcollate,datctype from pg_database;
datname| datcollate  |  datctype
---+-+-
 postgres  |  ru_RU.UTF-8|  ru_RU.UTF-8
 template1 |  ru_RU.UTF-8 |  ru_RU.UTF-8
 template0 |  ru_RU.UTF-8 |  ru_RU.UTF-8
 aaa_service  |  ru_RU.UTF-8 | ru_RU.UTF-8

Before pg_upgrade, in testing, we update the locale of the postgres, template1,
and template0 databases to 'en_US.UTF-8' in both PG 9.6 and PG 13 with the
script below:

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname in ('postgres','template1','template0') and
(datcollate<>'en_US.UTF-8' or datctype<>'en_US.UTF-8')

The test pg_upgrade then completed with no errors, and the locales look like
this:

postgres=# select datname,datcollate,datctype from pg_database;
datname| datcollate  |  datctype
---+-+-
 postgres   |  en_US.UTF-8 |  en_US.UTF-8
 template1 |  en_US.UTF-8 |  en_US.UTF-8
 template0 |  en_US.UTF-8 |  en_US.UTF-8
 aaa_service  |  ru_RU.UTF-8 |  ru_RU.UTF-8

The locale of the aaa_service application database is unchanged, and there are
no user-defined tables or indexes in the postgres, template1, or template0
databases. So the question is: in a case like ours, does updating the locale of
those three databases and upgrading this way cause problems? Or do we have to
avoid it and choose pg_dump-and-restore or pglogical instead? Thanks

Thanks and best regards
Sun Yi


Re: pg_upgrade problem as locale difference in data centers

2021-09-26 Thread Yi Sun
Hi Tom,

Thank you for your help.

Since we use Ansible to deploy the upgrade, I mentioned the data-center
situation. Each PostgreSQL instance is a single node, and the upgrade takes
place in the same data center on the same Linux server (CentOS 7); we simply
run the Ansible playbook to upgrade PG in all data centers.

For example, in our aaa data center

postgres=# select datname,datcollate,datctype from pg_database;
datname| datcollate  |  datctype
---+-+-
 postgres   |  ru_RU.UTF-8 |  ru_RU.UTF-8
 template1 |  ru_RU.UTF-8 |  ru_RU.UTF-8
 template0 |  ru_RU.UTF-8 |  ru_RU.UTF-8
 aaa_service  |  ru_RU.UTF-8 |  ru_RU.UTF-8

Before pg_upgrade, in testing, we update the locale of the postgres, template1,
and template0 databases to 'en_US.UTF-8' in both PG 9.6 and PG 13 with the
script below:

update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
where datname in ('postgres','template1','template0') and
(datcollate<>'en_US.UTF-8' or datctype<>'en_US.UTF-8')

The test pg_upgrade then completed with no errors, and the locales look like
this:

postgres=# select datname,datcollate,datctype from pg_database;
datname| datcollate  |  datctype
---+-+-
 postgres   |  en_US.UTF-8 |  en_US.UTF-8
 template1 |  en_US.UTF-8 |  en_US.UTF-8
 template0 |  en_US.UTF-8 |  en_US.UTF-8
 aaa_service  |  ru_RU.UTF-8 |  ru_RU.UTF-8

The locale of the aaa_service application database is unchanged, and there are
no user-defined tables or indexes in the postgres, template1, or template0
databases. So the question is: in a case like ours, does updating the locale of
those three databases and upgrading this way cause problems or carry risk? Or
do we have to avoid it and choose pg_dump-and-restore or pglogical instead?
Thanks

Best regards
Sun Yi


plpython3 package installation problem

2021-10-25 Thread Yi Sun
Hello,

We need the plpython3u extension, so we tried to install the plpython3 package,
but it requires python3-libs, while python36-libs is already installed for
Patroni.

1. Will installing python3-libs affect the current python36-libs usage?
2. Can we configure python36-libs to satisfy the python3-libs requirement, so
that we do not need to install python3-libs? Thanks

# yum localinstall /tmp/postgresql11-plpython3-11.11-1PGDG.rhel7.x86_64.rpm
...
   Requires: python3-libs
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

# yum search python3 | grep libs
python34-libs.x86_64 : Python 3 runtime libraries
python36-libs.x86_64 : Python runtime libraries
shiboken-python36-libs.x86_64 : CPython bindings generator for C++
libraries -
# yum list python36-libs.x86_64
Installed Packages
python36-libs.x86_64
 3.6.8-1.el7

Thank you


ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-11-30 Thread Yi Sun
Hi All,

OS: CentOS 7.6
PG: 11.11

Our env already configured ssl
--server postgresql.conf
ssl = 'on'
ssl_ca_file = '/var/lib/pgsql/tls/root.crt'
ssl_cert_file = '/var/lib/pgsql/tls/server.crt'
ssl_key_file = '/var/lib/pgsql/tls/server.key'

--client configuration
$ ls -alrt /var/lib/pgsql/.postgresql
total 20
-rw-r--r-- 1 postgres postgres  688 Nov 30 06:46 root.crt
-rw-r--r-- 1 postgres postgres  778 Nov 30 06:46 postgresql.crt
-rw--- 1 postgres postgres 1708 Nov 30 06:47 postgresql.key

--From the client, the SSL connection works
$ psql "host=master.pgcluster11.service.consul port=5432 dbname=testdb
user=test sslmode=verify-full"
Password:
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384,
bits: 256, compression: off)

--Now we want to configure ssl_crl_file; we generated the root.crl file
following the Red Hat doc below

https://access.redhat.com/documentation/en-us/red_hat_update_infrastructure/2.1/html/administration_guide/chap-red_hat_update_infrastructure-administration_guide-certification_revocation_list_crl

--Use openssl to verify, shows "certificate revoked"
# cat /home/sunyi/tls/root.crt /home/sunyi/tls/1/root.crl > /tmp/test_1.pem
# openssl verify -extended_crl -verbose -CAfile /tmp/test_1.pem -crl_check
/home/sunyi/tls/1/server.crt
/home/sunyi/tls/1/server.crt: O = Acronis, OU = DBS, CN =
s12345y-patroni_cluster-507460701
error 23 at 0 depth lookup:certificate revoked

--copy root.crl file to /var/lib/pgsql/tls
-bash-4.2$ ls -alrt /var/lib/pgsql/tls
total 20
drwx-- 4 postgres postgres 4096 Nov 30 04:20 ..
-rw-r- 1 postgres postgres 1164 Nov 30 04:20 server.crt
-rw--- 1 postgres postgres 1679 Nov 30 04:20 server.key
-rw-r- 1 postgres postgres  688 Nov 30 04:20 root.crt
drwx-- 2 postgres postgres 4096 Nov 30 04:20 .

--Configure /var/lib/pgsql/11/data/postgresql.conf
ssl_crl_file = '/var/lib/pgsql/tls/root.crl'

--Reload postgresql
$ psql
psql (11.11)
Type "help" for help.

postgres=# select pg_reload_conf();
 pg_reload_conf

 t
(1 row)
postgres=# show ssl_crl_file;
ssl_crl_file
-
 /var/lib/pgsql/tls/root.crl
(1 row)

--From the client we can still connect, which is not what we expect
$ psql "host=master.pgcluster11.service.consul port=5432 dbname=testdb
user=test sslmode=verify-full"
Password:
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384,
bits: 256, compression: off)

Could you please advise whether any more configuration is needed? Thanks

Thanks and best regards
Sun Yi


Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-11-30 Thread Yi Sun
Hi Gabriel,

Thank you.

I copied root.crl again and reloaded PostgreSQL.

-bash-4.2$ ls -alrt /var/lib/pgsql/tls
total 24
-rw-r- 1 postgres postgres 1168 Nov 30 04:20 server.crt
-rw--- 1 postgres postgres 1679 Nov 30 04:20 server.key
-rw-r- 1 postgres postgres  688 Nov 30 04:20 root.crt
-rw-r- 1 postgres postgres  410 Nov 30 07:42 root.crl
drwx-- 4 postgres postgres 4096 Nov 30 08:02 ..
drwx-- 2 postgres postgres 4096 Nov 30 23:36 .

-bash-4.2$ psql
psql (11.11)
Type "help" for help.

postgres=# select pg_reload_conf();
 pg_reload_conf

 t
(1 row)

postgres=# show ssl_crl_file;
ssl_crl_file
-
 /var/lib/pgsql/tls/root.crl
(1 row)

--From the client we can still connect, which is not what we expect
-bash-4.2$ psql "host=master.pgcluster11.service.consul port=5432
dbname=testdb user=test sslmode=verify-full"
Password:
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384,
bits: 256, compression: off)

Is there any more configuration that needs to be done? Thanks

Thanks and best regards
Sun Yi



On Tue, Nov 30, 2021 at 10:03 PM, Gabriel Cabillon wrote:

>
> Hi,
>
> according to the ls command it seems you copied root.crt instead of
> root.crl
>
> Yours,
> Gabriel
>


Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-12-01 Thread Yi Sun
Hi Kyotaro,

We want to revoke the server certificate; we just don't know why it doesn't
take effect.
https://www.postgresql.org/docs/11/ssl-tcp.html
https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE

On Wed, Dec 1, 2021 at 2:12 PM, Kyotaro Horiguchi wrote:

> At Tue, 30 Nov 2021 21:53:06 +0800, Yi Sun  wrote in
> > # cat /home/sunyi/tls/root.crt /home/sunyi/tls/1/root.crl >
> /tmp/test_1.pem
> > # openssl verify -extended_crl -verbose -CAfile /tmp/test_1.pem
> -crl_check
> > /home/sunyi/tls/1/server.crt
>
> I guess what you really wanted to revoke was not server.crt but
> postgresql.crt.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-12-01 Thread Yi Sun
Hi Kyotaro

From the description, it seems ~/.postgresql/root.crl stores revoked client
certificates.

https://www.postgresql.org/docs/11/libpq-ssl.html
~/.postgresql/root.crl: "certificates revoked by certificate authorities;
server certificate must not be on this list"

We just don't know why the server's ssl_crl_file parameter is configured but
does not take effect.

https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE

ssl_crl_file (string)

Specifies the name of the file containing the SSL server certificate
revocation list (CRL). Relative paths are relative to the data directory.
This parameter can only be set in the postgresql.conf file or on the server
command line. The default is empty, meaning no CRL file is loaded.


Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11

2021-12-02 Thread Yi Sun
Hi Kyotaro,

Thank you for your explanation. After putting the CRL file on the client, it
works now. Thanks.

On Thu, Dec 2, 2021 at 12:46 PM, Kyotaro Horiguchi wrote:

> Hi.
>
> At Thu, 2 Dec 2021 11:31:26 +0800, Yi Sun  wrote in
> > Hi Kyotaro
> >
> > From the description, seems  ~/.postgresql/root.crl  is store client
> > revoked certificate
>
> No.  Revocation is checked on the peer. There's no point for a server
> to check for revocation of its own certificate, and actually that
> doesn't happen. Revocation of a client certificate is checked on
> server side referencing server.crl. Revocation of a server certificate
> is checked on client side referencing postgresql.crl. For example,
> some web browsers make use of CRL of web *servers*, which is
> automatically maintained in background.
>
> You will see it work if you duped the server.crl as
> ~/.postgresql/root.crl on the client.  (I spelled this wrongly in the
> previous message..)
>
> > https://www.postgresql.org/docs/11/libpq-ssl.html
> > ~/.postgresql/root.crl certificates revoked by certificate authorities
> server
> > certificate must not be on this list
> > Just don't know why server parameter ssl_crl_file parameter configured
> but
> > don't take affect
>
> As explained above, it is because the CRL specified by ssl_crl_file
> can only be used to verify client certificates.
>
> >
> https://www.postgresql.org/docs/11/runtime-config-connection.html#GUC-SSL-CRL-FILE
> >
> > ssl_crl_file (string)
> >
> > Specifies the name of the file containing the SSL server certificate
> > revocation list (CRL). Relative paths are relative to the data directory.
> > This parameter can only be set in the postgresql.conf file or on the
> server
> > command line. The default is empty, meaning no CRL file is loaded.
>
> Ah, the "server" in "SSL server certificate revocation list" looks
> like a noise word, rather misleading, or plain wrong, I'm not sure
> which one it actually is.
>
>
> Anyway I propose change the rephrase as "SSL client certification
> revocation list" as attached.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


VACUUM FULL missing chunk number 0 for toast value

2022-01-03 Thread Yi Sun
Hi All,

OS: CentOS 7.6
PG: 11.11

When we tried to VACUUM FULL a table, we got the error "ERROR:  missing chunk
number 0 for toast value". The article below covers this error for the SELECT
case, but in our case SELECT works fine. What could have caused this, and how
can we fix it? Thanks

https://newbiedba.wordpress.com/2015/07/07/postgresql-missing-chunk-0-for-toast-value-in-pg_toast/

1. vacuum full output:
VACUUM FULL VERBOSE ANALYZE application.notes;
INFO:  vacuuming "application.notes"
ERROR:  missing chunk number 0 for toast value 183500290 in pg_toast_16977

2. explain select output:
explain select * from application.notes;
QUERY PLAN
--
 Seq Scan on notes  (cost=0.00..430924.43 rows=412443 width=767)
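
A hedged sketch for locating the damaged rows before deciding how to fix them;
the toasted column name note_body is a placeholder, so replace it with the
actual text/bytea column. Rows flagged here can then be restored from backup or
deleted before retrying VACUUM FULL.

DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT ctid FROM application.notes LOOP
        BEGIN
            -- force de-TOASTing of the suspect column for this row
            PERFORM length(note_body::text) FROM application.notes WHERE ctid = r.ctid;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'damaged row at ctid %: %', r.ctid, SQLERRM;
        END;
    END LOOP;
END;
$$;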

Thanks and regards
Sun Yi