Re: How to improve the performance of my SQL query?

2023-07-26 Thread gzh
Thank you very much for taking the time to reply to my question. 

>You might want to check your description of the table definitions.
>Going by the above EXPLAIN ANALYZE output, it very much does not look
>like ms_cd is the primary key of TBL_SHA. If it is then it's very
>weird that you have 320 rows for MS_CD = 'MLD009'. You have some
>data corruption if that's the case. I suspect you've just not
>accurately described the table definition, however.
The primary key of the SHA table has six fields, and ms_cd is just one of them. 
I'm sorry, I didn't make that clear.


>Try executing the query after having done:
>
>SET enable_seqscan TO off;
>
>What plan does it use now?
>
>Is that plan faster or slower than the seq scan plan?
There's improvement, but it's still quite slow.

QUERY PLAN (enable_seqscan=on)
Limit  (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.822..167183.058 rows=1 loops=1)
  ->  Aggregate  (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.820..167183.056 rows=1 loops=1)
        ->  Nested Loop  (cost=1000.29..2688558.85 rows=1983209 width=9) (actual time=43544.753..166906.304 rows=2413500 loops=1)
              ->  Index Only Scan using tbl_inf_pkc on tbl_inf  (cost=0.29..8.31 rows=1 width=9) (actual time=1.034..1.038 rows=1 loops=1)
                    Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '0001'::bpchar))
                    Heap Fetches: 1
              ->  Gather  (cost=1000.00..2668718.45 rows=1983209 width=18) (actual time=43543.714..166447.333 rows=2413500 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Parallel Seq Scan on tbl_sha  (cost=0.00..2469397.55 rows=826337 width=18) (actual time=43537.056..166225.162 rows=804500 loops=3)
                          Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '0001'::bpchar))
                          Rows Removed by Filter: 15362328
Planning Time: 2.942 ms
Execution Time: 167183.133 ms

SET enable_seqscan TO off;

QUERY PLAN (enable_seqscan=off)
Limit  (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.419..22320.102 rows=1 loops=1)
  ->  Aggregate  (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.418..22320.100 rows=1 loops=1)
        ->  Nested Loop  (cost=93112.74..2876169.16 rows=1921561 width=9) (actual time=265.880..22000.432 rows=2413500 loops=1)
              ->  Index Only Scan using tbl_inf_pkc on tbl_inf  (cost=0.29..8.31 rows=1 width=9) (actual time=0.013..0.020 rows=1 loops=1)
                    Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '0001'::bpchar))
                    Heap Fetches: 1
              ->  Gather  (cost=93112.45..2856945.24 rows=1921561 width=18) (actual time=265.864..21535.325 rows=2413500 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Parallel Bitmap Heap Scan on tbl_sha  (cost=92112.45..2663789.14 rows=800650 width=18) (actual time=260.540..21442.169 rows=804500 loops=3)
                          Recheck Cond: (ms_cd = 'MLD009'::bpchar)
                          Rows Removed by Index Recheck: 49
                          Filter: (etrys = '0001'::bpchar)
                          Rows Removed by Filter: 295500
                          Heap Blocks: exact=13788 lossy=10565
                          ->  Bitmap Index Scan on index_search_04_mscd_cdate  (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 rows=330 loops=1)
                                Index Cond: (ms_cd = 'MLD009'::bpchar)
Planning Time: 0.670 ms
Execution Time: 22320.153 ms

At 2023-07-25 21:04:16, "David Rowley"  wrote:
>On Fri, 21 Jul 2023 at 13:44, gzh  wrote:
>>
>> The definitions of the columns used in SQL are as follows.
>>
>> TBL_SHA
>> ms_cd character(6) NOT NULL   -- PRIMARY KEY
>> et_cd character(8)
>> etrys character(8)
>
>> explain analyze
>> select COUNT(ET_CD)
>> from TBL_SHA
>> WHERE TBL_SHA.MS_CD = 'MLD009'
>> and TBL_SHA.ETRYS in
>>(select TBL_INF.RY_CD
>> from TBL_INF
>> WHERE TBL_INF.MS_CD = 'MLD009'
>>AND TBL_INF.RY_CD = '0001'
>>)
>> - Execution Plan -
>> Limit  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.769..124168.771 rows=1 loops=1)
>>   ->  Aggregate  (cost=2738709.57..2738709.58 rows=1 width=8) (actual 
>> time=124168.767..124168.769 rows=1 loops=1)
>> ->  Nested Loop  (cost=0.29..2730702.63 rows=3202774 width=9) 
>> (actual time=97264.166..123920.769 rows=320 loops=1)
>>   ->  Index Only Scan using TBL_INF_pkc on TBL_INF  
>> (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
>> Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = 
>> '0001'::bpchar))
>> Heap Fetches: 1
>>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
>> width=18) (actu

How to check for Overlapping Date Ranges with extra fields in table EXCLUDE constraint?

2023-07-26 Thread Dionisis Kontominas
Hello all,

  In the Subject I mention what I intend to do. Let me give some
context; this is my table:

portal_user_role
(
f_id INTEGER NOT NULL,
f_portal_user_id INTEGER NOT NULL,
f_portal_role_id INTEGER NOT NULL,
f_is_active BOOLEAN NOT NULL,
f_is_deleted BOOLEAN NOT NULL,
f_start_date DATE NOT NULL,
f_end_date DATE,
f_created_on TIMESTAMP WITH TIME ZONE NOT NULL,
f_updated_on TIMESTAMP WITH TIME ZONE,
f_created_by CHARACTER VARYING(255) NOT NULL,
f_updated_by CHARACTER VARYING(255),
CONSTRAINT pk_portal_user_role PRIMARY KEY (f_id),
CONSTRAINT fk1_portal_user_role FOREIGN KEY (f_portal_user_id)
REFERENCES portal_user (f_id),
CONSTRAINT fk2_portal_user_role FOREIGN KEY (f_portal_role_id)
REFERENCES portal_role (f_id),
EXCLUDE USING gist (f_portal_user_id WITH =,
f_portal_role_id WITH =,
DATERANGE(f_start_date, COALESCE(f_end_date, 'infinity'), '[]') WITH &&)
);

So, this table has a date range [f_start_date, f_end_date] on which I do not
want two records to overlap, for the same user and the same role, and only
when f_is_deleted is TRUE.
I do not care about records where f_is_deleted is FALSE; they should not be
part of the restriction/constraint.

How can I achieve this?

Also, would it be more appropriate to post this question on pgsql-sql?

Thank you in advance!

Regards,
Dionisis


Re: How to check for Overlapping Date Ranges with extra fields in table EXCLUDE constraint?

2023-07-26 Thread Thomas Kellerer
Dionisis Kontominas schrieb am 26.07.2023 um 11:00:
> Hello all,
>
> In the Subject I mention what I am intending to do. Letme put some context; 
> this is my table:
>
> portal_user_role
> (
>     f_id INTEGER NOT NULL,
>     f_portal_user_id INTEGER NOT NULL,
>     f_portal_role_id INTEGER NOT NULL,
>     f_is_active BOOLEAN NOT NULL,
>     f_is_deleted BOOLEAN NOT NULL,
>     f_start_date DATE NOT NULL,
>     f_end_date DATE,
>     f_created_on TIMESTAMP WITH TIME ZONE NOT NULL,
>     f_updated_on TIMESTAMP WITH TIME ZONE,
>     f_created_by CHARACTER VARYING(255) NOT NULL,
>     f_updated_by CHARACTER VARYING(255),
>     CONSTRAINT pk_portal_user_role PRIMARY KEY (f_id),
>     CONSTRAINT fk1_portal_user_role FOREIGN KEY (f_portal_user_id) REFERENCES 
> portal_user (f_id),
>     CONSTRAINT fk2_portal_user_role FOREIGN KEY (f_portal_role_id) REFERENCES 
> portal_role (f_id),
>     EXCLUDE USING gist (f_portal_user_id WITH =,
>                         f_portal_role_id WITH =,
>     DATERANGE(f_start_date, COALESCE(f_end_date, 'infinity'), '[]') WITH &&)
> );
>
> So, this table has a range of dates [f_start_date, f_end_date] that I
> do not want two records to overlap, for the same user, the same role
> and also when the f_is_deleted is TRUE only.
>
> I do not care for the records when the f_is_deleted is FALSE on them; i.e. 
> they should not be part of the restriction/constraint. 
>
> How can I achieve this?

You can add a WHERE clause to the exclusion constraint (the condition must be 
enclosed in parentheses though):

EXCLUDE USING gist (f_portal_user_id WITH =, f_portal_role_id WITH =, 
DATERANGE(f_start_date, f_end_date, '[]') WITH &&) where (f_is_deleted)

Note that you don't need COALESCE(f_end_date, 'infinity'), because a daterange
treats a null bound as unbounded (infinity) anyway.
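
For reference, a minimal sketch of that suggestion as a standalone ALTER TABLE
(the constraint name is made up; swap the predicate for WHERE (NOT f_is_deleted)
if, as the follow-up in this thread suggests, deleted rows are the ones that
should be left out):

CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE portal_user_role
    ADD CONSTRAINT excl_portal_user_role_overlap
    EXCLUDE USING gist (f_portal_user_id WITH =,
                        f_portal_role_id WITH =,
                        DATERANGE(f_start_date, f_end_date, '[]') WITH &&)
    WHERE (f_is_deleted);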




Re: How to check for Overlapping Date Ranges with extra fields in table EXCLUDE constraint?

2023-07-26 Thread Dionisis Kontominas
Hi Thomas,

  Thank you very much for your reply and comment.

  I am trying to avoid writing trigger code to handle this requirement, so I
will try your suggestion.

  I also believe that the partial constraint you propose should, in the end,
be:   ... WHERE (NOT f_is_deleted), as I do not want deleted records to
participate in the constraint logic.

Kindest regards,
Dionisis

On Wed, 26 Jul 2023 at 11:18, Thomas Kellerer  wrote:

> Dionisis Kontominas schrieb am 26.07.2023 um 11:00:
> > Hello all,
> >
> > In the Subject I mention what I am intending to do. Letme put some
> context; this is my table:
> >
> > portal_user_role
> > (
> > f_id INTEGER NOT NULL,
> > f_portal_user_id INTEGER NOT NULL,
> > f_portal_role_id INTEGER NOT NULL,
> > f_is_active BOOLEAN NOT NULL,
> > f_is_deleted BOOLEAN NOT NULL,
> > f_start_date DATE NOT NULL,
> > f_end_date DATE,
> > f_created_on TIMESTAMP WITH TIME ZONE NOT NULL,
> > f_updated_on TIMESTAMP WITH TIME ZONE,
> > f_created_by CHARACTER VARYING(255) NOT NULL,
> > f_updated_by CHARACTER VARYING(255),
> > CONSTRAINT pk_portal_user_role PRIMARY KEY (f_id),
> > CONSTRAINT fk1_portal_user_role FOREIGN KEY (f_portal_user_id)
> REFERENCES portal_user (f_id),
> > CONSTRAINT fk2_portal_user_role FOREIGN KEY (f_portal_role_id)
> REFERENCES portal_role (f_id),
> > EXCLUDE USING gist (f_portal_user_id WITH =,
> > f_portal_role_id WITH =,
> > DATERANGE(f_start_date, COALESCE(f_end_date, 'infinity'), '[]') WITH
> &&)
> > );
> >
> > So, this table has a range of dates [f_start_date, f_end_date] that I
> > do not want two records to overlap, for the same user, the same role
> > and also when the f_is_deleted is TRUE only.
> >
> > I do not care for the records when the f_is_deleted is FALSE on them;
> i.e. they should not be part of the restriction/constraint.
> >
> > How can I achieve this?
>
> You can add a WHERE clause to the exclusion constraint (the condition must
> be enclosed in parentheses though):
>
> EXCLUDE USING gist (f_portal_user_id WITH =, f_portal_role_id WITH =,
> DATERANGE(f_start_date, f_end_date, '[]') WITH &&) where (f_is_deleted)
>
> Note that you don't need COALESCE(f_end_date, 'infinity') because a
> daterange will treat null as infinity anyways.
>
>
>


Re: How to check for Overlapping Date Ranges with extra fields in table EXCLUDE constraint?

2023-07-26 Thread Dominique Devienne
On Wed, Jul 26, 2023 at 11:26 AM Dionisis Kontominas 
wrote:

> On Wed, 26 Jul 2023 at 11:18, Thomas Kellerer  wrote:
>
>> Dionisis Kontominas schrieb am 26.07.2023 um 11:00:
>> > do not want two records to overlap, for the same user, the same role
>> > and also when the f_is_deleted is TRUE only.
>> > I do not care for the records when the f_is_deleted is FALSE on them;
>> i.e. they should not be part of the restriction/constraint.
>> > How can I achieve this?
>>
>> EXCLUDE USING gist (f_portal_user_id WITH =, f_portal_role_id WITH =,
>> DATERANGE(f_start_date, f_end_date, '[]') WITH &&) where (f_is_deleted)
>>
>
But that requires the btree_gist extension [1], no?

Just confirming, because I've been considering a similar approach for storing
chunks of large files (> 1GB), to enforce that those chunks don't overlap,
per "file". It seems ideal for enforcing no-overlap, but OTOH I can't see how
to enforce "no-holes" for the chunks. One concern is the cost of adding that
no-overlap enforcement. Most "files" will be small (a few bytes to single-digit
MBs), while some definitely go into multi-GB territory. So how well do
exclusion constraints scale to 100K or 1M rows? What's their time complexity?
In other words, should "smaller" (i.e. < 1MB) "files" go into a separate table
without an exclusion constraint and without chunking, while only the larger
ones go into the chunked table?

Thanks, --DD

[1]: https://www.postgresql.org/docs/current/btree-gist.html
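
For concreteness, a hedged sketch of the kind of per-file chunk table being
described above (all names and the int8range layout are assumptions, nothing
from this thread; it also needs btree_gist for the "WITH =" part, as discussed
below):

CREATE TABLE file_chunk (
    file_id    bigint    NOT NULL,
    byte_range int8range NOT NULL,   -- half-open [start, end) byte offsets of this chunk
    data       bytea     NOT NULL,
    -- forbid overlapping chunks within the same file
    EXCLUDE USING gist (file_id WITH =, byte_range WITH &&)
);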


Re: How to check for Overlapping Date Ranges with extra fields in table EXCLUDE constraint?

2023-07-26 Thread Thomas Kellerer
Dominique Devienne schrieb am 26.07.2023 um 11:39:
> On Wed, Jul 26, 2023 at 11:26 AM Dionisis Kontominas  > wrote:
>
> Dionisis Kontominas schrieb am 26.07.2023 um 11:00:
> > do not want two records to overlap, for the same user, the same role
> > and also when the f_is_deleted is TRUE only.
> > I do not care for the records when the f_is_deleted is FALSE on 
> them; i.e. they should not be part of the restriction/constraint. 
> > How can I achieve this?
>
>     EXCLUDE USING gist (f_portal_user_id WITH =, f_portal_role_id 
> WITH =, DATERANGE(f_start_date, f_end_date, '[]') WITH &&) where 
> (f_is_deleted)
>
>
> But that requires the btree_gist extension [1] extension, no?
>

Yes, but that would also be the case if you didn't include the WHERE clause.

The "WITH =" is the reason you need the btree_gist extension.

> So how well do exclusion constraints scale to 100K or 1M rows?
> What's their time-complexity?

They use a GiST index, so I would expect all the restrictions and advantages
that apply to GiST indexes in general to also apply to exclusion constraints.

The main drawback is most probably slower updates compared to a B-tree index.

Unless you have a really high update frequency, I wouldn't worry about that for 
such a small table.





logging_collector is off, but the logging is opening

2023-07-26 Thread Wen Yi
Hi community,
While using Postgres I checked the log (I use -l to enable the log).
As you can see, I ran this:



postgres=# show logging_collector;
 logging_collector 
---
 off
(1 row)

postgres=# ^C


According to the documentation, when logging_collector is off there should be no
log output, but when I check my 'postgres' directory:


[postgres@fedora postgres]$ ls
build.sh  logfile  src  startdb.sh  stopdb.sh  
update.sh

[postgres@fedora postgres]$ cat logfile
2023-07-24 09:54:59.668 CST [35872] LOG:  starting PostgreSQL 17devel on 
x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4), 
64-bit
2023-07-24 09:54:59.668 CST [35872] LOG:  listening on IPv6 address "::1", 
port 5432
2023-07-24 09:54:59.668 CST [35872] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2023-07-24 09:54:59.684 CST [35872] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
...


In short, logging is clearly active. (My startup command: 
/home/postgres/postgres/bin/bin/pg_ctl -l logfile -D 
/home/postgres/postgres/data start)

So the conflict between logging_collector and the logfile confuses me.


Can someone give me some advice?
Thanks in advance!


Yours,
Wen Yi

Query on Primary_conninfo

2023-07-26 Thread Praneel Devisetty
Hi,

I have a PostgreSQL cluster running on 13.9.13, and the standby has
primary_conninfo as below:

host=kabc03-dev01-ins06-pgs54-dbs-asy.int.dev.example.com port=5432
user=pubuser_1 passfile='/home/postgres/.pgpass' application_name=dr_node1

The hostname keng03-dev01-ins06-wfm54-dbs-asy.int.dev.example.com is internally
mapped to an FQDN, say node1. When a failover occurs, a script remaps the
hostname to node2 (the new primary).

The standby is not picking up this change even after a reload.

Restarting the standby works, but since PG13 supports reloading
primary_conninfo, is there a reliable way to achieve this without restarting
the standby database?

Thanks
Praneel


Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-26 Thread Stephen Frost
Greetings,

* Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> > Basically, this is not a valid way to perform a backup/restore of PG.
> 
> Is it not valid only for PG 15 or even for earlier versions? I have always
> referred to this https://www.postgresql.org/docs/8.1/backup-online.html

For earlier versions too.

> > better, use an existing well maintained backup/restore tool for this
> 
> Is there any tool you could recommend? A tool to bring back the standby
> node when it was down for a day or so for some OS/firmware upgrade.
> I have tried pgBackRest, but it requires a control node. So anything that
> doesn't involve a 3rd server for this situation would be helpful.

pgBackRest doesn't actually require a control node, though it's
certainly recommended to have one.

Also, if you use physical replication slots, you can have the primary
hold on to the WAL necessary for the replica to catch back up until the
replica is back and then it can just fetch the WAL from the primary and
catch up without any kind of re-sync being necessary.
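
A hedged sketch of that slot-based setup (the slot name is just an example):

-- on the primary:
SELECT pg_create_physical_replication_slot('standby1');
-- on the replica, point at the slot and reload (primary_slot_name is reloadable):
ALTER SYSTEM SET primary_slot_name = 'standby1';
SELECT pg_reload_conf();
-- the primary will then retain the WAL the replica still needs while it is down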

If you're worried about running out of space on the primary for this
(which is an entirely valid concern...) then you could ship the WAL
files to an archive system or location using pgBackRest or possibly some
other tool and then have the replica fetch the WAL from there once it's
back online.

pgBackRest also has the ability to do incremental backups and delta
restores.  An incremental backup will only store the data that's changed
since the prior backup, minimizing the storage space in the repo and the
time required for the backup.  A delta restore will only restore those
files on the replica which are different from what was in the backup and
that can be done using timestamp/file-size because pgBackRest tracks
that information and will set it on restore.  pgBackRest does also have
an option to do checksum-based restores, which it will automatically use
if anything looks odd regarding the timestamps.

> > this also doesn't grab and restore the absolutely required
> > backup_label file that's returned from pg_backup_stop()
>
> I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
> but I did not see any backup_label_file being created.
> psql (15.1)
> Type "help" for help.
> 
> postgres=# select pg_backup_start('backup');
>  pg_backup_start
> -
>  68/3228
> (1 row)
> 
> postgres=# select pg_backup_stop();
> NOTICE:  all required WAL segments have been archived
> pg_backup_stop
> ---
>  (68/32000100,"START WAL LOCATION: 68/3228 (file
> 000100680032)+
>  CHECKPOINT LOCATION: 68/3260
>   +
>  BACKUP METHOD: streamed
>+
>  BACKUP FROM: primary
>   +
>  START TIME: 2023-07-26 08:51:28 JST
>+
>  LABEL: backup
>+
>  START TIMELINE: 1
>+
>  ","16724 /PGDATA/datadg/tbs1
>   +
>  16725 /PGDATA/datadg/tbs2
>+
>  ")
> (1 row)
> 
> I read the documentation on this page
> https://www.postgresql.org/docs/current/functions-admin.html

What's returned from pg_backup_stop() is the backup_label that needs to
be stored with the files which were part of the backup.  Note that you
should *not* store the backup_label in the data directory of the primary
because if the system crashes then it won't come back up without someone
going in and removing that file.  That's how the old exclusive method
worked which was deprecated and then finally removed because of this
issue.

> > The desired contents of the backup label file and the tablespace map file
> > are returned as part of the result of the function and must be written to
> > files in the backup area.
> 
> I don't understand, "*must be written to files in the backup area*". Does
> it mean we need to manually create a file first on the master node and then
> rsync it to the backup node?

No, don't do that.  You should back up all the files and then store the
backup_label with those files.  Do *not* put a backup_label into the
data directory on the primary.
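
A minimal sketch of the non-exclusive flow being described, with the label
written out by hand (the label text and destination are only illustrative):

-- in a single session on the primary:
SELECT pg_backup_start('backup');
-- ... copy the data directory and tablespaces into the backup area ...
SELECT lsn, labelfile, spcmapfile FROM pg_backup_stop();
-- save the labelfile column as a file named backup_label inside the backup area
-- (and spcmapfile as tablespace_map if it is not empty); do not write either
-- file into the primary's data directory.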

* Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> I got the latest documentation
> and
> understood that I was supposed to run
> `select * from pg_backup_stop();`  and store the labelfile output to a file
> in the data directory of the secondary node.

The backup_label should be stored with the backed up files and then
restored as part of restoring the backup, yes.

> I also understood that this Low-level API backup is not really
> a recommended way and I need to move to another method.

I would recommend that, yes.

> However, till I fully set up a tool like pgBackRest (of which you are one
> of the contributors - very impressive!) for PROD, can you please tell me:
> 1) Are there any other options that are safe to use yet fast? Like you said
> rsync --size-only would miss a lot of changes, but anythin

Re: logging_collector is off, but the logging is opening

2023-07-26 Thread Adrian Klaver

On 7/25/23 14:43, Wen Yi wrote:

Hi community,
When I use the postgres, I check the log.(I use the -l open the log)
As you can see, I type code like this:

postgres=# show logging_collector;
  logging_collector
---
  off
(1 row)

postgres=# ^C

Follow the document, when logging_collector is off, there's no log 
output, but when I check my 'postgres' directory:


[postgres@fedora postgres]$ ls
build.sh  logfile  src  startdb.sh  stopdb.sh  update.sh
[postgres@fedora postgres]$ cat logfile
2023-07-24 09:54:59.668 CST [35872] LOG:  starting PostgreSQL 17devel on 
x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.1.1 20230614 (Red Hat 
13.1.1-4), 64-bit
2023-07-24 09:54:59.668 CST [35872] LOG:  listening on IPv6 address 
"::1", port 5432
2023-07-24 09:54:59.668 CST [35872] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2023-07-24 09:54:59.684 CST [35872] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"

...

In a word, the log is actually open. (My startup command: pg_ctl -l 
logfile /home/postgres/postgres/bin/bin/pg_ctl -l logfile -D 
/home/postgres/postgres/data start)

So It's confuse me, the confilt of logging_collector & logfile.

Can someone give me some advice?


https://www.postgresql.org/docs/current/runtime-config-logging.html


Note

It is possible to log to stderr without using the logging collector; the 
log messages will just go to wherever the server's stderr is directed. 
However, that method is only suitable for low log volumes, since it 
provides no convenient way to rotate log files. Also, on some platforms 
not using the logging collector can result in lost or garbled log 
output, because multiple processes writing concurrently to the same log 
file can overwrite each other's output.



https://www.postgresql.org/docs/current/app-pg-ctl.html

-l filename
--log=filename

Append the server log output to filename. If the file does not 
exist, it is created. The umask is set to 077, so access to the log file 
is disallowed to other users by default.
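
If the intent is to use the logging collector instead of pg_ctl's -l
redirection, a minimal sketch (the file name pattern is just an example) would
be:

ALTER SYSTEM SET logging_collector = 'on';
ALTER SYSTEM SET log_directory = 'log';                      -- relative to the data directory
ALTER SYSTEM SET log_filename = 'postgresql-%Y-%m-%d.log';
-- logging_collector only takes effect at server start, so restart afterwards,
-- e.g. pg_ctl -D /home/postgres/postgres/data restart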




Thanks in advance!

Yours,
Wen Yi


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Query on Primary_conninfo

2023-07-26 Thread Michael Paquier
On Wed, Jul 26, 2023 at 05:33:50PM +0530, Praneel Devisetty wrote:
> Standy is not picking this change even after a reload.
> 
> Restarting standby is working out, but as PG13 supports reload of
> primary_connfino is there a reliable way to achieve this without restarting
> standby database.

primary_conninfo should be able to switch dynamically the upstream
server it streams WAL from on a reload.  I've quickly checked with a
simple configuration and I was able to get it done without a restart
here.  Perhaps you have found a bug, but it's hard to say without
having more details about what's happening.  For example, what do the
logs of the standby tell you?  Are you sure that the reload was done
on the correct node?  Did you check with a SHOW command that the new
value is reflected on your standby to what you want it to be?
--
Michael
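
A hedged way to run those checks on the standby (the walreceiver view is
standard; nothing here is specific to this report):

-- on the standby, after the change and a reload:
SELECT pg_reload_conf();
SHOW primary_conninfo;                                 -- does it show the expected value?
SELECT status, conninfo FROM pg_stat_wal_receiver;     -- is the walreceiver actually using it?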




Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-26 Thread Abhishek Bhola
Hi Stephen

Thank you for the very detailed reply. I tried the above method and it
works. I'm still setting up pgBackRest.

I have one last question, if you could answer that too, please.

> A delta restore will only restore those
> files on the replica which are different from what was in the backup and
> that can be done using timestamp/file-size because pgBackRest tracks
> that information and will set it on restore.

As per my understanding, primary and standby nodes can have exactly the
same data, with no data corruption, but still have different timestamps on
the data files in the tablespace folder. Please correct me if I am wrong.
If that's the case, will an incremental/delta backup taken from the primary
node with pgBackRest restore files on the secondary node just because they have
a different timestamp? Or does pgBackRest have some mechanism to detect this
and skip those files? Please assume for this case that we ran out of capacity
to store the WAL segments from the primary while the secondary node was down.

Thanks

On Wed, 26 Jul 2023, 22:55 Stephen Frost,  wrote:

> Greetings,
>
> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> > > Basically, this is not a valid way to perform a backup/restore of PG.
> >
> > Is it not valid only for PG 15 or even for earlier versions? I have
> always
> > referred to this https://www.postgresql.org/docs/8.1/backup-online.html
>
> For earlier versions too.
>
> > > better, use an existing well maintained backup/restore tool for this
> >
> > Is there any tool you could recommend? A tool to bring back the standby
> > node when it was down for a day or so for some OS/firmware upgrade.
> > I have tried pgBackRest, but it requires a control node. So anything that
> > doesn't involve a 3rd server for this situation would be helpful.
>
> pgBackRest doesn't actually require a control node, though it's
> certainly recommended to have one.
>
> Also, if you use physical replication slots, you can have the primary
> hold on to the WAL necessary for the replica to catch back up until the
> replica is back and then it can just fetch the WAL from the primary and
> catch up without any kind of re-sync being necessary.
>
> If you're worried about running out of space on the primary for this
> (which is an entirely valid concern...) then you could ship the WAL
> files to an archive system or location using pgBackRest or possibly some
> other tool and then have the replica fetch the WAL from there once it's
> back online.
>
> pgBackRest also has the ability to do incremental backups and delta
> restores.  An incremental backup will only store the data that's changed
> since the prior backup, minimizing the storage space in the repo and the
> time required for the backup.  A delta restore will only restore those
> files on the replica which are different from what was in the backup and
> that can be done using timestamp/file-size because pgBackRest tracks
> that information and will set it on restore.  pgBackRest does also have
> an option to do checksum-based restores, which it will automatically use
> if anything looks odd regarding the timestamps.
>
> > > this also doesn't grab and restore the absolutely required
> > > backup_label file that's returned from pg_backup_stop()
> >
> > I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
> > but I did not see any backup_label_file being created.
> > psql (15.1)
> > Type "help" for help.
> >
> > postgres=# select pg_backup_start('backup');
> >  pg_backup_start
> > -
> >  68/3228
> > (1 row)
> >
> > postgres=# select pg_backup_stop();
> > NOTICE:  all required WAL segments have been archived
> > pg_backup_stop
> >
> ---
> >  (68/32000100,"START WAL LOCATION: 68/3228 (file
> > 000100680032)+
> >  CHECKPOINT LOCATION: 68/3260
> >   +
> >  BACKUP METHOD: streamed
> >+
> >  BACKUP FROM: primary
> >   +
> >  START TIME: 2023-07-26 08:51:28 JST
> >+
> >  LABEL: backup
> >+
> >  START TIMELINE: 1
> >+
> >  ","16724 /PGDATA/datadg/tbs1
> >   +
> >  16725 /PGDATA/datadg/tbs2
> >+
> >  ")
> > (1 row)
> >
> > I read the documentation on this page
> > https://www.postgresql.org/docs/current/functions-admin.html
>
> What's returned from pg_backup_stop() is the backup_label that needs to
> be stored with the files which were part of the backup.  Note that you
> should *not* store the backup_label in the data directory of the primary
> because if the system crashes then it won't come back up without someone
> going in and removing that file.  That's how the old exclusive method
> worked which was deprecated and then finally removed because of this
> issue.
>
> > > The desired contents of the backup label file and the tablespace map
> file
> > > are returned as part of the result of the function and must be written
> to
> > > files in the backup

Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-26 Thread Stephen Frost
Greetings,

On Wed, Jul 26, 2023 at 21:31 Abhishek Bhola 
wrote:

> Thank you for the very detailed reply. I tried the above method and it
> works. I'm still setting up pgBackRest.
>
> I have one last question if you can answer that too please.
>
>> A delta restore will only restore those
>> files on the replica which are different from what was in the backup and
>> that can be done using timestamp/file-size because pgBackRest tracks
>> that information and will set it on restore.
>
> As per my understanding, primary and standby nodes can have exactly the
> same data, with no data corruption, but still have different timestamps on
> the data files in the tablespace folder. Please correct me if I am wrong.
> If that's the case, then will the  backup (incremental/delta)  taken from
> the primary node on pgBackRest, restore the files on the secondary node,
> just because they have a different timestamp? Or does pgBackRest have some
> mechanism to detect this and skip those files? Please assume for this case
> that we ran out of capacity to store the WAL segments from the primary
> while the secondary node was down.
>

When pgbackrest does a restore, it will also set the timestamps for the
files that it restores to what the timestamp was in the manifest. This
allows us to detect if those files were changed since the restore happened.
If they’ve not changed since the restore, then we skip checking them to see
if they need to be restored from the repository.

Thanks,

Stephen

>


Re: How to improve the performance of my SQL query?

2023-07-26 Thread David Rowley
On Wed, 26 Jul 2023 at 19:46, gzh  wrote:
> QUERY PLAN (enable_seqscan=on)

> Execution Time: 167183.133 ms

> QUERY PLAN (enable_seqscan=off)

> Execution Time: 22320.153 ms

effective_cache_size and random_page_cost are the settings you should
be adjusting to coax the planner into using the index.

A rule of thumb for effective_cache_size would be to set it to about
75% of RAM. There are certainly cases where lower would make more
sense, but 75% will make more sense than the default 4GB value in the
majority of cases.

For random_page_cost, the value of 4.0 has been the default since
HDDs were common. SSDs are common now and, relative to sequential
I/O, their random I/O is faster than an HDD's, so you may get
better results by lowering random_page_cost.

David
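
A hedged sketch of applying both suggestions; the 48GB figure assumes a machine
with roughly 64GB of RAM and is only an example, so adjust to your hardware:

-- try the values in the current session first:
SET effective_cache_size = '48GB';
SET random_page_cost = 1.1;
-- if the plan improves, persist them cluster-wide:
ALTER SYSTEM SET effective_cache_size = '48GB';
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();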