Re: Can I get the number of results plus the results with a single query?

2022-08-17 Thread Peter J. Holzer
On 2022-08-16 14:42:48 -0700, Bryn Llewellyn wrote:
> hjp-pg...@hjp.at wrote:
> The OP wants some kind of progress indicator. To be useful, such
> an indicator should be approximately linear in time. I.e. if your
[...]
> 
> 
> I see, Peter. You’d read the OP’s mind.

Not much mind-reading involved, I hope. The first sentence in the
message was:

| I like to have what I call “baby sitting” messages such as “Completed 15 out
| of 1023”.

That's what I would call a "progress indicator". Such things were
already common when I first got involved with computers almost 40 years
ago, so there isn't much to guess or to invent.

> But I’d failed to. I saw the subject, and I assumed that the OP wanted
> the entire result set together with the count of the results. (After
> all, there’s no inflection of “page” in the OP’s question.)

I don't think his question was about paging. That's a different, although
related, topic.

> It sounds like the OP wants a fast approximate count for a query whose
> restriction isn’t known until runtime.

Maybe an approximate count would be enough, but he didn't say so.

(He did later clarify that he's fetching a lot of data for each row, so
«select count(*) ...» is indeed much faster than «select * ...» due to
the sheer amount of data to be transferred. That wasn't obvious from his
first message, but I hedged against the possibility in my answer.)
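
(And if an approximate count were acceptable, the usual trick is to ask
the planner for its row estimate instead of actually counting. A sketch
along the lines of the count_estimate function on the PostgreSQL wiki;
the estimate is only as good as the planner's statistics:

    CREATE FUNCTION count_estimate(query text) RETURNS integer AS $$
    DECLARE
        rec  record;
        rows integer;
    BEGIN
        -- Run EXPLAIN and pull the estimated row count out of the
        -- topmost plan node.
        FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
            rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
            EXIT WHEN rows IS NOT NULL;
        END LOOP;
        RETURN rows;
    END;
    $$ LANGUAGE plpgsql STRICT;

    SELECT count_estimate('select * from some_table where some_condition');

some_table/some_condition are placeholders, of course.)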

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




i added Arabic Dictionary but how I know i'm using it

2022-08-17 Thread Mohammed falih
Greetings,

I hope you are fine and doing well.

I have created a dictionary in Postgres successfully, but how may I know
that I'm using it? I ask because the queries I run now return the same
results as before I created the dictionary.

I downloaded a RAR archive that I assumed was a dictionary, copied the
files called th_ar.dat and ar.aff into the tsearch_data folder under the
Postgres folder, and renamed them th_ar.dict and th_ar.affix, as the
PostgreSQL documentation told me:


https://www.postgresql.org/docs/current/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY


I created the dictionary with this query:

CREATE TEXT SEARCH DICTIONARY arabic_dict (
    TEMPLATE = ispell,
    DictFile = th_ar,
    AffFile = th_ar,
    Stopwords = arabic);

As you know, the purpose of a dictionary is that when you search for, e.g.,
"house", the results also retrieve records containing "home", because they
are synonyms; but when I run queries nothing happens.

So my questions are:

1) Is there anything I should do before running the query so that it uses
the dictionary?

2) Is there any chance that I'm doing something wrong? Please point it out.

3) How do I know that the dictionary I'm using is correct?

Thank you for your efforts.

My best regards


Re: Regarding availability of 32bit client drivers for postgresql 13/14

2022-08-17 Thread Ron

WHY 32-bit in 2022 under RHEL 8.5?

On 8/16/22 23:53, Aravind Phaneendra wrote:


Thank you Adrian for the response.

We are looking for 32bit client drivers for RHEL 8.5.

Another question: how can enterprise customers using PostgreSQL subscribe
to official support?


Thanks & Regards,

Aravind Phaneendra

CICS TX and TXSeries Development & L3 Support

India Systems Development Labs

IBM Systems

From: Adrian Klaver 
Date: Wednesday, 17 August 2022 at 9:57 AM
To: Aravind Phaneendra , pgsql-general@lists.postgresql.org 
Subject: [EXTERNAL] Re: Regarding availability of 32bit client drivers
for postgresql 13/14


On 8/16/22 20:46, Aravind Phaneendra wrote:
> Hi,
>
> My name is Aravind and I am part of IBM CICS TX product development and
> support. We have a requirement from one of our customers to use IBM CICS
> TX with PostgreSQL 13/14. IBM CICS TX is a Transaction Manager
> middleware product that is deployed as container on Kubernetes
> platforms. IBM CICS TX can interact with database products such as DB2,
> Oracle, MSSQL, PostgreSQL through XA/Open standards.
>
> CICS TX is a 32bit C runtime product and uses the databases’ 32bit
> client libraries to perform embedded SQL operations. The customer
> applications are Embedded SQL C or COBOL programs deployed on CICS TX
> and CICS TX runtime executes them as transactions ensuring the data
> integrity.
>
> We observed there are no 32bit client binaries/libraries available with
> PostgreSQL 13/14, and CICS TX requires them to interact with the
> PostgreSQL server. Currently we have tested with PostgreSQL 10.12.1 and
> our customer wants to upgrade to PostgreSQL 13 or 14.
>
> Based on the above requirements and details, we have a few questions that
> require your support.
>
>  1. Can we get 32bit client binaries/libraries for PostgreSQL 14 ?

The ODBC driver does it:

https://odbc.postgresql.org/docs/win32-compilation.html

Also the Build farm:

https://buildfarm.postgresql.org/cgi-bin/show_status.pl

has 32bit/i686 members that show green.

>  2. We also found that the libraries can be built by using the
> PostgreSQL 14 source. Is it possible to build the 32bit client
> binaries/libraries from the source available ?

Try it and see.
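
For a 32bit libpq build on a 64bit Linux host, something along these lines
might work (an untested sketch; the package names and configure options are
assumptions you will need to adapt):

    # hypothetical prerequisites on RHEL-ish systems
    dnf install glibc-devel.i686 libgcc.i686

    # in the PostgreSQL source tree
    ./configure CC='gcc -m32' --prefix=/opt/pgsql32 \
        --without-readline --without-zlib
    make -C src/interfaces/libpq install
    make -C src/include install
    make -C src/interfaces/ecpg install   # if you need embedded SQL in C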

>  3. Is there an official support for 32bit client libraries/binaries
> built out of source for customers ?
>  4. Can the PostgreSQL 10.12.1 client work with PostgreSQL 14 server ?
> Do you still support PostgreSQL 10.12.1 client ?
>
> Thanks & Regards,
>
> Aravind Phaneendra
>
> CICS TX and TXSeries Development & L3 Support
>
> India Systems Development Labs
>
> IBM Systems
>


--
Adrian Klaver
adrian.kla...@aklaver.com



--
Angular momentum makes the world go 'round.

Sub:column "" is of type bigint but expression is of type character varying

2022-08-17 Thread Rama Krishnan
Hi All,

I have a table called tickets:

\d tickets

        Column        |  Type  | Collation | Nullable | Default
----------------------+--------+-----------+----------+---------
 id                   | bigint |           | not null |
 ticket_purchase_no   | bigint |           | not null |

The table below contains data that is more than two years old:

\d tickets_archive

        Column        |  Type  | Collation | Nullable | Default
----------------------+--------+-----------+----------+---------
 id                   | bigint |           | not null |
 ticket_purchase_no   | bigint |           | not null |

I purged the old data from the original table. When I restore the data
from the archive table into the original table, I get this error:

insert into tickets select * from tickets_archive;

column "ticket_purchase_no" is of type bigint but expression is of type
character varying
Regards

A.Rama Krishnan


Postgres question

2022-08-17 Thread ajay venki
Hi,
I am new to PostgreSQL and I have a general question: is this the right
forum or the right mail address to post my questions to?

Thanks


Re: Postgres question

2022-08-17 Thread hubert depesz lubaczewski
On Wed, Aug 17, 2022 at 10:32:26AM +0100, ajay venki wrote:
> I am new to PostgreSQL and I have a general question: is this the right
> forum or the right mail address to post my questions to?

Yes, this is the right place.

Best regards,

depesz





Re: Sub:column "" is of type bigint but expression is of type character varying

2022-08-17 Thread JITEN KUMAR SHAH

1. Please check your tickets_archive.ticket_purchase_no data type.

2. Check whether you are using the correct schema.


On 8/17/22 14:23, Rama Krishnan wrote:

Hi All,

I have a table called tickets:

\d tickets

        Column        |  Type  | Collation | Nullable | Default
----------------------+--------+-----------+----------+---------
 id                   | bigint |           | not null |
 ticket_purchase_no   | bigint |           | not null |

The table below contains data that is more than two years old:

\d tickets_archive

        Column        |  Type  | Collation | Nullable | Default
----------------------+--------+-----------+----------+---------
 id                   | bigint |           | not null |
 ticket_purchase_no   | bigint |           | not null |

I purged the old data from the original table. When I restore the data
from the archive table into the original table, I get this error:

insert into tickets select * from tickets_archive;

column "ticket_purchase_no" is of type bigint but expression is of type
character varying

Regards

A.Rama Krishnan


Re: Postgres question

2022-08-17 Thread ajay venki
Thanks. I am looking to install the pgloader tool to migrate my MS SQL
data to Postgres.
I tried searching online but the instructions were not clear to me. Is
there an article or video that covers pgloader installation on Windows 10?

Thanks
Ajay

On Wed, Aug 17, 2022 at 1:24 PM hubert depesz lubaczewski 
wrote:

> On Wed, Aug 17, 2022 at 10:32:26AM +0100, ajay venki wrote:
> > I am new to PostgreSQL and I have a general question: is this the right
> > forum or the right mail address to post my questions to?
>
> Yes, this is the right place.
>
> Best regards,
>
> depesz
>
>


Re: Postgres question

2022-08-17 Thread Thomas Kellerer
ajay venki schrieb am 17.08.2022 um 14:39:
> Thanks. I am looking forward to install pgloader tool to migrate my MS SQL 
> data to postgres. 
> I tried searching it online and the instructions were not clear to me. Is 
> there any article or video which talks about pgloader installation on windows 
> 10 ?

pgLoader isn't fully supported on Windows as far as I know. You would have
to build it yourself, and the manual[1] mentions several problems and
drawbacks.

It's probably easier to use Docker to get it running, rather than trying to
build it yourself.
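
Untested sketch, using the image name from the pgloader documentation
(the connection strings are made up; adapt them to your environment):

    docker run --rm dimitri/pgloader:latest pgloader --version

    docker run --rm dimitri/pgloader:latest \
        pgloader mssql://user:pass@mssql-host/source_db \
                 pgsql://user:pass@pg-host/target_db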

Or don't use Windows to run Postgres - it runs much better on Linux anyway.

Thomas

---
1: https://pgloader.readthedocs.io/en/latest/install.html




Re: Sub:column "" is of type bigint but expression is of type character varying

2022-08-17 Thread Adrian Klaver

On 8/17/22 01:53, Rama Krishnan wrote:

Hi All,

I purged the old data from the original table. When I restore the data
from the archive table into the original table, I get this error:

insert into tickets select * from tickets_archive;


The above depends on:

https://www.postgresql.org/docs/current/sql-insert.html

"The target column names can be listed in any order. If no list of 
column names is given at all, the default is all the columns of the 
table in their declared order; or the first N column names, if there are 
only N columns supplied by the VALUES clause or query. The values 
supplied by the VALUES clause or query are associated with the explicit 
or implicit column list left-to-right.

"

If this is not the case then you can get mismatched columns where a 
varchar value is being inserted into an integer field. Verify that the 
table column order is the same for both tables.
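
A quick way to compare the declared column order of the two tables (a
sketch against the system catalogs):

    SELECT attrelid::regclass, attnum, attname, atttypid::regtype
    FROM pg_attribute
    WHERE attrelid IN ('tickets'::regclass, 'tickets_archive'::regclass)
      AND attnum > 0 AND NOT attisdropped
    ORDER BY attrelid, attnum;

If a column was dropped and re-added, or the tables were created with
different column orders, it will show up here.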







  column "*ticket_purchase_no*" is of type bigint but expression is of 
type character varying

Regards

A.Rama Krishnan



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Sub:column "" is of type bigint but expression is of type character varying

2022-08-17 Thread David G. Johnston
On Wed, Aug 17, 2022 at 5:13 AM Rama Krishnan  wrote:

> Hi All,
>
> I have a table called tickets:
>
> \d tickets
>
>         Column        |  Type  | Collation | Nullable | Default
> ----------------------+--------+-----------+----------+---------
>  id                   | bigint |           | not null |
>  ticket_purchase_no   | bigint |           | not null |
>
> The table below contains data that is more than two years old:
>
> \d tickets_archive
>
>         Column        |  Type  | Collation | Nullable | Default
> ----------------------+--------+-----------+----------+---------
>  id                   | bigint |           | not null |
>  ticket_purchase_no   | bigint |           | not null |
>
> I purged the old data from the original table. When I restore the data
> from the archive table into the original table, I get this error:
>
> insert into tickets select * from tickets_archive;
>
> column "ticket_purchase_no" is of type bigint but expression is of type
> character varying
>
This sequence seems impossible if executed all from the same psql session.
So I presume you most likely aren't actually doing that, and so the psql
output you show is meaningless since it doesn't show what the insert/select
command is actually working with.

If you are, showing the results of "select * from {tickets|tickets_archive}
limit 1" would be informative.  Listing columns explicitly in the insert
command and then putting an explicit cast on
tickets_archive.ticket_purchase_no would also be interesting.
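
Something like this (column list abbreviated to the two columns shown in
your \d output; the explicit cast is the part being tested):

INSERT INTO tickets (id, ticket_purchase_no)
SELECT id, ticket_purchase_no::bigint
FROM tickets_archive;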

David J.


Re: i added Arabic Dictionary but how I know i'm using it

2022-08-17 Thread Laurenz Albe
On Wed, 2022-08-17 at 14:33 +0300, Mohammed falih wrote:
> I created the dictionary by this query 
> CREATE TEXT SEARCH DICTIONARY arabic_dict (
>     TEMPLATE = ispell,
>    DictFile = th_ar,
>     AffFile = th_ar,
>     Stopwords = arabic);
> and as you know the purpose of a dictionary is that when you search for,
> e.g., "house", the results also retrieve records containing "home",
> because they are synonyms; but when I run queries nothing happens

That would be a "synonym dictionary":
https://www.postgresql.org/docs/current/textsearch-dictionaries.html#TEXTSEARCH-SYNONYM-DICTIONARY
An Ispell dictionary normalizes words, for example by removing suffixes for
plural and case.


You'd have to create a synonym file yourself.
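
To check whether your Ispell dictionary is being consulted at all, you can
test it directly, for example (the word and the 'arabic' configuration name
are placeholders; use whatever your queries actually reference):

    SELECT ts_lexize('arabic_dict', 'some_arabic_word');
    SELECT * FROM ts_debug('arabic', 'some arabic text');

ts_lexize returns NULL if the dictionary does not recognize the word, and
ts_debug shows which dictionary each token is mapped to.

A synonym dictionary would look roughly like this, with a my_synonyms.syn
file (name made up) in tsearch_data containing lines like "house home":

    CREATE TEXT SEARCH DICTIONARY my_synonyms (
        TEMPLATE = synonym,
        SYNONYMS = my_synonyms);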

Yours,
Laurenz Albe




Setting up streaming replication on large database (20+ TB) for the first time

2022-08-17 Thread Ivan N. Ivanov
I have a large database (~25 TB) and I want to set up streaming replication
for the first time.

My problem is that after completion of the pg_basebackup (which took 2 days
with --wal-method=none), PG is now replaying the WAL files from the WAL
archive directory but cannot keep up. Replaying the WAL takes as long as
the wall-clock time it covers, for example:

2022-08-17 22:42:57 EEST [13507-6] [] DETAIL:  Last completed transaction
was at log time 2022-08-15 18:24:02.155289+03.
2022-08-17 22:48:35 EEST [13507-12] [] DETAIL:  Last completed transaction
was at log time 2022-08-15 18:29:54.962822+03.
2022-08-17 22:54:35 EEST [13507-16] [] DETAIL:  Last completed transaction
was at log time 2022-08-15 18:34:20.099468+03.

From ~22:43 to ~22:48 is 5 minutes, and the completed transactions are at
~18:24 and ~18:29 (also 5 minutes apart).

I have even put all the WAL files from the archive directly into the pg_wal
directory of the replica, so PostgreSQL skips the cp command from
restore_command (i.e. I have removed restore_command). Recovering the WAL
files is now the only operation running, but it is slow:

postgres: startup   recovering 00010003FC790013
postgres: startup   recovering 00010003FC790014
postgres: startup   recovering 00010003FC790015
...

And it cannot keep up, and my replication cannot start since it is 2 days
behind the master... The replica has the same SSD disks as the master.

Is there a better way to do this? How do I speed up recovery of WAL files?
I have increased shared_buffers as much as I can... Is there something that
I miss from the recovery process?

I have no problems setting up replication for the first time on small
databases (10 GB - 100 GB), but at 25 TB I cannot get the replication
started because of this lag.


Re: Setting up streaming replication on large database (20+ TB) for the first time

2022-08-17 Thread Christophe Pettus



> On Aug 17, 2022, at 13:06, Ivan N. Ivanov  wrote:
> 
> How do I speed up recovery of WAL files?

Since you are running on your own hardware, you might take a look at:

https://github.com/TritonDataCenter/pg_prefaulter




Re: [(catch-ext)] Re: Setting up streaming replication on large database (20+ TB) for the first time

2022-08-17 Thread Ivan N. Ivanov
Thank you for your answer! I have found this tool and I will try it
tomorrow to see if this "read-ahead" feature will speed up the process.

On Wed, Aug 17, 2022 at 11:09 PM Christophe Pettus  wrote:

>
>
> > On Aug 17, 2022, at 13:06, Ivan N. Ivanov 
> wrote:
> >
> > How do I speed up recovery of WAL files?
>
> Since you are running on your own hardware, you might take a look at:
>
> https://github.com/TritonDataCenter/pg_prefaulter
>


Different execution plan between PostgreSQL 8.2 and 12.5

2022-08-17 Thread gzh
Hi,

I have had a Perl website working for 7 years with no problems, until a few
weeks ago when I replaced my database server with a newer one.

Database server (old): PostgreSQL 8.2 32bit
Database server (new): PostgreSQL 12.5 64bit

I run the following SQL on PostgreSQL 8.2 and PostgreSQL 12.5, and it
returns different execution plans.

--SQL

explain select
    crew_base.crewid
from crew_base
    left join crew_base as crew_base_introduced
        on crew_base.introduced_by = crew_base_introduced.crewid
where crew_base.status = '1';

--PostgreSQL 8.2
---
QUERY PLAN
Limit  (cost=188628.24..189521.23 rows=1 width=10)
  ->  Hash Left Join  (cost=188628.24..3800200.71 rows=40443494 width=10)
        Hash Cond: (lower(crew_base.introduced_by) = lower(crew_base_introduced.crewid))
        ->  Seq Scan on crew_base  (cost=0.00..165072.69 rows=5446 width=20)
              Filter: (status = 1)
        ->  Hash  (cost=161359.55..161359.55 rows=1485255 width=10)
              ->  Seq Scan on crew_base crew_base_introduced  (cost=0.00..161359.55 rows=1485255 width=10)

--PostgreSQL 12.5
---
QUERY PLAN
Limit  (cost=0.43..47861.44 rows=1 width=7)
  ->  Nested Loop Left Join  (cost=0.43..169386135.30 rows=35391255 width=7)
        Join Filter: (lower(crew_base.introduced_by) = lower(crew_base_introduced.crewid))
        ->  Seq Scan on crew_base  (cost=0.00..128942.75 rows=4759 width=14)
              Filter: (status = 1)
        ->  Materialize  (cost=0.43..51909.70 rows=1487340 width=7)
              ->  Index Only Scan using crew_base_crewid_index on crew_base crew_base_introduced  (cost=0.43..38663.00 rows=1487340 width=7)

PostgreSQL 8.2 returned the data quickly, but PostgreSQL 12.5 has not
responded.

I'm guessing that on PostgreSQL 12.5 the lower() function keeps the index
from being used, but I don't understand why PostgreSQL 8.2 behaves
normally.

What is the reason for this, and is there an easy way to maintain
compatibility?

Regards,

--

gzh




Re: Different execution plan between PostgreSQL 8.2 and 12.5

2022-08-17 Thread Tom Lane
gzh  writes:
> I run the following SQL on PostgreSQL 8.2 and PostgreSQL 12.5, and it
> returns different execution plans.

8.2 is ... well, not stone age maybe, but pretty durn ancient.
You really ought to update a bit more often than that.  (And
maybe pay more attention to staying up to date with minor releases?
Whatever was your reasoning for choosing 12.5, when the latest 12.x
release is 12.12?)

The 12.5 plan looks like it thinks that the join condition is not
hashable --- and probably not mergeable as well, else it would have
done a mergejoin.  This is odd if we assume that the lower()
outputs are just text.  But you haven't said anything about the
data types involved, nor what locale setting you're using, nor
what nondefault settings or extensions you might be using, so
speculation about the cause would just be speculation.

There is some advice here about how to ask this sort of
question in a way that would obtain useful answers:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards, tom lane




Re: Different execution plan between PostgreSQL 8.2 and 12.5

2022-08-17 Thread David Rowley
On Thu, 18 Aug 2022 at 15:32, Tom Lane  wrote:
> The 12.5 plan looks like it thinks that the join condition is not
> hashable --- and probably not mergeable as well, else it would have
> done a mergejoin.  This is odd if we assume that the lower()
> outputs are just text.  But you haven't said anything about the
> data types involved, nor what locale setting you're using, nor
> what nondefault settings or extensions you might be using, so
> speculation about the cause would just be speculation.

In addition to that, I couldn't help notice that the quoted SQL does
not seem to belong to the explain.  The EXPLAIN has a Limit node, but
the query does not. I'm assuming this isn't due to the relations being
views since we don't pull up subqueries with a LIMIT.

The costs for the 12.5 plan are cheaper than 8.2's, so I imagine the more
likely cause is the planner favouring an early-startup plan.

It's probably more likely that lower() is providing the planner with
bad estimates: there are likely far fewer rows than expected, so the
LIMIT 1 is a much larger proportion of the total rows than the planner
expects.
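
If lower() is what's hurting the estimates and the join, an expression
index may be worth trying (a sketch; the index name is made up, and an
expression index also makes ANALYZE gather statistics on the expression):

    CREATE INDEX crew_base_lower_crewid_idx ON crew_base (lower(crewid));
    ANALYZE crew_base;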

David




Re: Different execution plan between PostgreSQL 8.2 and 12.5

2022-08-17 Thread Adrian Klaver

On 8/17/22 20:01, gzh wrote:

Hi,


I have had a Perl website working for 7 years with no problems, until a few
weeks ago when I replaced my database server with a newer one.


Did you run ANALYZE on the 12.5 server after restoring the data to it?




gzh





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Setting up streaming replication on large database (20+ TB) for the first time

2022-08-17 Thread Ron
pgBackRest will certainly back up your data faster. It might be able to be
used as a seed instead of pg_basebackup.


On 8/17/22 15:06, Ivan N. Ivanov wrote:
I have a large database (~25 TB) and I want to set up streaming replication
for the first time.

My problem is that after completion of the pg_basebackup (which took 2 days
with --wal-method=none), PG is now replaying the WAL files from the WAL
archive directory but cannot keep up. Replaying the WAL takes as long as
the wall-clock time it covers, for example:

2022-08-17 22:42:57 EEST [13507-6] [] DETAIL:  Last completed transaction
was at log time 2022-08-15 18:24:02.155289+03.
2022-08-17 22:48:35 EEST [13507-12] [] DETAIL:  Last completed transaction
was at log time 2022-08-15 18:29:54.962822+03.
2022-08-17 22:54:35 EEST [13507-16] [] DETAIL:  Last completed transaction
was at log time 2022-08-15 18:34:20.099468+03.

From ~22:43 to ~22:48 is 5 minutes, and the completed transactions are at
~18:24 and ~18:29 (also 5 minutes apart).

I have even put all the WAL files from the archive directly into the pg_wal
directory of the replica, so PostgreSQL skips the cp command from
restore_command (i.e. I have removed restore_command). Recovering the WAL
files is now the only operation running, but it is slow:

postgres: startup   recovering 00010003FC790013
postgres: startup   recovering 00010003FC790014
postgres: startup   recovering 00010003FC790015
...

And it cannot keep up, and my replication cannot start since it is 2 days
behind the master... The replica has the same SSD disks as the master.

Is there a better way to do this? How do I speed up recovery of WAL files?
I have increased shared_buffers as much as I can... Is there something that
I miss from the recovery process?

I have no problems setting up replication for the first time on small
databases (10 GB - 100 GB), but at 25 TB I cannot get the replication
started because of this lag.




--
Angular momentum makes the world go 'round.




Unable to Create or Drop Index Concurrently

2022-08-17 Thread Abdul Qoyyuum
Hi list,

We have a running master-slave high-availability setup. Naturally, we
can't run any changes on the read-only databases on the slave, so we have
to make them on the master node.

When trying to run the following command:

create index concurrently idx_cash_deposit_channel_id_batch_id on
cash_deposit (channel_id, batch_id);


After waiting for a long time, my connection dropped. When checking the
table, we see the index marked INVALID:

Indexes:
"pk_cash_deposit" PRIMARY KEY, btree (id)
"idx_cash_deposit_channel_id_batch_id" btree (channel_id, batch_id)
INVALID

Dropping the invalid index also takes a long time; my connection timed out,
and when I logged back in and checked the table, the index hadn't been
dropped.

The question is: do we have to shut down traffic and close all existing
open connections in order to drop and properly recreate the index? Any
advice appreciated.

-- 
Abdul Qoyyuum Bin Haji Abdul Kadir
HP No: +673 720 8043


Re: Unable to Create or Drop Index Concurrently

2022-08-17 Thread Christophe Pettus



> On Aug 17, 2022, at 22:57, Abdul Qoyyuum  wrote:
> The question is: do we have to shut down traffic and close all existing open
> connections in order to drop and properly recreate the index?

No, you don't.

On the CREATE INDEX CONCURRENTLY command, what is likely going on is that when 
the connection drops, the session terminates, which will terminate the CREATE 
INDEX CONCURRENTLY command and leave the index in an INVALID state.  The 
problem to solve is preventing the session from disconnecting, either by 
finding a way to avoid a timeout, connecting via screen or tmux, etc.

On the DROP INDEX, what is likely going on is that the DROP INDEX is waiting 
for other transactions which are accessing that table to finish, since it needs 
to take an exclusive lock on the table.  If the session drops, the command 
isn't run, so the index hasn't been dropped.  The solution is the same as 
above.  If you are on a version that supports it,  you can use the DROP INDEX 
CONCURRENTLY command to avoid locking issues with the table, since even before 
the DROP INDEX happens, new transactions attempting to access that table will 
queue up behind the DROP INDEX.
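
From a session that won't get killed (e.g. inside screen or tmux), the
whole dance might look roughly like this (a sketch, assuming your version
has DROP INDEX CONCURRENTLY, i.e. 9.2+; note that neither statement can run
inside a transaction block):

    SET statement_timeout = 0;
    DROP INDEX CONCURRENTLY idx_cash_deposit_channel_id_batch_id;
    CREATE INDEX CONCURRENTLY idx_cash_deposit_channel_id_batch_id
        ON cash_deposit (channel_id, batch_id);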



Is it possible to keep indexes on different disk location?

2022-08-17 Thread W.P.

Hi there,

I have a PostgreSQL 11 server on OrangePi3 (ARM 64 bit, Armbian, PG from 
distro).


The database (the whole cluster) is located on a USB disk. This approach
has already lost me the DB contents twice (it is a replica of a DB on an
i7).


But the whole thing (mainly indexes) is about 50G, and internal storage 
is only 32GB.


Is it possible to move the DB tables etc. to this internal storage (a
reliable connection) and put only the indexes on the USB HDD?


And would it help in case of losing the connection to the USB disk (DB
recoverable instead of a total crash)?
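
Something like this tablespace setup is what I have in mind (the path and
the index name are made up):

    CREATE TABLESPACE usb_disk LOCATION '/mnt/usbdisk/pg_indexes';
    ALTER INDEX my_big_index SET TABLESPACE usb_disk;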



Laurent





Re: Setting up streaming replication on large database (20+ TB) for the first time

2022-08-17 Thread Vijaykumar Jain
I just did a backup and restore of a replica using pgbackrest.
DB size: 28 TB.

NVMe/SSD storage, 96 CPUs, 380 GB memory.

zst compression, 24 workers for backup, 12 workers for restore.

2.5 hours to back up,
2 hours to restore.
WAL replay is something I forgot to tune, but I could now use
https://pgbackrest.org/configuration.html#section-archive/option-archive-get-queue-max
to speed up WAL pulls too.
Everything is on prem, no cloud FYI, and Gentoo.
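
For reference, the relevant pgbackrest.conf options would look roughly like
this (values illustrative, not a recommendation; stanza name and path are
made up):

    [global]
    compress-type=zst
    process-max=24
    archive-get-queue-max=4GiB

    [mycluster]
    pg1-path=/var/lib/postgresql/data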



On Thu, Aug 18, 2022, 11:23 AM Ron  wrote:

> pgBackRest will certainly back up your data faster. It might be able to be
> used as a seed instead of pg_basebackup.
>
> On 8/17/22 15:06, Ivan N. Ivanov wrote:
> > I have a large database (~25 TB) and I want to set up streaming
> > replication for the first time.
> >
> > My problem is that after completion of the pg_basebackup (which took
> > 2 days with --wal-method=none), PG is now replaying the WAL files from
> > the WAL archive directory but cannot keep up. Replaying the WAL takes
> > as long as the wall-clock time it covers, for example:
> >
> > 2022-08-17 22:42:57 EEST [13507-6] [] DETAIL:  Last completed transaction
> > was at log time 2022-08-15 18:24:02.155289+03.
> > 2022-08-17 22:48:35 EEST [13507-12] [] DETAIL:  Last completed transaction
> > was at log time 2022-08-15 18:29:54.962822+03.
> > 2022-08-17 22:54:35 EEST [13507-16] [] DETAIL:  Last completed transaction
> > was at log time 2022-08-15 18:34:20.099468+03.
> >
> > From ~22:43 to ~22:48 is 5 minutes, and the completed transactions are
> > at ~18:24 and ~18:29 (also 5 minutes apart).
> >
> > I have even put all the WAL files from the archive directly into the
> > pg_wal directory of the replica, so PostgreSQL skips the cp command from
> > restore_command (i.e. I have removed restore_command). Recovering the
> > WAL files is now the only operation running, but it is slow:
> >
> > postgres: startup   recovering 00010003FC790013
> > postgres: startup   recovering 00010003FC790014
> > postgres: startup   recovering 00010003FC790015
> > ...
> >
> > And it cannot keep up, and my replication cannot start since it is 2 days
> > behind the master... The replica has the same SSD disks as the master.
> >
> > Is there a better way to do this? How do I speed up recovery of WAL
> > files? I have increased shared_buffers as much as I can... Is there
> > something that I miss from the recovery process?
> >
> > I have no problems setting up replication for the first time on small
> > databases (10 GB - 100 GB), but at 25 TB I cannot get the replication
> > started because of this lag.
> >
>
> --
> Angular momentum makes the world go 'round.
>
>
>