Re: PostgreSQL extension for processing Graph queries (Apache AGE)

2022-11-30 Thread Alvaro Herrera
On 2022-Nov-29, Young Seung Andrew Ko wrote:

> Hello PostgreSQL users,
> 
> https://github.com/apache/age
> Apache AGE is an Apache 2-licensed open-source PostgreSQL extension for
> storing graph data.
> 
> The current version of Apache AGE enables PostgreSQL users to run
> Neo4j's openCypher-based graph queries in unison with existing
> relational tables.

Can you show some examples of this feature in action?  What sort of data
would I use it for, and how would I query it?
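
Going by the README, is the idea roughly the following?  (An untested
sketch on my part; the graph name and labels here are invented.)

CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- create a graph and a couple of connected vertices
SELECT create_graph('social');
SELECT * FROM cypher('social', $$
    CREATE (:Person {name: 'alice'})-[:KNOWS]->(:Person {name: 'bob'})
$$) AS (result agtype);

-- query it back, openCypher style, from regular SQL
SELECT * FROM cypher('social', $$
    MATCH (a:Person)-[:KNOWS]->(b:Person)
    RETURN a.name, b.name
$$) AS (a agtype, b agtype);

And presumably the interesting part is joining results like these
against ordinary relational tables?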

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Finding free time periods on non-continuous tstzrange field values

2022-11-30 Thread Amitabh Kant
Hi

Given the following table, how do I find the free time periods?

CREATE TABLE test_time_range (
    id SERIAL PRIMARY KEY,
    time_range tstzrange
);

INSERT INTO test_time_range (time_range) VALUES ('[2022-11-28 08:00:00, 2022-11-28 20:00:00]');
INSERT INTO test_time_range (time_range) VALUES ('[2022-11-29 12:30:00, 2022-11-29 22:00:00]');
INSERT INTO test_time_range (time_range) VALUES ('[2022-11-30 05:00:00, 2022-11-30 19:00:00]');


In the above example, I would like the query to return something like this:
"2022-11-28 20:01:00 2022-11-29 11:29:00"
"2022-11-29 22:01:00 2022-11-30 04:59:00"


Apologies if this is a dumb question, but I'm trying to use ranges for the
first time and can't get my head around them.

Using PG14, can upgrade to 15 if that matters.


Amitabh


Re: Finding free time periods on non-continuous tstzrange field values

2022-11-30 Thread Marcos Pegoraro
>
> Given the following table, how do I find the free time periods?
>

https://www.crunchydata.com/blog/better-range-types-in-postgres-14-turning-100-lines-of-sql-into-3
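
The core of it, adapted to your table, would be roughly this (an
untested sketch; substitute whatever outer window you need):

SELECT unnest(
           tstzmultirange(tstzrange('2022-11-28 00:00:00', '2022-12-01 00:00:00'))
           - range_agg(time_range)
       ) AS free_slot
FROM test_time_range;

-- range_agg() collapses the booked ranges into a single multirange,
-- subtracting that from the full window leaves the gaps, and unnest()
-- splits the result back into individual ranges (all PG 14 features).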


Re: Finding free time periods on non-continuous tstzrange field values

2022-11-30 Thread Amitabh Kant
On Wed, Nov 30, 2022 at 7:20 PM Marcos Pegoraro  wrote:

> Given the following table, how do I find the free time periods?
>>
>
>
> https://www.crunchydata.com/blog/better-range-types-in-postgres-14-turning-100-lines-of-sql-into-3
>

Thanks Marcos. I had seen this earlier, but using it for the implementation
somehow slipped my mind. I will try implementing this using tstzmultirange.


postgres large database backup

2022-11-30 Thread Atul Kumar
Hi,

I have a 10TB database on Postgres 11, running on CentOS 7 "on
premises". I need to schedule backups of this database in a fast way.

The scheduled backup will be used for PITR purposes.

So please let me know how I can take a quicker backup of my 10TB
database. Is there any tool that can take backups, and subsequently
incremental backups, quickly, and restore them quickly for PITR when
required?

What should be the exact approach for scheduling such backups so that
they can be restored quickly?



Regards.


Re: postgres large database backup

2022-11-30 Thread Michael Loftis
On Wed, Nov 30, 2022 at 8:40 AM Atul Kumar  wrote:
>
> Hi,
>
> I have a 10TB database on Postgres 11, running on CentOS 7 "on premises".
> I need to schedule backups of this database in a fast way.
>
> The scheduled backup will be used for PITR purposes.
>
> So please let me know how I can take a quicker backup of my 10TB database.
> Is there any tool that can take backups, and subsequently incremental
> backups, quickly, and restore them quickly for PITR when required?
>
> What should be the exact approach for scheduling such backups so that they
> can be restored quickly?

Faster than *what*?

If speed is the primary criterion, filesystem snapshots would be the
fastest to restore: use pg_start_backup() to put the DB cluster into
binary-ready backup mode, take the snapshot, then run pg_stop_backup(),
capturing the WALs generated alongside your FS snapshot, all on the
same machine or shared storage. To restore, bring back the old snapshot
plus the WALs captured while the DB was shutdown/stopped; startup is
then a normal "crash recovery", or you can select a PITR/LSN target
within the short pg_start_backup() ... pg_stop_backup() window. If
you're properly archiving WALs beyond JUST the full backup, you can
PITR to any point after the full backup snapshot, but the more
transactions/WAL it has to process to get to the desired point, the
longer the recovery.
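
For reference, the SQL side of that snapshot window is short (a sketch
for PG 11, which the OP is on; PG 15 renamed these to pg_backup_start()
and pg_backup_stop()):

-- enter backup mode; non-exclusive, so this session must stay open
SELECT pg_start_backup('snap', true, false);  -- label, fast checkpoint, exclusive=false

-- ... take the filesystem/SAN snapshot here, outside of SQL ...

-- leave backup mode from the same session; keep the returned backup
-- label and tablespace map contents alongside the snapshot
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);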

pgbackrest can back up a PG cluster in multiple ways (including taking
a base backup while actively streaming WALs, or acting as the WAL
archiver), and a restore on the same machine as the backup repository
would basically be limited by I/O (well, unless you've got all NVMe,
in which case CPU, bus, or memory bandwidth constraints become the
limiting factor).
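
If you let pgbackrest be the WAL archiver, the PG side of the wiring is
just this (a sketch; the stanza name "main" is an assumption, and
archive_mode only takes effect after a restart):

ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=main archive-push %p';
SELECT pg_reload_conf();  -- reload picks up archive_command; archive_mode needs a restart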

Basically, no matter how you back up, 10TB takes a long time to copy,
and except in the "local FS snapshot" method I outlined above, your
limiting factor is going to be how fast you can move the data back to
where you need it.

For critical DBs of this nature I've actually done almost exactly the
method I just outlined, only the backup/snapshot process happens on a
replica. *NORMAL* failure recovery in that replicated cluster is by
failover, but for an actual backup restore due to disaster, or a need
to go back in time (which is... extremely rare...), there's some manual
intervention to bring up a snapshot and play back WALs to the point in
time that we want for the DB cluster.




-- 

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler




Re: postgres large database backup

2022-11-30 Thread Christophe Pettus



> On Nov 30, 2022, at 07:40, Atul Kumar  wrote:
> So please let me know how I can take a quicker backup of my 10TB database.
> Is there any tool that can take backups, and subsequently incremental
> backups, quickly, and restore them quickly for PITR when required?

For an initial backup, you'll need to copy 10TB to another medium; there's no 
getting around that.  pgBackRest supports incremental and differential backups, 
which are especially useful if there are tables that are largely for archival 
purposes and do not get frequently modified:

https://pgbackrest.org



Re: postgres large database backup

2022-11-30 Thread Vijaykumar Jain
On Wed, Nov 30, 2022, 9:10 PM Atul Kumar  wrote:

> Hi,
>
> I have a 10TB database on Postgres 11, running on CentOS 7
> "on premises". I need to schedule backups of this database in a fast
> way.
>
> The scheduled backup will be used for PITR purposes.
>
> So please let me know how I can take a quicker backup of my 10TB
> database. Is there any tool that can take backups, and subsequently
> incremental backups, quickly, and restore them quickly for PITR when
> required?
>
> What should be the exact approach for scheduling such backups so that
> they can be restored quickly?
>


We had a bare-metal server with 96 CPUs, 385GB RAM, NVMe storage, and
a 10G network. We used pgbackrest for full backups. It supports PITR
and differential backups.

A 28TB DB took 2.5 hours to back up to remote storage, and an
immediate restore from the remote storage took 3.5 hours (a lot of
additional time goes to WAL replay to catch up).

pg_basebackup took 9 hours.


Re: postgresql 13.1: precision of spatial operations

2022-11-30 Thread Brad White

On 11/30/2022 9:48 AM, Вадим Самохин wrote:


> Thank you so much Ivan, it worked!
Can you give any more detail on which approach you took, for the sake of 
future followers?





Re: postgres large database backup

2022-11-30 Thread Mladen Gogala

On 11/30/22 10:40, Atul Kumar wrote:

Hi,

I have a 10TB database on Postgres 11, running on CentOS 7
"on premises". I need to schedule backups of this database in a fast
way.


The scheduled backup will be used for PITR purposes.

So please let me know how I can take a quicker backup of my 10TB
database. Is there any tool that can take backups, and subsequently
incremental backups, quickly, and restore them quickly for PITR when
required?


What should be the exact approach for scheduling such backups so that
they can be restored quickly?


Well, that depends on your configuration details. Is your DB located on
a SAN device or JBOD? If it's SAN, then it can do snapshots. All
enterprise backup utilities like NetBackup, TSM, Avamar or Commvault
support snapshots. They can all do full and incremental backups, and
they can also do PITR. Yours is an enterprise-sized database and needs
an enterprise-level tool. If you want a freebie, pgbackrest is the best
thing no money can buy. Pgbackrest supports parallelism and full and
incremental backups.



--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: postgres large database backup

2022-11-30 Thread Ron

On 11/30/22 11:41, Vijaykumar Jain wrote:


On Wed, Nov 30, 2022, 9:10 PM Atul Kumar  wrote:

Hi,

I have a 10TB database on Postgres 11, running on CentOS 7
"on premises". I need to schedule backups of this database in a fast
way.

The scheduled backup will be used for PITR purposes.

So please let me know how I can take a quicker backup of my 10TB
database. Is there any tool that can take backups, and subsequently
incremental backups, quickly, and restore them quickly for PITR when
required?

What should be the exact approach for scheduling such backups so that
they can be restored quickly?



We had a bare-metal server with 96 CPUs, 385GB RAM, NVMe storage, and
a 10G network. We used pgbackrest for full backups. It supports PITR
and differential backups.

A 28TB DB took 2.5 hours to back up to remote storage, and an
immediate restore from the remote storage took 3.5 hours (a lot of
additional time goes to WAL replay to catch up).


Another vote for pgbackrest.  It supports AES encryption and multiple forms
of compression.


--
Angular momentum makes the world go 'round.

Re: postgres large database backup

2022-11-30 Thread Hannes Erven

On 2022-11-30 at 20:01, Mladen Gogala wrote:

On 11/30/22 10:40, Atul Kumar wrote:

Hi,

I have a 10TB database on Postgres 11, running on CentOS 7 "on
premises". I need to schedule backups of this database in a fast
way.


The scheduled backup will be used for PITR purposes.

Well, that depends on your configuration details. Is your DB located 
on a SAN device or JBOD?




You could also use a filesystem that can do atomic snapshots - like ZFS.

However, I'm wondering why none of the previous respondents mentioned
it. Sure, ZFS may have its own performance implications... but my
experience is that it makes "safety snapshots" and "on-demand throw-away
testing environments" very convenient to use.



Best regards,

    -hannes






Re: postgres large database backup

2022-11-30 Thread Mladen Gogala

On 11/30/22 18:19, Hannes Erven wrote:
You could also use a filesystem that can do atomic snapshots - like ZFS. 


Uh, oh. Not so sure about that. Here is a page from the world of the big 
O: https://blog.docbert.org/oracle-on-zfs/


However, similar things can be said about ZFS with Postgres. ZFS
snapshots will slow down I/O considerably. I would definitely prefer
snapshots done in hardware rather than in software. My favorite file
systems, depending on the type of disk, are F2FS and XFS.


--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: postgres large database backup

2022-11-30 Thread Michael Loftis
On Wed, Nov 30, 2022 at 18:03 Mladen Gogala  wrote:

> On 11/30/22 18:19, Hannes Erven wrote:
>
> You could also use a filesystem that can do atomic snapshots - like ZFS.
>
> Uh, oh. Not so sure about that. Here is a page from the world of the big
> O: https://blog.docbert.org/oracle-on-zfs/
>
> However, similar things can be said about ZFS with Postgres. ZFS snapshots
> will slow down I/O considerably. I would definitely prefer snapshots done
> in hardware rather than in software. My favorite file systems, depending
> on the type of disk, are F2FS and XFS.
>

ZFS snapshots don't typically have much, if any, performance impact
versus not having a snapshot (and already being on ZFS), because it's
already doing COW-style semantics.

Postgres write performance on ZFS is difficult because it's super
important to match up the underlying I/O sizes: the device/ZFS ashift,
the ZFS recordsize, and the DB's page/WAL page sizes. Not getting this
right causes performance issues even without any snapshots, because,
again, COW: if you're constantly breaking a record block or sector,
there's going to be a big impact. It won't be any worse (in my own
testing) whether you have snapshots or not. Snapshots on ZFS don't
cause any crazy write amplification by themselves (I'm not sure they
cause any extra writes at all; I'd have to do some sleuthing).

ZFS will, yes, be slower than a raw disk (but that's not an option for
Pg anyway), and may or may not be faster than a different filesystem on
a HW RAID volume or storage array volume. It absolutely takes more
care/clue/tuning to get Pg write performance on ZFS, and ZFS duplicates
some of Pg's resiliency, so there is duplicate work going on.

I'd say that 2016 article is really meaningless, as ZFS, Oracle, and
Postgres have all evolved dramatically in six years. Even more so since
there's nothing remotely like ASM for Postgres.
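
For what it's worth, the DB-side numbers to line the ZFS recordsize up
against are visible from SQL (both are 8kB on stock builds):

SHOW block_size;      -- heap/index page size, typically 8192
SHOW wal_block_size;  -- WAL page size, typically 8192

and a common starting point is setting the recordsize on the data
volume to match.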

>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
>
> --

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler


Re: postgres large database backup

2022-11-30 Thread Ron

On 11/30/22 19:41, Michael Loftis wrote:



On Wed, Nov 30, 2022 at 18:03 Mladen Gogala  wrote:

On 11/30/22 18:19, Hannes Erven wrote:
You could also use a filesystem that can do atomic snapshots - like ZFS. 


Uh, oh. Not so sure about that. Here is a page from the world of the
big O: https://blog.docbert.org/oracle-on-zfs/

However, similar things can be said about ZFS with Postgres. ZFS
snapshots will slow down I/O considerably. I would definitely prefer
snapshots done in hardware rather than in software. My favorite file
systems, depending on the type of disk, are F2FS and XFS.


ZFS snapshots don't typically have much, if any, performance impact
versus not having a snapshot (and already being on ZFS), because it's
already doing COW-style semantics.


Postgres write performance on ZFS is difficult because it's super
important to match up the underlying I/O sizes: the device/ZFS ashift,
the ZFS recordsize, and the DB's page/WAL page sizes. Not getting this
right causes performance issues even without any snapshots, because,
again, COW: if you're constantly breaking a record block or sector,
there's going to be a big impact. It won't be any worse (in my own
testing) whether you have snapshots or not. Snapshots on ZFS don't
cause any crazy write amplification by themselves (I'm not sure they
cause any extra writes at all; I'd have to do some sleuthing).


ZFS will, yes, be slower than a raw disk (but that's not an option for
Pg anyway), and may or may not be faster than a different filesystem on
a HW RAID volume or storage array volume. It absolutely takes more
care/clue/tuning to get Pg write performance on ZFS, and ZFS duplicates
some of Pg's resiliency, so there is duplicate work going on.


I wonder what percentage of /Big Databases/ (like the OP's and
Vijaykumar's) are still on physical servers, as opposed to VMs connected
to SANs.  Even many physical servers are connected to SANs.  (That is,
of course, in the dreaded Enterprise environment.)


--
Angular momentum makes the world go 'round.

Librephotos on Synology DSM with Docker Problem with PostGres

2022-11-30 Thread Preston Zimmerer
Hello, looking for some help.

I am trying to put Librephotos on my NAS (running DSM) using Docker,
but I'm getting this error on the postgres image/container.
I have the "librephotos-docker-main" folder placed in the docker folder
on my host machine, and I tried creating a directory "db" in the docker
folder as well. That didn't work.

Does anyone know where this directory should be and why it's not being
created automatically?
[image: image.png]
Here is the db container section of the "docker-compose.yml" as well:

db:
  image: postgres:13
  container_name: db
  restart: unless-stopped
  environment:
    - POSTGRES_USER=${dbUser}
    - POSTGRES_PASSWORD=${dbPass}
    - POSTGRES_DB=${dbName}
  volumes:
    - ${data}/db:/var/lib/postgresql/data
  command: postgres -c fsync=off -c synchronous_commit=off -c full_page_writes=off -c random_page_cost=1.0
  # Checking health of the Postgres db
  healthcheck:
    test: psql -U ${dbUser} -d ${dbName} -c "SELECT 1;"
    interval: 5s
    timeout: 5s
    retries: 5


Re: Finding free time periods on non-continuous tstzrange field values

2022-11-30 Thread Amitabh Kant
On Wed, Nov 30, 2022 at 7:13 PM Amitabh Kant  wrote:

> Hi
>
> Given the following table, how do I find the free time periods?
>
> CREATE TABLE test_time_range (
>     id SERIAL PRIMARY KEY,
>     time_range tstzrange
> );
>
> INSERT INTO test_time_range (time_range) VALUES ('[2022-11-28 08:00:00, 2022-11-28 20:00:00]');
> INSERT INTO test_time_range (time_range) VALUES ('[2022-11-29 12:30:00, 2022-11-29 22:00:00]');
> INSERT INTO test_time_range (time_range) VALUES ('[2022-11-30 05:00:00, 2022-11-30 19:00:00]');
>
> In the above example, I would like the query to return something like this:
> "2022-11-28 20:01:00 2022-11-29 11:29:00"
> "2022-11-29 22:01:00 2022-11-30 04:59:00"
>
> Apologies if this is a dumb question, but I'm trying to use ranges for the
> first time and can't get my head around them.
>
> Using PG14, can upgrade to 15 if that matters.
>
>
> Amitabh
>

Based on Marcos' suggestion (
https://www.crunchydata.com/blog/better-range-types-in-postgres-14-turning-100-lines-of-sql-into-3
), I tried the following query:

SELECT tstzmultirange(tstzrange('2022-11-25 00:00:00', '2022-11-30 00:00:00', '[]'))
       - range_agg(time_range) AS availability
FROM test_time_range
WHERE time_range && tstzrange('2022-11-25 00:00:00', '2022-11-30 00:00:00', '[]');

but then I receive the following error. My guess is that I need to cast
the tstzrange output, but I can't seem to find the correct cast.
ERROR:  function tstzmultirange(tstzrange) does not exist
LINE 1: SELECT tstzmultirange(tstzrange('2022-11-25 00:00:00', '2022...
               ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.
SQL state: 42883  Character: 8