Re: Client-server communication for FETCH
On Wed, 2025-12-03 at 14:45 -0500, Tim Fors wrote:
> Do you have any further thoughts re: addressing this requirement
> (client-side cursor support)? Is this a requirement that the PGSQL
> community would consider implementing in libpq? Without it, it seems
> like a huge impediment to adoption of PGSQL for COBOL applications,
> and as you probably know, COBOL applications are still very
> pervasive. (C programs using ECPG would of course hit the same issue,
> so it's not just a COBOL-specific problem.)

Sounds to me like this ought to be implemented in ECPG and/or its COBOL equivalent, not in libpq? That's where the "Embedded SQL" is implemented after all, and also the equivalent of where Oracle implements its PREFETCH, I think? And I suppose nobody would be opposed to such improvements, if someone contributes them or pays for their development...

--
Jan Claeys
(please don't CC me when replying to the list)
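For illustration, here is a plain-SQL sketch of the batched fetching that an ECPG-level (or COBOL-precompiler-level) prefetch option would automate; the table and cursor names are invented for the example:

    BEGIN;
    DECLARE big_cur CURSOR FOR SELECT id, payload FROM big_table;
    -- a single round trip returns up to 100 rows instead of one
    FETCH FORWARD 100 FROM big_cur;
    FETCH FORWARD 100 FROM big_cur;
    CLOSE big_cur;
    COMMIT;

A prefetch setting in the precompiler would presumably issue batched fetches like these behind the scenes while the application keeps fetching one row at a time.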
Seeking guidance on extremely slow pg_restore despite strong I/O performance
Hello,

I would greatly appreciate your insight into an issue where pg_restore runs significantly slower than expected, even though the underlying storage shows very high random write throughput. I am trying to understand which PostgreSQL mechanisms or system layers I should investigate next in order to pinpoint the bottleneck and improve restore performance.

The central question is: What should I examine further to understand why checkpoint processing becomes the dominant bottleneck during restore, despite fsync=off, synchronous_commit=off, and excellent random write latency?

Below is a detailed description of the environment, the behavior observed, the steps I have already taken, and the research performed so far.

During pg_restore, execution time remains extremely long: around 2+ hours using a custom-format dump and over 4 hours using directory format. The machine consistently demonstrates high random write performance (median latency ~5 ms, ~45k random write IOPS), yet PostgreSQL logs show very long checkpoints in which the write phase dominates (hundreds to thousands of seconds). Checkpoints appear to stall the entire restore process.

I have tested multiple combinations of dump formats (custom and directory) and parallel jobs (j = 1, 12, 18). The restore duration barely changes. This strongly suggests that the bottleneck is not client-side parallelism but internal server behavior, specifically the checkpoint write phase.

Example log excerpts show checkpoint write times consistently in the range of 600–1100 seconds, with large numbers of buffers written (from hundreds of thousands to over 1.6 million). Sync times remain negligible because fsync is disabled, reinforcing the suspicion that PostgreSQL's internal buffer flushing and write throttling mechanisms are the source of the slowdown, not WAL or filesystem sync.

Given that:

* storage is fast,
* fsync and synchronous commits are disabled,
* full_page_writes is off,
* wal_level is minimal,
* autovacuum is off,
* the restore is the only workload,

I am trying to determine what further PostgreSQL internals or Linux I/O mechanisms may explain why these checkpoints are taking orders of magnitude longer than the device's raw write characteristics would suggest.

I am particularly looking for guidance on:

* whether backend or checkpointer write throttling may still be limiting write concurrency even during bulk restore (see the query sketch at the end of this message),
* whether XFS on Hyper-V VHDX + LVM + a battery-backed SSD array could introduce serialization that is invisible to raw I/O tests,
* whether certain parameters (e.g., effective_io_concurrency, maintenance_io_concurrency, wal_writer settings, combine limits, io_uring behavior) could unintentionally reduce write throughput,
* whether parallel pg_restore is inherently constrained by global buffer flushing behavior,
* any other PostgreSQL mechanisms that could cause prolonged checkpoint write durations even with crash safety disabled.

Below are the configuration values and environment details referenced above.
Machine:
Hyper-V VM
24 vCPU
80 GB RAM
Ubuntu 24.04.3 (kernel 6.8.0-88)
PostgreSQL 18.1

Database size:
~700 GB across two tablespaces on separate disks (freshly restored)

Storage layout:
Each disk is its own VHDX
LVM on battery-backed SSD array
XFS for PGDATA
Barriers disabled

Random write performance (steady state):
Median latency: 5.1 ms
IOPS: ~45.6k

Restore tests:
pg_restore custom format: ~2h+
pg_restore directory format: ~4h+
Parallelism tested with j = 1, 12, 18, 24

Representative checkpoint log entries:
(write phases ranging 76–1079 seconds, buffer writes up to 1.6M)

postgresql.conf (relevant parts):
shared_buffers = 20GB
work_mem = 150MB
maintenance_work_mem = 8GB
effective_io_concurrency = 1
maintenance_io_concurrency = 1
io_max_combine_limit = 512kB
io_combine_limit = 1024kB
io_method = io_uring

fsync = off
synchronous_commit = off
wal_sync_method = fdatasync
full_page_writes = off
wal_compression = lz4

checkpoint_timeout = 60min
checkpoint_completion_target = 0.9
max_wal_size = 80GB
min_wal_size = 10GB
effective_cache_size = 65GB
autovacuum = off
max_locks_per_transaction = 256

If anyone has encountered similar behavior or can recommend specific PostgreSQL subsystems, kernel settings, or I/O patterns worth investigating, I would be very grateful for advice. My main goal is to understand why checkpoint writes are so slow relative to the hardware's demonstrated capabilities, and how to safely accelerate the restore workflow.

Thank you in advance for any guidance.
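For reference, the checkpointer activity and the write-pacing settings asked about above can be inspected directly; a short psql sketch (pg_stat_checkpointer is available in PostgreSQL 17 and later, so it applies to this 18.1 instance):

    -- Cumulative checkpointer statistics (write/sync time, buffers written)
    SELECT * FROM pg_stat_checkpointer;

    -- Settings that pace or throttle how dirty buffers are pushed to the kernel
    SHOW checkpoint_completion_target;
    SHOW checkpoint_flush_after;
    SHOW bgwriter_flush_after;
    SHOW backend_flush_after;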
Re: Seeking guidance on extremely slow pg_restore despite strong I/O performance
If the dump was taken with pg_dump -Fd and pg_restore -j has no effect on restore time, that's a good clue.

You can start by testing deferred checkpoints: set max_wal_size = 1TB and checkpoint_timeout = 10h, and see how this affects the pg_restore (it should then be limited by WAL write throughput). Perhaps also increase wal_buffers to 128MB. The idea is to identify, by elimination, the write choke point before starting to tune for it.

Unrelated to your problem, but you should set the *_io_concurrency parameters to 200 since you use SSDs. pg_restore also rebuilds indices, so make sure the settings relevant to index building are set appropriately (see max_parallel_* and *_io_concurrency).

Kiriakos Georgiou

> On Dec 5, 2025, at 5:30 AM, MentionTheElephant - MentionTheElephant at
> gmail.com wrote:
>
> Hello,
>
> I would greatly appreciate your insight into an issue where pg_restore
> runs significantly slower than expected, even though the underlying
> storage shows very high random write throughput. I am trying to
> understand which PostgreSQL mechanisms or system layers I should
> investigate next in order to pinpoint the bottleneck and improve
> restore performance.
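For concreteness, a minimal psql sketch of the checkpoint-deferral test suggested in the reply above, expressed as ALTER SYSTEM commands (one possible way to apply the settings; editing postgresql.conf directly works just as well, and the 200 values follow the SSD suggestion):

    ALTER SYSTEM SET max_wal_size = '1TB';
    ALTER SYSTEM SET checkpoint_timeout = '10h';
    ALTER SYSTEM SET wal_buffers = '128MB';            -- only takes effect after a server restart
    ALTER SYSTEM SET effective_io_concurrency = 200;
    ALTER SYSTEM SET maintenance_io_concurrency = 200;
    SELECT pg_reload_conf();                           -- applies the settings that do not require a restart

After the restore test, the values can be reverted with ALTER SYSTEM RESET for each setting.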
