On Wed, Sep 24, 2025 at 12:32 PM Bharath Rupireddy
<[email protected]> wrote:
>
> > On Wed, 2025-09-24 at 07:26 -0700, Bharath Rupireddy wrote:
> > > Right. Reading unflushed WAL buffers for replication was one of the
> > > motivations. But, in general, WALReadFromBuffers has more benefits
> > > since it lets WAL buffers act as a cache for reads, avoiding the need
> > > to re-read WAL from disk for (both physical and logical) replication.
> > > For example, it makes the use of direct I/O for WAL more realistic
> > > and can provide significant performance benefits [1].
>
> Thanks for looking into this. I did a performance analysis with WAL direct I/O
> to see how reading from WAL buffers affects walsenders:
> https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com
> The following is from that thread. Please let me know if you have any specific
> cases in mind. I'm happy to run the same test for logical replication.
>
> It helps WAL DIO: since there's no OS page cache, using WAL buffers as
> a read cache helps a lot. This is evident from my experiment with the
> WAL DIO patch [1]; see the results [2] and the attached graph. As expected,
> WAL DIO brings down the TPS, whereas reading from WAL buffers (i.e., this
> patch) brings it back up.
>
> [2] Test case is an insert pgbench workload (values are TPS).
> clients |   HEAD | WAL DIO | WAL DIO & WAL BUFFERS READ | WAL BUFFERS READ
> --------+--------+---------+----------------------------+-----------------
>       1 |   1404 |    1070 |                       1424 |             1375
>       2 |   1487 |     796 |                       1454 |             1517
>       4 |   3064 |    1743 |                       3011 |             3019
>       8 |   6114 |    3556 |                       6026 |             5954
>      16 |  11560 |    7051 |                      12216 |            12132
>      32 |  23181 |   13079 |                      23449 |            23561
>      64 |  43607 |   26983 |                      43997 |            45636
>     128 |  80723 |   45169 |                      81515 |            81911
>     256 | 110925 |   90185 |                     107332 |           114046
>     512 | 119354 |  109817 |                     110287 |           117506
>     768 | 112435 |  105795 |                     106853 |           111605
>    1024 | 107554 |  105541 |                     105942 |           109370
>    2048 |  88552 |   79024 |                      80699 |            90555
>    4096 |  61323 |   54814 |                      58704 |            61743

Thank you all for reviewing this. Please find the attached rebased
patch for further review.
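
For readers skimming the thread, the read pattern the patch applies at
each call site is roughly "try the WAL buffers first, then read whatever
is left from the WAL file". Below is a minimal standalone sketch of that
pattern. It is not PostgreSQL code: every name prefixed with sketch_ is a
hypothetical stand-in, the "WAL file" and "WAL buffers" are plain arrays,
and timeline and error handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not PostgreSQL APIs. */
#define SKETCH_WAL_SIZE 64

static char sketch_wal_file[SKETCH_WAL_SIZE];	/* models WAL on disk */
static char sketch_wal_cache[SKETCH_WAL_SIZE];	/* models WAL buffers */
static size_t sketch_cache_start = 0;	/* cache covers [start, end) */
static size_t sketch_cache_end = 0;

/* Fill the file with 'f' and the cache with 'c' so the source is visible. */
static void
sketch_setup(size_t cache_start, size_t cache_end)
{
	assert(cache_start <= cache_end && cache_end <= SKETCH_WAL_SIZE);
	memset(sketch_wal_file, 'f', SKETCH_WAL_SIZE);
	memset(sketch_wal_cache, 'c', SKETCH_WAL_SIZE);
	sketch_cache_start = cache_start;
	sketch_cache_end = cache_end;
}

/*
 * Copy as many bytes as the cache holds, beginning at 'start'. Returns the
 * number of bytes copied, which may be 0 or fewer than 'count'.
 */
static size_t
sketch_read_from_buffers(char *dst, size_t start, size_t count)
{
	size_t		avail;

	if (start < sketch_cache_start || start >= sketch_cache_end)
		return 0;

	avail = sketch_cache_end - start;
	if (avail > count)
		avail = count;
	memcpy(dst, sketch_wal_cache + start, avail);
	return avail;
}

/* Read 'count' bytes from the file; false if the range is out of bounds. */
static bool
sketch_read_from_file(char *dst, size_t start, size_t count)
{
	if (start > SKETCH_WAL_SIZE || count > SKETCH_WAL_SIZE - start)
		return false;
	memcpy(dst, sketch_wal_file + start, count);
	return true;
}

/* Buffers first; any remaining tail comes from the file. */
static bool
sketch_read_wal(char *dst, size_t start, size_t count)
{
	size_t		nread = sketch_read_from_buffers(dst, start, count);

	if (nread < count &&
		!sketch_read_from_file(dst + nread, start + nread, count - nread))
		return false;
	return true;
}
```

With the cache set up to cover bytes 16..31, an 8-byte read starting at
offset 28 takes its first 4 bytes from the cache and the remaining 4 from
the file, while a read starting at offset 0 is served entirely from the
file. The point of the sketch is only that the destination pointer, the
start position, and the remaining count all advance by the same number of
bytes before the fallback read.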

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
From b5f6fc083caaa3648f8abfdc370d0289e637931f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 20 Mar 2026 06:44:48 +0000
Subject: [PATCH v4] Use WALReadFromBuffers in more places

Commit 91f2cae introduced WALReadFromBuffers but used it only for
physical replication walsenders. There are several other callers
that use the read_local_xlog_page page_read callback, and logical
replication walsenders can also benefit from reading WAL from WAL
buffers using the new function. This commit extends the use of
WALReadFromBuffers to these callers.

Author: Bharath Rupireddy
Reviewed-by: Jingtang Zhang, Nitin Jadhav
Discussion: https://www.postgresql.org/message-id/CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A%40mail.gmail.com
---
 src/backend/access/transam/xlogutils.c | 23 +++++++-
 src/backend/replication/walsender.c    | 77 +++++++++++++++++---------
 2 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 5fbe39133b8..c4c677f69fd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -876,6 +876,7 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 	int			count;
 	WALReadError errinfo;
 	TimeLineID	currTLI;
+	Size		bytesRead;
 
 	loc = targetPagePtr + reqLen;
 
@@ -995,9 +996,25 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 		count = read_upto - targetPagePtr;
 	}
 
-	if (!WALRead(state, cur_page, targetPagePtr, count, tli,
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, tli);
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < count)
+	{
+		if (!WALRead(state,
+					 cur_page + bytesRead,
+					 targetPagePtr + bytesRead,
+					 count - bytesRead,
+					 tli,
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = count;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == count);
 
 	/* number of valid bytes in the buffer */
 	return count;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 08253103cb3..95255948eca 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1054,6 +1054,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	WALReadError errinfo;
 	XLogSegNo	segno;
 	TimeLineID	currTLI;
+	Size		bytesRead;
 
 	/*
 	 * Make sure we have enough WAL available before retrieving the current
@@ -1091,16 +1092,29 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	else
 		count = flushptr - targetPagePtr;	/* part of the page available */
 
-	/* now actually read the data, we know it's there */
-	if (!WALRead(state,
-				 cur_page,
-				 targetPagePtr,
-				 count,
-				 currTLI,		/* Pass the current TLI because only
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+	targetPagePtr += bytesRead;
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < count)
+	{
+		if (!WALRead(state,
+					 cur_page + bytesRead,
+					 targetPagePtr,
+					 count - bytesRead,
+					 currTLI,	/* Pass the current TLI because only
 								 * WalSndSegmentOpen controls whether new TLI
 								 * is needed. */
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = count;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == count);
 
 	/*
 	 * After reading into the buffer, check that what we read was valid. We do
@@ -3219,7 +3233,7 @@ XLogSendPhysical(void)
 	Size		nbytes;
 	XLogSegNo	segno;
 	WALReadError errinfo;
-	Size		rbytes;
+	Size		bytesRead;
 
 	/* If requested switch the WAL sender to the stopping state. */
 	if (got_STOPPING)
@@ -3435,24 +3449,33 @@ XLogSendPhysical(void)
 	enlargeStringInfo(&output_message, nbytes);
 
 retry:
-	/* attempt to read WAL from WAL buffers first */
-	rbytes = WALReadFromBuffers(&output_message.data[output_message.len],
-								startptr, nbytes, xlogreader->seg.ws_tli);
-	output_message.len += rbytes;
-	startptr += rbytes;
-	nbytes -= rbytes;
-
-	/* now read the remaining WAL from WAL file */
-	if (nbytes > 0 &&
-		!WALRead(xlogreader,
-				 &output_message.data[output_message.len],
-				 startptr,
-				 nbytes,
-				 xlogreader->seg.ws_tli,	/* Pass the current TLI because
-											 * only WalSndSegmentOpen controls
-											 * whether new TLI is needed. */
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(&output_message.data[output_message.len],
+								   startptr,
+								   nbytes,
+								   xlogreader->seg.ws_tli);
+
+	startptr += bytesRead;
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < nbytes)
+	{
+		if (!WALRead(xlogreader,
+					 &output_message.data[output_message.len + bytesRead],
+					 startptr,
+					 nbytes - bytesRead,
+					 xlogreader->seg.ws_tli,	/* Pass the current TLI
+												 * because only
+												 * WalSndSegmentOpen controls
+												 * whether new TLI is needed. */
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = nbytes;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == nbytes);
 
 	/* See logical_read_xlog_page(). */
 	XLByteToSeg(startptr, segno, xlogreader->segcxt.ws_segsize);
-- 
2.47.3
