> On Feb 25, 2026, at 14:52, Chao Li <[email protected]> wrote:
>
>
>> On Feb 20, 2026, at 07:41, Melanie Plageman <[email protected]> wrote:
>>
>> On Wed, Jan 14, 2026 at 6:35 PM Melanie Plageman
>> <[email protected]> wrote:
>>>
>>> Thanks! v13 attached.
>>
>> I've added background writer write combining and then spent some time
>> refactoring and benchmarking this over the last few weeks, and I've
>> realized I won't be able to get it in shape for Postgres 19. I've
>> attached version 14 here with my WIP code so I can come back to it in
>> the next release.
>>
>> 0012 is an invasive refactor of GetVictimBuffer() and all the
>> functions it calls that needs to be split apart into smaller commits
>> and redistributed throughout the other patches in the set. Other than
>> that, there are a number of todos both in the code and from a
>> benchmarking perspective:
>>
>> - Probe for preceding as well as following blocks when looking for
>>   blocks to combine
>> - Experiment with different batch sizes
>> - Enable and test bulkread and vacuum write combining
>>
>> I also periodically get this troubling warning: "WARNING: invalid
>> page in block 2 of relation "base/16384/2608_fsm"; zeroing out page".
>> So clearly there's at least one bug somewhere.
>>
>> FWIW benchmarks showed a large improvement from backend and bgwriter
>> write combining, but I'll provide more detailed output from benchmarks
>> when I post a more reviewable patch set.
>>
>> 0001 and 0002 are completely independent cleanup that should be pushed
>> anyway, so I might do that, but I didn't want to do so without having
>> posted on the list at least once.
>>
>> - Melanie
>> <v14-0001-Make-FlushUnlockedBuffer-use-its-parameters.patch><v14-0002-Make-ScheduleBufferTagForWriteback-static.patch><v14-0003-Streamline-buffer-rejection-for-bulkreads-of-unl.patch><v14-0004-Allow-PinBuffer-to-skip-increasing-usage-count.patch><v14-0005-Eagerly-flush-bulkwrite-strategy-ring.patch><v14-0006-Write-combining-for-BAS_BULKWRITE.patch><v14-0007-Add-database-Oid-to-CkptSortItem.patch><v14-0008-Implement-checkpointer-data-write-combining.patch><v14-0009-Remove-SyncOneBuffer-and-refactor-BgBufferSync.patch><v14-0010-Normal-backend-write-combining.patch><v14-0011-Add-write-combining-for-background-writer.patch><v14-0012-WIP-refactor.patch>
>
> A few comments for v14:
>
> 1 - 0001
> ```
> - FlushBuffer(buf, reln, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> + FlushBuffer(buf, reln, io_object, io_context);
> ```
>
> This changes the hardcoded IOOBJECT_RELATION to io_object when calling
> FlushBuffer(). But FlushBuffer() itself still uses IOOBJECT_RELATION
> instead of io_object, so the change does not actually take effect:
> ```
> pgstat_count_io_op_time(IOOBJECT_RELATION, io_context,
>                         IOOP_WRITE, io_start, 1, BLCKSZ);
> ```
>
> So an update in FlushBuffer() is also needed.
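> Concretely, I think the fix inside FlushBuffer() would be something like
> this (untested sketch based on the call quoted above):
> ```
> - pgstat_count_io_op_time(IOOBJECT_RELATION, io_context,
> -                         IOOP_WRITE, io_start, 1, BLCKSZ);
> + pgstat_count_io_op_time(io_object, io_context,
> +                         IOOP_WRITE, io_start, 1, BLCKSZ);
> ```
> That way the io_object passed in by callers actually reaches the stats
> counters.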
>
> 2 - 0003 - freelist.c
> ```
> bool
> StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
> {
> +    Assert(strategy);
> +
>     /* We only do this in bulkread mode */
>     if (strategy->btype != BAS_BULKREAD)
>         return false;
> @@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
>         strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
>         return false;
>
> +    Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
> +    Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
> ```
>
> I don’t quite understand Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
>
> The comment says “the buffer header spinlock must not be held,” but I doubt
> this Assert actually ensures that. Since BM_LOCKED is just a bit in a shared
> state, isn't it possible for a concurrent backend to grab the lock right when
> we’re checking it?
>
> If a concurrent process locks the header a split-second before the Assert,
> we’ll get a false-positive crash in a dev environment even though the code is
> fine. If they lock it a split-second after, then the Assert didn't really
> catch anything. It feels like a race condition either way.
>
> I think the only way to truly ensure a spinlock is "not held" is to actually
> acquire it. If we’re just checking the bit like this, it’s just a snapshot
> that might be stale by the time the instruction finishes. What was the
> specific worry here: a self-deadlock, or something else?
>
> 3 - 0004
> ```
> +typedef enum BufferUsageCountChange
> +{
> +    BUC_ZERO,
> +    BUC_MAX_ONE,
> +    BUC_ONE,
> +} BufferUsageCountChange;
> ```
>
> Can we add brief comments explaining each enum item? It is hard to guess
> the meanings from the names alone without reading the code.
>
> 4 - 0005 - bufmgr.c
> ```
> +/*
> + * Prepare bufdesc for eager flushing.
> + *
> + * Given bufnum, return the buffer descriptor of the buffer to eagerly flush,
> + * pinned and locked and with BM_IO_IN_PROGRESS set, or NULL if this buffer
> + * does not contain a block that should be flushed.
> + *
> + * If returning a buffer, also return its LSN.
> + */
> +static BufferDesc *
> +PrepareOrRejectEagerFlushBuffer(Buffer bufnum)
> ```
>
> This header comment looks stale.
>
> * It says “and with BM_IO_IN_PROGRESS set”, but I don’t see where
>   BM_IO_IN_PROGRESS is set in this function.
> * It says “also return its LSN”, but I don’t see an LSN being returned.
>
> 5 - 0006
> ```
> +    /* Start IO on the first buffer */
> +    if (!StartBufferIO(buf_hdr, false, false))
> +        goto clean;
> ```
>
> This failure branch looks like it leaks the content lock. The buffer was
> already share-locked via BufferLockConditional() earlier, and I don’t see
> the "clean" path unlock that content lock before returning.
>
> 6 - 0009
> ```
> +    FlushUnlockedBuffer(bufHdr, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> +    UnpinBuffer(bufHdr);
> +
> +    ScheduleBufferTagForWriteback(wb_context, IOCONTEXT_NORMAL, &bufHdr->tag);
> ```
>
> bufHdr is unlocked and unpinned here, so it may be unsafe to still read
> bufHdr->tag. I think we can copy bufHdr->tag into a local variable before
> unlocking the buffer.
>
> ~~ My brain is stuck, I will continue 0010 tomorrow ~~
>

7 - 0010
```
- * batch_limit is the largest batch we are allowed to construct given the
- * remaining blocks in the table, the number of available pins, and the
- * current configuration.
+ * max_batch_size is the maximum number of blocks that can be combined into a
+ * single write in general. This function, based on the block number of start,
+ * will determine the maximum IO size for this particular write given how much
+ * of the file remains. max_batch_size is provided by the caller so it doesn't
+ * have to be recalculated for each write.
```

I don’t see a parameter named max_batch_size in the function
FindStrategyFlushAdjacents(); batch_limit is still there.

8 - 0010
```
 static void
+FindFlushAdjacents(BufferDesc *batch_start,
+                   uint32 batch_limit,
+                   BufferWriteBatch *batch)
+{
+    BufferTag   require;            /* requested block's buffer tag */
+    uint32      newHash;            /* hash value for require */
+    LWLock     *newPartitionLock;   /* buffer partition lock for it */
+    int         buf_id;
+
+    /* create a tag so we can lookup the buffers */
+    InitBufferTag(&require, &batch->reln->smgr_rlocator.locator,
+                  batch->forkno, InvalidBlockNumber);
+
+    for (; batch->n < batch_limit; batch->n++)
+    {
+        XLogRecPtr  lsn;
+
+        require.blockNum = batch->start + batch->n;
+
+        Assert(BlockNumberIsValid(require.blockNum));
```

When I first read the Assert, I was confused about how it could assume the
block number is valid; then I realized that WriteBatchInit() ensures a safe
“limit”. Maybe add a brief comment for the Assert.

9 - 0010
```
+        batch->bufdescs[batch->n] =
+            PrepareOrRejectEagerFlushBuffer(buf_id + 1,
+                                            &require,
+                                            newPartitionLock,
+                                            &lsn);
+        if (lsn > batch->max_lsn)
+            batch->max_lsn = lsn;
+
+        if (batch->bufdescs[batch->n] == NULL)
+            break;
```

I think we can move the if (batch->bufdescs[batch->n] == NULL) check to
before if (lsn > batch->max_lsn). This is not a correctness issue, because
PrepareOrRejectEagerFlushBuffer() sets lsn to InvalidXLogRecPtr when it
returns NULL, but doing the NULL check first just reads more naturally.

10 - 0010
```
+ * max_lsn may be updated if the provided buffer LSN exceeds the current max
+ * LSN.
  */
 static BufferDesc *
 PrepareOrRejectEagerFlushBuffer(Buffer bufnum,
                                 BufferTag *require,
+                                LWLock *buftable_lock,
                                 XLogRecPtr *lsn)
```

The function doesn’t have a parameter named max_lsn; do you mean “lsn”?

11 - 0010
```
+FindFlushAdjacents(BufferDesc *batch_start,
+                   uint32 batch_limit,
+                   BufferWriteBatch *batch)
```

Looks like batch_start is not used at all in this function.
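Regarding point 9, the reordering I have in mind would look roughly like this
(untested sketch, just the two checks swapped relative to the quoted hunk):
```
+        batch->bufdescs[batch->n] =
+            PrepareOrRejectEagerFlushBuffer(buf_id + 1,
+                                            &require,
+                                            newPartitionLock,
+                                            &lsn);
+        if (batch->bufdescs[batch->n] == NULL)
+            break;
+
+        if (lsn > batch->max_lsn)
+            batch->max_lsn = lsn;
```
With this order, max_lsn is only ever updated from buffers that are actually
part of the batch, and a reader doesn't have to know that the rejected case
reports InvalidXLogRecPtr.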
12 - 0011
```
+ * Callers specify if and by how much they want to bump the buffer's usage
+ * count.
```

I don’t get what this comment means.

~~ As 0012 is WIP, I didn't review it. ~~

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
