Re: Content Sync Refresh Required

Geert Hendrickx Tue, 28 Feb 2023 07:46:07 -0800

On Tue, Feb 28, 2023 at 16:12:25 +0100, Ondřej Kuzník wrote:
> On Tue, Feb 28, 2023 at 01:42:20PM +0100, Geert Hendrickx wrote:
> > We've had (and still have) this issue with large attributes and large 
> > multi-valued attributes with Zimbra (see previous discussion with Quanah),
> > where we applied sortvals and multival.  But in this scenario it's not the
> > case; all objects are of similar small size, with (mostly) single valued
> > attributes.  Yet our freelist reaches 200K+ free pages during periods with
> > heavy updates (mostly deletes/adds), which has a measurable impact on write
> > performance.
> 
> Hi Geert,
> are you sure it's the freelist and not the random access as pages become
> non-contiguous? The former would represent a constant decline in
> performance where the latter would eventually taper from high (best
> case) performance to regular performance you should be able to expect?
> Have you been able to rule that out?



mdb_copy -c fixes it, so I assume it's only the freelist size, not actual
fragmentation (mdb_copy doesn't reorder any data, right?).
Random access shouldn't matter much, as it's all on an SSD-based SAN.

Also, the decline isn't constant.  In normal operations, the freelist stays
fairly small (it is "consumed" all the time by regular updates).  Only
during batch updates (because of a currently ongoing migration) does it
explode: it doesn't get "consumed" in time for the next batch update, which
degrades performance for subsequent batches.
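
For anyone wanting to watch the freelist between batches, here is a minimal
sketch, not a tested tool.  It assumes that `mdb_stat -f` prints a line of
the form "Free pages: N" in its freelist section; the function names and the
200000 threshold (taken from the "200K+" figure above) are made up for
illustration.

```python
import re
import subprocess

# Assumption: the freelist section of `mdb_stat -f` output contains a line
# like "  Free pages: 204800".
FREE_PAGES_RE = re.compile(r"^\s*Free pages:\s*(\d+)\s*$", re.MULTILINE)


def parse_free_pages(mdb_stat_output: str) -> int:
    """Return the free page count found in mdb_stat -f output."""
    match = FREE_PAGES_RE.search(mdb_stat_output)
    if match is None:
        raise ValueError("no 'Free pages:' line found in mdb_stat output")
    return int(match.group(1))


def freelist_too_large(mdb_stat_output: str, threshold: int = 200_000) -> bool:
    """True when the freelist has reached the (illustrative) threshold."""
    return parse_free_pages(mdb_stat_output) >= threshold


def read_free_pages(db_dir: str) -> int:
    """Run mdb_stat -f on an LMDB environment directory and parse the result."""
    result = subprocess.run(
        ["mdb_stat", "-f", db_dir],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_free_pages(result.stdout)
```

Calling read_free_pages() on the database directory before each batch would
show whether the previous batch's free pages have been reused yet.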


> After you kill accesslog, you disable deltasync. Since you're also
> restarting, the provider has no data on how to replay anything and needs
> to send the list of all entries (at least their UUIDs). This is
> expensive and slow. Replication seems to proceed in slow leaps that cost
> a *lot* of processing on the provider and a fair amount of bandwidth.
> Isn't that what you're seeing?


Yes, this is indeed the case, and it keeps doing that as long as updates are
coming in.  Once there are no updates for a full refresh cycle (e.g. during
the night, or because we pause updates) it is able to revert to delta sync.


> After you kill accesslog, you disable deltasync. 

This is the essential part.  I always assumed it could proceed with
deltasync if the provider and replica have the same contextCSN, even with
an empty accesslog.

This probably went unnoticed for a long time since dropping the accesslog
on a non-active master causes no (visible) delays; the delays only show up
on an active master.


Thanks for your insights, things are much clearer now, and we have adjusted
our processes accordingly.


        Geert
