Re: Implications of duplicate UUIDs on a server

2025-04-30 Thread Andreas Stieger



On 2025-04-29 18:53, LWChris wrote:
> Therefore we suspect it was some kind of caching issue in the SVN
> server (WANdisco) due to same path same UUID same commit number; after
> restarting the SVN server, the issue went away. But I don't know if
> WANdisco is a "typical server implementation", or if the issue was
> something deeper, or if the issues are related at all or pure
> coincidence, etc.



WANdisco is not a typical Subversion server implementation. For
synchronous multi-site replication (as opposed to svnsync, which is
asynchronous), it provides a proxy layer with a consensus protocol
(think PAXOS, wsrep, raft). Notably, the replication requires
representing each candidate transaction in a serialized format.
It is conceivable that non-unique UUIDs may cause hiccups, but I would
not exclude the possibility that this is also a general problem of plain
svn.


Andreas



Re: Implications of duplicate UUIDs on a server

2025-04-30 Thread Doug Robinson
Andreas, et al.:

On Wed, Apr 30, 2025 at 4:17 AM Andreas Stieger 
wrote:

>
> On 2025-04-29 18:53, LWChris wrote:
> > Therefore we suspect it was some kind of caching issue in the SVN
> > server (WANdisco) due to same path same UUID same commit number; after
> > restarting the SVN server, the issue went away. But I don't know if
> > WANdisco is a "typical server implementation", or if the issue was
> > something deeper, or if the issues are related at all or pure
> > coincidence, etc.
>
> WANdisco is not a typical Subversion server implementation. For
> synchronous multi-site replication (as opposed to svn sync which is
> asynchronous), it provides a proxy layer with a consensus protocol
> (think PAXOS, wsrep, raft). Notable the replication requires the
> representation of candidate transaction into a serialzied format that.
> It is conceivable that non-unique UUIDs may cause hick-ups, but I would
> not exclude the possibility that this is also a general problem of plain
> svn.
>

First, there are 2 flavors of WANdisco Subversion bits:

1. "vanilla"
2. "replicated"

The "vanilla" are exactly the unmodified Apache Subversion source
distribution compiled up, tested and packaged for each supported
OS flavor.  They are distributed free of charge to anyone and
everyone.  See here [0].

The "replicated" are EXACTLY a typical Subversion server implementation
up until the point of the actual transaction commit execution.
Nearly everything else is identical.  Specifically, the READ-ONLY
side of the server is COMPLETELY IDENTICAL to typical Subversion.
That includes ALL of the Apache caching.  Nothing that we do touches
the read-path - that was a critical part of the design principle.

In terms of the WANdisco "WRITE PATH", there are the normal Subversion
repo-UUIDs for the repos, but they play an almost negligible part
in the update process.  Each repository is associated with its own
"distributed state machine" (DSM) that has its own UUID (never
repeated) and it is that DSM that is specifically tasked with making the
updates occur.  Many of our customers have a LOT of non-unique
repository UUIDs and I have never seen a single hiccup due to that
sort of issue from the perspective of repository updates.

I have not yet seen the Subversion or Apache versions disclosed in this
discussion.  Could that information be added to the
conversation?  It makes a difference since in earlier versions of
Subversion there was only a single UUID for each repository; now
there are 2.  Part of that, IIRC, was to enable some better cache
invalidation so that {path,repo-UUID} was not the only distinguishing
factor since it was causing confusion in the long-lived Apache
process (I'm sure someone will correct me if I'm wrong).  So the
answer of "path-only" or "{path,repo-UUID}" or
"{path,repo-UUID,repo-UUID2}" is likely dependent on the version
of Subversion (or at least some of the conversations I've read in
the past have made it seem that way).

All of that said, the use case that was enumerated previously is
definitely broken in terms of Subversion itself (nothing to do with
WANdisco).  Creating the same repository on disk in the same
path using the same repo-UUID and then populating it with even
remotely similar contents will definitely confuse the Apache
cache.  The same thing will happen if you ever restore a
repository from backup.  When those types of operations occur you
must restart Apache in order to clear its cache.
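
As a rough illustration of the stale-cache effect described above, here is a
minimal sketch. Everything in it is hypothetical (the RevisionCache class,
the make_key helper, the path and the UUID strings); it is not Subversion's
or Apache's actual cache code, only a toy cache keyed on
{path,repo-UUID,revision} with an optional second identifier.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, ...]


def make_key(path: str, repo_uuid: str, revision: int,
             second_uuid: Optional[str] = None) -> Key:
    """Build a cache key; the second identifier is optional (assumption)."""
    parts = [path, repo_uuid, str(revision)]
    if second_uuid is not None:
        parts.append(second_uuid)
    return tuple(parts)


class RevisionCache:
    """Toy stand-in for a cache held by a long-lived server process."""

    def __init__(self) -> None:
        self._data: Dict[Key, str] = {}

    def get_or_load(self, key: Key, loader: Callable[[], str]) -> str:
        # Only call the loader on a miss; a hit returns whatever was cached.
        if key not in self._data:
            self._data[key] = loader()
        return self._data[key]

    def clear(self) -> None:
        # Analogous to restarting the server process.
        self._data.clear()


cache = RevisionCache()
key = make_key("/srv/svn/demo", "uuid-A", 1)
first = cache.get_or_load(key, lambda: "old r1 contents")

# The repository is recreated at the same path with the same UUID, and its
# revision 1 now has different contents. The key is unchanged, so the cache
# still answers with the old data.
stale = cache.get_or_load(key, lambda: "new r1 contents")

# After clearing the cache, the new contents are loaded.
cache.clear()
fresh = cache.get_or_load(key, lambda: "new r1 contents")

# With a second identifier that differs per repository instance, the two
# repositories would no longer share a key.
key_old = make_key("/srv/svn/demo", "uuid-A", 1, second_uuid="instance-1")
key_new = make_key("/srv/svn/demo", "uuid-A", 1, second_uuid="instance-2")
```

In this toy model, "stale" holds the old contents until the cache is cleared,
which mirrors the observation that a restart made the issue go away.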

Finally, to prevent any confusion going forward, WANdisco changed
its company name to Cirata in 2024.

Cheers.

Doug

[0] https://cirata.com/resources/support/subversion-binaries
-- 
*Doug Robinson*  Senior Product Manager
P +1 925 396 1125
*E* doug.robin...@cirata.com

-- 





THIS MESSAGE AND ANY ATTACHMENTS ARE CONFIDENTIAL, PROPRIETARY AND MAY 
BE PRIVILEGED


If this message was misdirected, Cirata Ltd. and its 
subsidiaries, ("Cirata") does not waive any confidentiality or privilege. 
If you are not the intended recipient, please notify us immediately and 
destroy the message without disclosing its contents to anyone. Any 
distribution, use or copying of this email or the information it contains 
by other than an intended recipient is unauthorized. The views and opinions 
expressed in this email message are the author's own and may not reflect 
the views and opinions of Cirata, unless the author is authorized by Cirata 
to express such views or opinions on its behalf. All email sent to or from 
this address is subject to electronic storage and review by Cirata. 
Although Cirata operates anti-virus programs, it does not accept 
responsibility for any damage whatsoever caused by viruses being passed.


Re: Implications of duplicate UUIDs on a server

2025-04-30 Thread Doug Robinson
Folks:

I missed replying to one more detail that should be corrected.

On Wed, Apr 30, 2025 at 10:25 AM Doug Robinson 
wrote:

> On Wed, Apr 30, 2025 at 4:17 AM Andreas Stieger 
> wrote:
>
>> WANdisco is not a typical Subversion server implementation. For
>> synchronous multi-site replication (as opposed to svn sync which is
>> asynchronous), it provides a proxy layer with a consensus protocol
>> (think PAXOS, wsrep, raft).
>
>
The WANdisco implementation is still asynchronous even though we try
our hardest to make the replication occur absolutely as fast as possible.
The PAXOS implementation is used to schedule the order of the repository
updates.  The updates then occur in that order at every site as soon as the
data is available and the prior update has been applied.
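
The ordering rule above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the WANdisco/Cirata
implementation: the Replica class and its receive method are hypothetical
names, and the sequence numbers stand in for whatever order the consensus
layer agreed on.

```python
from typing import Dict, List


class Replica:
    """Applies updates strictly in the agreed order (hypothetical sketch)."""

    def __init__(self) -> None:
        self.applied: List[str] = []        # updates applied so far, in order
        self._next_seq = 1                  # next sequence number to apply
        self._pending: Dict[int, str] = {}  # data that arrived ahead of its turn

    def receive(self, seq: int, payload: str) -> None:
        # Data may arrive in any order; hold it until its turn comes.
        self._pending[seq] = payload
        # Apply every update whose data is available and whose predecessor
        # has already been applied.
        while self._next_seq in self._pending:
            self.applied.append(self._pending.pop(self._next_seq))
            self._next_seq += 1


site = Replica()
site.receive(2, "commit B")   # arrives early, must wait for update 1
waiting = list(site.applied)  # nothing applied yet
site.receive(1, "commit A")   # unblocks both updates
```

Each site running this loop ends up with the same sequence of applied
updates, even if the data reaches the sites at different times.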

While a synchronous implementation might be great within a single data
center, it comes with way too many problems for a world-wide WAN
environment.

Cheers.

Doug


Re: Implications of duplicate UUIDs on a server

2025-04-30 Thread Andreas Stieger


On 2025-04-30 17:07, Doug Robinson wrote:
> On Wed, Apr 30, 2025 at 10:25 AM Doug Robinson wrote:
>> On Wed, Apr 30, 2025 at 4:17 AM Andreas Stieger wrote:
>>> WANdisco is not a typical Subversion server implementation. For
>>> synchronous multi-site replication (as opposed to svn sync which is
>>> asynchronous), it provides a proxy layer with a consensus protocol
>>> (think PAXOS, wsrep, raft).
>
> The WANdisco implementation is still asynchronous even though we try
> our hardest to make the replication occur absolutely as fast as possible.
> The PAXOS implementation is used to schedule the order of the repository
> updates.  The updates then occur in that order at every site as soon as the
> data is available and the prior update has been applied.
>
> While a synchronous implementation might be great within a single data
> center, it comes with way too many problems for a world-wide WAN
> environment.



Certainly, I should have been clearer about what I meant: in a
multi-master replication setup you need consensus at the point when you
want a candidate transaction to become a revision. If there is only one
such candidate ready, you can commit to both your global sequence and to
the committing client rather quickly. Not every other node may see it
instantaneously, but they would not be able to commit another until they
are caught up. I briefly played with the wsrep replication API between
svn_fs layers to make a FLOSS version of that but never got beyond a
prototype.
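
The "must be caught up before committing" rule can be sketched roughly like
this. It is a single-process stand-in for the consensus step, with
hypothetical names (GlobalSequence, try_commit); it does not use the wsrep
API and is not how WANdisco or any real prototype is implemented.

```python
from typing import Optional


class GlobalSequence:
    """Toy stand-in for the agreed global revision sequence (assumption)."""

    def __init__(self) -> None:
        self.head = 0  # number of the latest agreed revision

    def try_commit(self, node_revision: int) -> Optional[int]:
        """Turn a candidate transaction into a revision, if the node is current.

        Returns the new revision number, or None when the proposing node has
        not yet seen every agreed revision and must catch up first.
        """
        if node_revision != self.head:
            return None
        self.head += 1
        return self.head


seq = GlobalSequence()
node_a_rev = 0
node_b_rev = 0

# Node A is caught up, so its candidate becomes revision 1.
rev_a = seq.try_commit(node_a_rev)
node_a_rev = rev_a

# Node B has not yet seen revision 1, so its candidate is refused.
refused = seq.try_commit(node_b_rev)

# After catching up, node B can commit the next revision.
node_b_rev = seq.head
rev_b = seq.try_commit(node_b_rev)
```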


Andreas