Jesse Norell writes:
Hello,
---- Original Message ----
From: Aaron Stone <[email protected]>
To: [email protected]
Subject: Re: [Dbmail-dev] RE: unique_id discussion/problem
Sent: Thu, 12 Jun 2003 07:21:42 -0700 (PDT)
In fact, I would highly recommend that a database is used. I envision a
table that has a row for each server in the cluster and a "uuid prefix" or
something to the like. Synchronization information might also be stored in
this table, such as the IP address of each server as it links up with a
row in the database and the timestamp of when it last attached.
Naturally this table will have to be replicated, and so it should not have
an auto_increment column, but something else more unique. Hostnames or IP
addresses are an obvious answer, if not a good one ;-)
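
A minimal sketch of the kind of table described above, using Python's sqlite3 module purely for illustration. The table name cluster_nodes, the column names, and the attach_node helper are all assumptions, not anything taken from dbmail; the only points carried over from the text are one row per server, a "uuid prefix", the IP address, a last-attached timestamp, and a key that is not an auto_increment column.

```python
import sqlite3
import time


def create_cluster_table(conn):
    """Create a hypothetical one-row-per-server table (names are assumptions)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS cluster_nodes (
            node_key      TEXT PRIMARY KEY,     -- not auto_increment: survives replication
            uuid_prefix   TEXT NOT NULL UNIQUE, -- prefix this server puts on its ids
            ip_address    TEXT NOT NULL,        -- address seen when the server attached
            last_attached INTEGER NOT NULL      -- unix timestamp of the last attach
        )
        """
    )


def attach_node(conn, node_key, uuid_prefix, ip_address, now=None):
    """Record (or refresh) the row for a server as it links up."""
    ts = int(time.time()) if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cluster_nodes VALUES (?, ?, ?, ?)",
        (node_key, uuid_prefix, ip_address, ts),
    )
    conn.commit()
```

What goes into node_key is exactly the open question in this thread (hostname, IP, MAC address, or a generated value); the sketch does not settle it.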
I don't think IP addresses would be unique enough - some cluster
implementations have multiple machines with the same address (e.g. via
load-balancing hardware switches). Nor hostnames (e.g. we have multiple
machines for mail.kci.net - while they do have unique hostnames as well,
there's no reason they would necessarily have to). The MAC address
seems like the best almost-always-unique identifier that's readily
available cross-platform.
Why the MAC address? Entropy is constantly growing, nothing is more unique
than a generated ID, and there are tons of generators out there to produce
one. However, you can always use specific sequences which are predictable,
where you know exactly what you're going to get. Synchronizing is a
different matter, and it can be done on the fly: say machine N joins
cluster A; the machine identifies itself and waits for a specific ID from
which it derives its sequence factor. I'm assuming here that you're
familiar with things like negotiation algorithms and so on.
What I personally use (apart from Postfix and dbmail itself, with PgSQL):
the beginning was easy. Let PgSQL handle the sequence generation, where
_ANYONE_ who has access to the database can alter the sequence generation on
the fly. In other words, the app can assign a new sequence to each server
simply by altering the sequence factor, which is stored in the sequence
table itself.
Knowing the algorithm used for the generation, it can simply calculate what
the factor for the next server would be, and this can be totally automated
and, in a way, unique.
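
One way to read the per-server factor idea is as interleaved sequences: every server uses the same step and a different starting offset. The sketch below is an illustration under that assumption only; the post does not spell out the actual algorithm, and the names FactorSequence, next_factor, factor, and stride are hypothetical. Here stride stands for the maximum number of servers and factor for the offset assigned to one server.

```python
class FactorSequence:
    """Yield ids of the form n * stride + factor (an assumed scheme)."""

    def __init__(self, factor, stride):
        if not 0 <= factor < stride:
            raise ValueError("factor must satisfy 0 <= factor < stride")
        self.factor = factor  # unique per server
        self.stride = stride  # shared by every server in the cluster
        self._n = 0           # how many ids this server has handed out

    def next_id(self):
        value = self._n * self.stride + self.factor
        self._n += 1
        return value


def next_factor(used_factors, stride):
    """Return the smallest free factor, so a joining server can be assigned one."""
    for candidate in range(stride):
        if candidate not in used_factors:
            return candidate
    raise RuntimeError("cluster is full: no free factor below stride")
```

Because two ids from different servers always differ modulo stride, the servers never need to talk to each other to avoid a collision, and any server that knows the used factors can compute the next one.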
In my eyes, unique email addresses and login IDs are a different case, where
things get more complicated. Since my approach is to avoid the collisions
produced by half-finished async replication processes, I'd look for a more
complex and sophisticated solution there.
However, the above also covers the case where any of those machines has to
work on its own due to link failure or whatever: we won't get any bloody
collisions, since each machine already uses a unique factor for its
generation. The IDs themselves don't matter; the factor is what has to be
unique, and it is a huge advantage if any of the servers can predict it.
Guys, if I'm being annoying or writing off-topic, please let me know; I
don't want to be boring. But again, if you have any arguments against this
solution, share them on the list before jumping into something like UUIDs.
Not that I have anything against them, it's just that email is so atomic
that it shouldn't need such a complicated solution.
For clustering, here is how my setup works:
Postfix + PgSQL patch
DBmail + some dumb connection checks + PgSQL
PgSQL itself is used with PgReplicator.
I use PgSQL sequences to generate the IDs (which was the first approach with
dbmail). I chose PgSQL because sequences are fine-grained and easy to
control. Each machine in the cluster has its own ClusterID. Also, with a
kind of HeartBeat monitor, it knows how many servers there are and what
their IDs are, basically by reading a conf file. The primary source for
those settings is a table inside PgSQL; the file is just a redundant option
in case PgSQL on this machine fails to respond.
Basically, both the database and the files are updated at the same time.
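
The database-first, file-as-fallback arrangement described above could look roughly like this. It is only a sketch: load_cluster_settings, fetch_from_db, and the JSON file format are assumptions made for the example, not how the author's HeartBeat monitor actually stores its settings.

```python
import json


def load_cluster_settings(fetch_from_db, conf_path):
    """Prefer the database; fall back to the local conf file if it fails.

    fetch_from_db is a hypothetical callable returning a dict such as
    {"cluster_id": 2, "peers": [1, 2, 3]}.
    """
    try:
        settings = fetch_from_db()
    except Exception:
        # Database on this machine did not respond: use the redundant copy.
        with open(conf_path) as fh:
            return json.load(fh)
    # Database answered: refresh the file so both stay in step.
    with open(conf_path, "w") as fh:
        json.dump(settings, fh)
    return settings
```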
I use them in the following order:
mx1: RR(A records) dmx1 and dmx2
mx2: RR(A records) dmx2 and dmx3
and so on.
Also, when I install a dbmail system on a new cluster server, it generates
the PgSQL schema on the fly, knowing the cluster ID that was negotiated via
the HB monitor. PgReplicator also plays a huge role, since it lets me _NOT_
replicate sequences and other tables such as the Postfix aliases (which are
in fact totally useless, but I somehow have to make Postfix shut up about
the annoying message). In this case the setup is not a free mail server but
dedicated corporate use, so I don't have users coming around and
registering.
So far I haven't seen many problems, not to say any.
One thing is for sure: after the crash I had with MySQL, I'm staying away
from it. As my CTO says, MySqueel :)
I'm not sure what the "non-volatile" storage is needed for in your
proposal beyond what I see as a unique prefix for each dbmail in the
cluster as it writes to a replicated database server...
Saved state info is basically for rollbacks in time (eg. machine
reboots) and to make multiple uuids generated w/in the same clock
tick be unique (because they're based largely upon time).
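
To illustrate why saved state helps with time-based ids, here is a minimal sketch. It is not the proposal's actual format: TimeIdGenerator, the JSON state file, and the prefix-tick-counter layout are assumptions for the example. The generator remembers the last clock tick it used and a counter, so ids stay unique both within one tick and after the clock steps backwards (for example across a reboot).

```python
import json
import os
import time


class TimeIdGenerator:
    """Time-based ids that stay unique across same-tick calls and clock rollbacks."""

    def __init__(self, prefix, state_path, clock=time.time):
        self.prefix = prefix          # per-server prefix (assumed unique)
        self.state_path = state_path  # the "non-volatile" storage
        self.clock = clock
        self.last_tick, self.counter = self._load()

    def _load(self):
        try:
            with open(self.state_path) as fh:
                data = json.load(fh)
            return int(data["last_tick"]), int(data["counter"])
        except FileNotFoundError:
            return 0, 0

    def _save(self):
        # Write to a temp file, then rename, so a crash cannot leave half a file.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"last_tick": self.last_tick, "counter": self.counter}, fh)
        os.replace(tmp, self.state_path)

    def next_id(self):
        tick = int(self.clock())
        if tick > self.last_tick:
            self.last_tick, self.counter = tick, 0
        else:
            # Same tick, or the clock went backwards: keep the saved tick
            # and bump the counter instead of reusing an old timestamp.
            self.counter += 1
        self._save()
        return f"{self.prefix}-{self.last_tick}-{self.counter}"
```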
Do you mean fail-over support here, which is supposed to be handled by the
replication process, or something in dbmail itself for maximum portability?
Or am I assuming wrong?
cheers,
-lou