RE: [Dbmail-dev] Re: unique_id discussion/problem

Jesse Norell Fri, 13 Jun 2003 16:58:40 +0200 (CEST)

Hello,

  I keep noticing everyone asking "is it worth all this?" in this
thread, and I've asked myself that several times... so I'm going
to guess no, it's not.  The current implementation works pretty
well for pop3 unique_ids - our problems were actually caused by me
ignorantly putting a unique constraint on that field.  There is a
small probability that the random numbers generated for each id
will collide, and I'll probably give up on the uuid thing and just
make sure they never do by incorporating the message_idnr.
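
  A minimal sketch of that idea, assuming message_idnr is already
unique per message (eg. a primary key).  The function name, the "_"
separator and the length of the random part are made up for
illustration; this is not what dbmail actually does.

```python
import secrets


def make_unique_id(message_idnr: int) -> str:
    """Build an id from the message's row number plus a random suffix."""
    # Assumption: message_idnr is unique per message, so two ids can only
    # match if they were built for the same row.
    random_part = secrets.token_hex(8)  # 16 random hex characters
    # The "_" never appears in the hex suffix, so the numeric prefix is
    # unambiguous and different idnrs always give different ids.
    return f"{message_idnr}_{random_part}"


if __name__ == "__main__":
    print(make_unique_id(42))
```

  Even if the random parts happen to collide, ids for different
messages still differ because the message_idnr prefix differs.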


  As far as security goes, the uuid stuff isn't meant to circumvent
crackers or anything, it's just a means of generating numbers
that will be unique on every machine at any time, and future ids
do fall within a very predictable set of possibilities (which is the
basis of knowing that they have been/are/will be unique).
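
  To illustrate the "predictable" part, here is a small sketch using
Python's standard uuid module.  The node value is a made-up MAC
address and the fixed clock_seq is only there to make the output easy
to read; neither comes from dbmail.

```python
import uuid

# Hypothetical 48-bit MAC address standing in for the machine identity.
NODE = 0x001122334455

# A version 1 uuid is built from a timestamp, a clock sequence and a node.
first = uuid.uuid1(node=NODE, clock_seq=0)
second = uuid.uuid1(node=NODE, clock_seq=0)

if __name__ == "__main__":
    for u in (first, second):
        # Every field is recoverable from the id, which is why these ids
        # identify rather than hide anything.
        print(u, "version", u.version, "node", hex(u.node), "time", u.time)
```

  Both ids carry the same node and differ only in the time field,
so anyone who knows the machine and roughly when an id was made can
narrow down what it looks like.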

  Lou - as for clustering, you seem quite a bit more experienced than
others on the list (or those posting, at least), and probably a lot
of the discussion there is meaningless rambling to some (I only catch
about half of what you explain in the algorithm stuff..), but don't
stop!  Working the semantics and code out would be a huge boon to the
dbmail community, and I'm sure others will join in as they can.  Plus
it'll all be logged for review, etc. in the future.  :)

Later,
Jesse


---- Original Message ----
From: Lou Kamenov  <[email protected]>
To: [email protected]
Subject: [Dbmail-dev] Re: unique_id discussion/problem
Sent: Thu, 12 Jun 2003 23:55:44 +0100

> Jesse Norell writes: 
> 
> > 
> > Hello, 
> > 
> > ---- Original Message ----
> > From: Aaron Stone <[email protected]>
> > To: [email protected]
> > Subject: Re: [Dbmail-dev] RE: unique_id discussion/problem
> > Sent: Thu, 12 Jun 2003 07:21:42 -0700 (PDT) 
> > 
> >> In fact, I would highly recommend that a database is used. I envision a
> >> table that has a row for each server in the cluster and a "uuid prefix" or
> >> something to the like. Synchronization information might also be stored in
> >> this table, such as the IP address of each server as it links up with a
> >> row in the database and the timestamp of when it last attached. 
> >> 
> >> Naturally this table will have to be replicated, and so it should not have
> >> an auto_increment column, but something else more unique. Hostnames or IP
> >> addresses are an obvious answer, if not a good one ;-)
> >
> >   I don't think ip addresses would be unique enough - some cluster
> > implementations have multiple machines with the same address (eg. via
> > load-balancing hardware switches).  Nor hostname (eg. we have multiple
> > machines for mail.kci.net - while they do have unique hostnames also,
> > there's no reason they would necessarily have to).  The mac addr
> > seems like the best almost-always-unique identifier that's readily
> > available cross-platform.
> 
> Why mac addr? The entropy is constantly growing, and there is nothing more
> unique than a generated id; there are tons of generators out there to do so.
> However, you can always use specific sequences which are in a way
> predictable, where you basically know exactly what you're going to get.
> Synchronizing is a different matter, which can be done on the fly: let's say
> you have a machine N which joins a cluster A, where the machine
> identifies itself and waits for a specific id from which it derives the
> sequence factor. With these words I assume that you're aware of such things
> as negotiation algorithms and so on.
> 
> What I personally use (besides postfix and dbmail itself, with pgsql): the
> beginning was easy, let pgsql handle the sequence generation, where _ANYONE_
> who has access to the database can alter the sequence generation on the
> fly. In other words, the app will be able to re-assign a new sequence to the
> different servers by simply altering the sequence factor which is contained
> in the sequence table itself.
> Being aware of the algorithm which is used for the generation, it can simply
> calculate what the factor for the next server would be, and this can be
> totally automated and in a way unique.
> 
> In my eyes unique email addrs and login ids are a different case, where
> things get more complicated, but since in my approach I'd prefer to escape
> from the collisions which are produced by some not-finished-mad-mah
> async replication processes, I'd search for a more complex and sophisticated
> solution.
> 
> However, the above also solves the problem if any of those machines has to
> work on its own due to link failure or whatever.. we won't get any bloody
> collisions, since each machine is already using a unique factor for this
> generation. The IDs themselves don't matter; the factor is the one that should
> be unique, and it would be a huge advantage if it's predictable by any of the
> servers.
> 
> 
> Guys, if I'm being annoying or I'm not writing on the right topic, pretty
> please let me know, I don't want to be boring and stuff, but again, if you
> have any arguments against this solution, spread them across the list before
> jumping into something like UUIDs. Not that I have something against them,
> just email is so atomic that it shouldn't need such a complicated solution.
> 
> For clustering, here is how my stuff works:
> Postfix + PgSQL Patch
> DBmail + some dumb connection checks. + PgSQL
> PgSQL itself is used with PgReplicator. 
> 
> I use pgsql sequences to generate the ids (which was the first approach with
> dbmail). I chose PgSQL because sequences are highly granular and it's easy
> to control them. Each machine in the cluster has its own ClusterID; also,
> with a kind of HeartBeat monitor it's aware of how many servers there are and
> what their IDs are, basically reading from a conf file. The primary
> source for those settings is a table inside PgSQL; this file is just a
> redundant option if somehow PgSQL on this machine fails to respond.
> Basically both database and files are updated at the same time.
> 
> I use them in the following order:
> mx1: RR(A records) dmx1 and dmx2
> mx2: RR(A records) dmx2 and dmx3
> and so on
> 
> Also, when I install a dbmail system on a new cluster server, it generates
> the PgSQL schema on the fly, being aware of the cluster ID which was
> negotiated using the HB monitor. A huge role is also played by
> PgReplicator, which gives me the ability _NOT_ to replicate sequences and
> other tables like postfix aliases (which in fact are totally useless, but
> I somehow have to tell postfix to shut up with the annoying msg). In this
> case the setup is not a free mailserver but dedicated corporate use, so I
> don't have users coming around and registering.
> 
> For now I haven't seen many problems, not to say any.
> 
> One thing is for sure: after the crash I had with MySQL, I'm so staying
> away from it. As my CTO says, MySqueel :)
> 
> 
> >> I'm not sure what the "non-volatile" storage is needed for in your
> >> proposal beyond what I see as a unique prefix for each dbmail in the
> >> cluster as it writes to a replicated database server...
> > 
> >   Saved state info is basically for rollbacks in time (eg. machine
> > reboots) and to make multiple uuids generated w/in the same clock
> > tick be unique (because they're based largely upon time).
> 
> Here you mean fail-over support, which is supposed to be handled by the
> replication process, or in dbmail itself for maximum portability? Or am I
> assuming wrong?
> 
> cheers,
>  -lou 
> 
> _______________________________________________
> Dbmail-dev mailing list
> [email protected]
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> 
-- End Original Message --
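
  Lou's message above doesn't spell out how the per-server "sequence
factor" works, so the following is only one possible reading: each
server starts at its own cluster id and steps by the number of servers,
so the sequences interleave and never overlap.  The function name and
the 0-based cluster ids are assumptions for illustration.

```python
from itertools import count, islice


def server_sequence(cluster_id: int, cluster_size: int):
    """Return an endless iterator of ids reserved for one server."""
    if not 0 <= cluster_id < cluster_size:
        raise ValueError("cluster_id must be in range(cluster_size)")
    # Server k gets k, k + size, k + 2*size, ... so no two servers with
    # different cluster ids can ever produce the same number.
    return count(start=cluster_id, step=cluster_size)


# Three hypothetical servers; take the first five ids of two of them.
ids_a = list(islice(server_sequence(0, 3), 5))
ids_b = list(islice(server_sequence(1, 3), 5))

if __name__ == "__main__":
    print(ids_a)  # [0, 3, 6, 9, 12]
    print(ids_b)  # [1, 4, 7, 10, 13]
```

  With this scheme a server cut off by a link failure can keep handing
out ids on its own, and any other server can work out which ids belong
to it from the cluster id alone.  The catch is that the step has to be
chosen (or renegotiated) when the cluster grows past cluster_size.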


--
Jesse Norell
jesse (at) kci.net
