Hi Jonathan, let me explain my itch.
Our Cassandra deployment consists of 2 data centers: AWS us-east and a private cloud at Rackspace built on OpenStack. OpenStack emulates the EC2 APIs pretty well and Cassandra's EC2 support works on top of it - with Ec2MultiRegionSnitch all the private/public IP handling works, the cluster connects fine, etc.

The only problem is the naming convention presumed by Ec2Snitch. The snitch presumes the AWS naming convention and splits the availability zone name using "-" as the token separator - see the constructor of Ec2Snitch. Our other DC, the OpenStack one, doesn't follow this convention (a decision made long ago and set in stone). The availability zone is in almost the same format as the EC2 one, but slightly different - it follows this regex: ^(na)\.(.*)-.*$. Please mind the dot between the first and second group. If stock Cassandra EC2 support is used, Cassandra incorrectly uses the whole availability zone as the DC name, which results in all my nodes in the OpenStack-based DC being treated as if they were in different DCs.

My setup currently uses my own snitch, which extends my fork of Ec2MultiRegionSnitch. It tries to guess which datacenter the snitch is running in and parses the availability zone according to the guess, using either the AWS regex or our OpenStack-specific one. Besides parsing the availability zone name the snitch does nothing and delegates all the real work to its superclasses. Unfortunately I had to fork Ec2MultiRegionSnitch and Ec2Snitch in order to avoid code duplication - in the original versions a lot of work is done in constructors and there is no clean way to extend the classes. I'd love to get rid of this fork, which I have to maintain with every Cassandra release (see for instance https://issues.apache.org/jira/browse/CASSANDRA-5432).
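For illustration, the two-format parsing described above could look roughly like the following sketch. This is not code from Cassandra or from the fork: the class name, the AWS pattern, the extra third group added to the OpenStack regex, and the rule that the DC name is the text spanning the first two groups are all assumptions. The CASSANDRA-4026 special cases are deliberately left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Cassandra class): splits an availability zone into DC and rack. */
final class AvailabilityZoneParser {
    // Assumed pattern for AWS-style names such as "us-east-1a".
    static final Pattern AWS_FORMAT = Pattern.compile("^([a-z]+)-([a-z]+)-(.*)$");

    // The OpenStack regex quoted above, with a third group added (assumption) to capture the rack.
    static final Pattern OPENSTACK_FORMAT = Pattern.compile("^(na)\\.(.*)-(.*)$");

    private AvailabilityZoneParser() {}

    /** Returns a two-element array: { datacenter, rack }. */
    static String[] parse(String availabilityZone, Pattern format) {
        Matcher m = format.matcher(availabilityZone);
        if (m.groupCount() != 3 || !m.matches()) {
            throw new IllegalArgumentException(
                "Availability zone '" + availabilityZone + "' does not match " + format.pattern());
        }
        // Assumption: the DC name is everything from the start of group 1 to the end of
        // group 2, so the original separator ("-" or ".") is preserved in the name.
        String datacenter = availabilityZone.substring(m.start(1), m.end(2));
        String rack = m.group(3);
        return new String[] { datacenter, rack };
    }

    public static void main(String[] args) {
        for (String[] r : new String[][] {
                parse("us-east-1a", AWS_FORMAT),
                parse("na.prod-hostname", OPENSTACK_FORMAT) }) {
            System.out.println("dc=" + r[0] + " rack=" + r[1]);
        }
    }
}
```

With these assumed patterns, all nodes whose zone starts with na.prod- end up in one DC instead of one DC per node.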
What I suggest is the following:

- make the Ec2 snitch parsing format configurable, with the default parser being the current one (so that pure Ec2 users don't have to do anything and the support just works as it does today)
- keep it simple - let the parser always presume three groups, as in us-east-1a or our naming na.prod-hostname
- add the format as an optional configuration parameter in cassandra.yaml

If done this way, my configuration would use Ec2MultiRegionSnitch as is on the AWS side, and configured with a custom regex on the OpenStack side.

If accepted, Cassandra will support more deployment use cases, I will get rid of my private fork, and current users will not be affected. I will do the coding.

regards,

ondřej černoš

On Fri, May 3, 2013 at 3:04 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> I don't understand what you're trying to solve. A snitch can get
> asked for an endpoint from any DC, so you can't just configure
> different nodes with different snitches and figure it will all be
> good.
>
> On Thu, May 2, 2013 at 10:35 AM, Ondřej Černoš <cern...@gmail.com> wrote:
> > Hi all,
> >
> > We use Cassandra in mixed Ec2/OpenStack environment. Unfortunately due to
> > decisions made long ago the OpenStack availability zone name obtainable
> > through http://169.254.169.254/latest/meta-data/placement/availability-zone is
> > not compatible with Cassandra's parsing in o.a.c.locator.Ec2Snitch - the
> > format uses dot instead of minus as field separator. Currently I manage my
> > own fork of Cassandra's snitches, which is error prone. I thought I might
> > patch Cassandra so that it understands custom formats:
> >
> > - make the format a regex configurable in cassandra.yaml with defaults
> > (option not set at all) set to current implementation
> > - make it easy - presume three groups (us-east-1a,
> > openstack.something-computenode and the like) where the first two groups
> > form datacenter name and the last one the rack (plus keeping CASSANDRA-4026
> > functionality in place)
> >
> > For users not configuring the regex nothing will change, others, like me,
> > will have the option to parse different availability zone names.
> >
> > What do you think? Does it have a chance being accepted?
> >
> > regards,
> >
> > ondřej černoš
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced