Hi Michael, 

   Thanks for you investigation, I have created 
https://issues.apache.org/jira/browse/CASSANDRA-4432 to cover related changes 
and going to start with that ASAP. 

Best Regards
-- 
Pavel Yaskevich


On Tuesday 10 July 2012 at 19:36, Michael Theroux wrote:

> Hello,
> 
> We are in the process of upgrading out cassandra installation from a single 
> instance to a 6 node cluster with a replication factor of 3. We are using 
> Cassandra 1.1.2. This is something I've done before in other environments, 
> but now I've hit an interesting issue. The cluster has been setup and all the 
> nodes have joined. I was about to update the replication factor to 3 via 
> cassandra-cli:
> 
> [open@unknown] use open; 
> Authenticated to keyspace: open
> 
> [default@open] update keyspace open with 
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' and 
> strategy_options={us-east:3};
> 4698e471-5a1d-30f2-aa11-761d204581ff
> Waiting for schema agreement...
> ... schemas agree across the cluster
> 
> The above looks normal, but when I look at the schema, the replication factor 
> is unchanged:
> 
> [default@open] describe open; 
> Keyspace: open: 
> Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
> Durable Writes: true
> Options: [us-east:1]
> Column Families:
> ...
> 
> I couldn't figure out why this was, but then I saw this thread:
> 
> http://www.datastax.com/support-forums/topic/cassandra-111-update-keyspace-not-working
> 
> So I tried creating a new keyspace "ks" and looked at the results:
> 
> [default@open] use system;
> Authenticated to keyspace: system
> [default@system] list schema_keyspace;
> schema_keyspace not found in current keyspace.
> [default@system] list schema_keyspaces;
> Using default limit of 100
> Using default column limit of 100
> -------------------
> RowKey: open
> => (column=durable_writes, value=true, timestamp=530617107329814)
> => (column=name, value=open, timestamp=530617107329814)
> => (column=strategy_class, 
> value=org.apache.cassandra.locator.NetworkTopologyStrategy, 
> timestamp=530617107329814)
> => (column=strategy_options, value={"us-east":"1"}, timestamp=530617107329814)
> -------------------
> RowKey: ks
> => (column=durable_writes, value=true, timestamp=42396175198913)
> => (column=name, value=ky, timestamp=42396175198913)
> => (column=strategy_class, 
> value=org.apache.cassandra.locator.NetworkTopologyStrategy, 
> timestamp=42396175198913)
> => (column=strategy_options, value={"datacenter1":"1"}, 
> timestamp=42396175198913)
> 
> 
> Notice the "timestamp" on the new keyspace is MUCH younger than "open" (by 
> more than a factor of 10). 
> 
> I didn't understand how this could be, as time has always been in sync. 
> 
> I decided to look at the code to see if I could spot anything. When 
> cassandra-cli attempts to create a new keyspace, it uses thrift, and ends up 
> here (in CassandraServer.java):
> 
> public String system_update_keyspace(KsDef ks_def)
> throws InvalidRequestException, SchemaDisagreementException, TException
> {
> logger.debug("update_keyspace");
> ThriftValidation.validateKeyspaceNotSystem(ks_def.name (http://ks_def.name));
> ...
> MigrationManager.announceKeyspaceUpdate(KSMetaData.fromThrift(ks_def));
> return Schema.instance.getVersion().toString();
> ...
> }
> 
> 
> Which then calls:
> 
> public static void announceKeyspaceUpdate(KSMetaData ksm) throws 
> ConfigurationException
> {
> ksm.validate();
> 
> KSMetaData oldKsm = Schema.instance.getKSMetaData(ksm.name (http://ksm.name));
> if (oldKsm == null)
> throw new ConfigurationException(String.format("Cannot update non existing 
> keyspace '%s'.", ksm.name (http://ksm.name)));
> 
> announce(oldKsm.toSchemaUpdate(ksm, System.nanoTime()));
> }
> 
> It then uses the results of System.nanoTime in the timestamp.
> 
> I wrote a simple java program to output System.nanoTime on the system in 
> which I attempted to add the new keyspace, and the output was:
> 
> 46627528340034
> 
> Which is in the realm of the keyspace I added above. System.nanoTime() is 
> java instance dependent (nanoTime). You will get different values depending 
> on what machine you run on and is not necessarily associated with you system 
> clock. I ran this on several different machines, all verified to be in sync 
> with NTP, and got massively different results. In fact, when I stopped and 
> started my instance, my nanoTime became:
> 
> 97234377869
> 
> I then created another keyspace "kw":
> 
> [default@system] list schema_keyspaces;
> Using default limit of 100
> Using default column limit of 100
> -------------------
> RowKey: open
> => (column=durable_writes, value=true, timestamp=530617107329814)
> => (column=name, value=open, timestamp=530617107329814)
> => (column=strategy_class, 
> value=org.apache.cassandra.locator.NetworkTopologyStrategy, 
> timestamp=530617107329814)
> => (column=strategy_options, value={"us-east":"1"}, timestamp=530617107329814)
> -------------------
> RowKey: ks
> => (column=durable_writes, value=true, timestamp=42396175198913)
> => (column=name, value=ky, timestamp=42396175198913)
> => (column=strategy_class, 
> value=org.apache.cassandra.locator.NetworkTopologyStrategy, 
> timestamp=42396175198913)
> => (column=strategy_options, value={"datacenter1":"1"}, 
> timestamp=42396175198913)
> -------------------
> RowKey: kw
> => (column=durable_writes, value=true, timestamp=236211433609)
> => (column=name, value=kw, timestamp=236211433609)
> => (column=strategy_class, 
> value=org.apache.cassandra.locator.NetworkTopologyStrategy, 
> timestamp=236211433609)
> => (column=strategy_options, value={"datacenter1":"1"}, 
> timestamp=236211433609)
> 
> 
> What I believe is happening is updates are not working because, as the thread 
> I linked above indicated, Cassandra is seeing my update as older than the 
> current entries, and is not honoring it. However, this is because it is using 
> System.nanoTime in thrift, which has no relation to the system clock time.
> 
> I tried to find something in JIRA, but I couldn't really find any issue that 
> matched (and wasn't fixed for other reasons in earlier releases). Is there 
> something simpler going on?
> 
> Thanks,
> -Mike
> 
> 


Reply via email to