Hello,
We are in the process of upgrading our Cassandra installation from a single
instance to a 6-node cluster with a replication factor of 3. We are using
Cassandra 1.1.2. This is something I've done before in other environments, but
now I've hit an interesting issue. The cluster has been set up and all the
nodes have joined. I was about to update the replication factor to 3 via
cassandra-cli:
[open@unknown] use open;
Authenticated to keyspace: open
[default@open] update keyspace open with
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' and
strategy_options={us-east:3};
4698e471-5a1d-30f2-aa11-761d204581ff
Waiting for schema agreement...
... schemas agree across the cluster
The above looks normal, but when I look at the schema, the replication factor
is unchanged:
[default@open] describe open;
Keyspace: open:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [us-east:1]
Column Families:
...
I couldn't figure out why this was, but then I saw this thread:
http://www.datastax.com/support-forums/topic/cassandra-111-update-keyspace-not-working
So I tried creating a new keyspace "ks" and looked at the results:
[default@open] use system;
Authenticated to keyspace: system
[default@system] list schema_keyspace;
schema_keyspace not found in current keyspace.
[default@system] list schema_keyspaces;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: open
=> (column=durable_writes, value=true, timestamp=530617107329814)
=> (column=name, value=open, timestamp=530617107329814)
=> (column=strategy_class,
value=org.apache.cassandra.locator.NetworkTopologyStrategy,
timestamp=530617107329814)
=> (column=strategy_options, value={"us-east":"1"}, timestamp=530617107329814)
-------------------
RowKey: ks
=> (column=durable_writes, value=true, timestamp=42396175198913)
=> (column=name, value=ky, timestamp=42396175198913)
=> (column=strategy_class,
value=org.apache.cassandra.locator.NetworkTopologyStrategy,
timestamp=42396175198913)
=> (column=strategy_options, value={"datacenter1":"1"},
timestamp=42396175198913)
Notice that the timestamp on the new keyspace "ks" is MUCH lower than the one
on "open" (by more than a factor of 10), even though "ks" was created later.
I didn't understand how this could be, as the clocks have always been in sync.
I decided to look at the code to see if I could spot anything. When
cassandra-cli attempts to update a keyspace, it uses Thrift, and ends up
here (in CassandraServer.java):
public String system_update_keyspace(KsDef ks_def)
throws InvalidRequestException, SchemaDisagreementException, TException
{
    logger.debug("update_keyspace");
    ThriftValidation.validateKeyspaceNotSystem(ks_def.name);
    ...
    MigrationManager.announceKeyspaceUpdate(KSMetaData.fromThrift(ks_def));
    return Schema.instance.getVersion().toString();
    ...
}
Which then calls:
public static void announceKeyspaceUpdate(KSMetaData ksm)
throws ConfigurationException
{
    ksm.validate();
    KSMetaData oldKsm = Schema.instance.getKSMetaData(ksm.name);
    if (oldKsm == null)
        throw new ConfigurationException(
            String.format("Cannot update non existing keyspace '%s'.", ksm.name));
    announce(oldKsm.toSchemaUpdate(ksm, System.nanoTime()));
}
The schema update is then stamped with the result of System.nanoTime().
I wrote a simple Java program to output System.nanoTime() on the system on
which I attempted to add the new keyspace, and the output was:
46627528340034
That is in the same range as the timestamp on the keyspace I added above.
System.nanoTime() is specific to the JVM instance: its origin is arbitrary, so
you will get different values depending on which machine (and which JVM
process) you run on, and the value is not necessarily related to your system
clock. I ran this on several different machines, all verified to be in sync
with NTP, and got massively different results. In fact, when I stopped and
started my instance, my nanoTime became:
97234377869
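For reference, a program along these lines is enough to see the difference.
This is only a minimal sketch: the class name NanoTimeVsWallClock and the
helper wallClockMicros() are made up for illustration and are not taken from
the Cassandra source. It prints System.nanoTime() next to a timestamp derived
from the wall clock, so the two can be compared on any machine.

```java
public class NanoTimeVsWallClock {

    /**
     * Hypothetical helper: wall-clock time in microseconds since the Unix
     * epoch. Unlike System.nanoTime(), this is tied to the system clock and
     * is comparable across machines that are in sync via NTP.
     */
    public static long wallClockMicros() {
        return System.currentTimeMillis() * 1000L;
    }

    public static void main(String[] args) {
        // nanoTime() counts from an arbitrary, JVM-specific origin. It is only
        // meaningful for measuring elapsed time inside one JVM process.
        long nano = System.nanoTime();
        long micros = wallClockMicros();

        System.out.println("System.nanoTime()          = " + nano);
        System.out.println("currentTimeMillis() * 1000 = " + micros);
    }
}
```

Running it on two hosts, or twice on the same host after restarting the JVM,
should show the second line staying close together while the first line can
differ by orders of magnitude.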
I then created another keyspace "kw":
[default@system] list schema_keyspaces;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: open
=> (column=durable_writes, value=true, timestamp=530617107329814)
=> (column=name, value=open, timestamp=530617107329814)
=> (column=strategy_class,
value=org.apache.cassandra.locator.NetworkTopologyStrategy,
timestamp=530617107329814)
=> (column=strategy_options, value={"us-east":"1"}, timestamp=530617107329814)
-------------------
RowKey: ks
=> (column=durable_writes, value=true, timestamp=42396175198913)
=> (column=name, value=ky, timestamp=42396175198913)
=> (column=strategy_class,
value=org.apache.cassandra.locator.NetworkTopologyStrategy,
timestamp=42396175198913)
=> (column=strategy_options, value={"datacenter1":"1"},
timestamp=42396175198913)
-------------------
RowKey: kw
=> (column=durable_writes, value=true, timestamp=236211433609)
=> (column=name, value=kw, timestamp=236211433609)
=> (column=strategy_class,
value=org.apache.cassandra.locator.NetworkTopologyStrategy,
timestamp=236211433609)
=> (column=strategy_options, value={"datacenter1":"1"}, timestamp=236211433609)
What I believe is happening is that updates are not working because, as the
thread I linked above indicated, Cassandra is seeing my update as older than
the current entries, and is not honoring it. This happens because the Thrift
code path stamps the schema change with System.nanoTime(), which has no
relation to the system clock time.
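To make the suspected failure mode concrete, here is a simplified model of
timestamp-based reconciliation. It is a sketch under assumptions, not
Cassandra's actual code: the class LastWriteWins and its methods are
hypothetical, and tie-breaking on equal timestamps is reduced to "keep the
existing value". The first timestamp is the one shown for "open" above; the
second is the nanoTime value my test program printed, standing in for the
timestamp a new update might get.

```java
import java.util.HashMap;
import java.util.Map;

public class LastWriteWins {

    /** A stored column value together with the timestamp it was written at. */
    private static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Column> columns = new HashMap<>();

    /**
     * Applies the write only if its timestamp is strictly newer than the one
     * already stored. Returns true if the write was applied.
     */
    public boolean write(String name, String value, long timestamp) {
        Column current = columns.get(name);
        if (current != null && timestamp <= current.timestamp) {
            return false; // treated as stale and silently dropped
        }
        columns.put(name, new Column(value, timestamp));
        return true;
    }

    /** Returns the current value of the column, or null if it was never written. */
    public String read(String name) {
        Column current = columns.get(name);
        return current == null ? null : current.value;
    }

    public static void main(String[] args) {
        LastWriteWins row = new LastWriteWins();

        // Existing entry, with the timestamp currently stored for "open".
        row.write("strategy_options", "{\"us-east\":\"1\"}", 530617107329814L);

        // Update stamped with a much smaller nanoTime-style value.
        boolean applied =
            row.write("strategy_options", "{\"us-east\":\"3\"}", 46627528340034L);

        System.out.println("applied=" + applied
            + ", value=" + row.read("strategy_options"));
    }
}
```

In this model the second write is rejected and the stored value stays at
{"us-east":"1"}, which matches what "describe open" shows after the update.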
I tried to find something in JIRA, but I couldn't really find any issue that
matched (and wasn't fixed for other reasons in earlier releases). Is there
something simpler going on?
Thanks,
-Mike