0.7RC1 local_quorum -> TimedOutException

2010-12-28 Thread Thor Carpenter
Hi All,

I am a bit stumped.  When reading at a CL of local_quorum I get 
TimedOutExceptions however when reading at any other CL, including ALL, 
everything works as expected.  Any write CL, including local_quorum, also works 
as expected.

I am running a 4 node 0.7 RC1 cluster in datacenter "DC1".  "DC2" is planned 
for the future but currently has no nodes.  I am using Hector 0.7.0-20.  

Here is my testing schema:

create keyspace GlobalCacheKeyspaceDC1RF4 with 
replication_factor = 4 and 
placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' 
and
strategy_options = [{DC1:4, DC2:0}];

use GlobalCacheKeyspaceDC1RF4;

create column family GlobalCacheCF with
comment = 'General purpose cache' and
column_type = 'Standard' and
comparator = 'BytesType' and
keys_cached = 20 and
rows_cached = 2000 and
gc_grace = 0 and
read_repair_chance = 0.0;


Here is my cassandra-topology.properties:

10.5.64.15=DC1:R1
10.5.64.28=DC1:R1
10.5.64.56=DC1:R1
10.5.64.39=DC1:R1

# default for unknown nodes
default=DC2:R1


Here is the exception I get when reading at a CL of local_quorum:

1:44:14.176 [main] ERROR m.p.c.connection.HConnectionManager - Could not 
fullfill request on this host CassandraClient
11:44:14.182 [main] ERROR m.p.c.connection.HConnectionManager - Exception: 
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:27)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:598)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:587)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:101)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:604)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65)
 [hector-0.7.0-20.jar:na]
at 
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)
 [hector-0.7.0-20.jar:na]


There are no warnings or errors in cassandra.log.

Any guidance would be appreciated.

Thanks,
-Thor



Re: 0.7RC1 local_quorum -> TimedOutException

2010-12-28 Thread Thor Carpenter
OK, attached are the logs. 

At time 15:55 I performed a write with CL = ONE, a subsequent read with CL = 
ONE and received correct results.  Similar results occur with any read CL other 
than LOCAL_QUORUM.

At time 15:58 I performed a write with CL = ONE, a subsequent read with CL = 
LOCAL_QUORUM and received a TimedOutException.

System.log and cassandra.log are from server dlc05-sea to corresponds with the 
hector error message "Closing client CassandraClient".  I 
haven't found anything in the logs that indicates the problem but maybe others 
with more discerning eyes can.




Please let me know if I can provide any additional information.

Thanks,
-Thor

On Dec 28, 2010, at 12:21 PM, Jonathan Ellis wrote:

> Turn on debug logging to see what nodes the request gets sent to, and
> which ones replied.
> 
> [dev to BCC]
> 
> On Tue, Dec 28, 2010 at 1:59 PM, Thor Carpenter
>  wrote:
>> Hi All,
>> 
>> I am a bit stumped.  When reading at a CL of local_quorum I get 
>> TimedOutExceptions however when reading at any other CL, including ALL, 
>> everything works as expected.  Any write CL, including local_quorum, also 
>> works as expected.
>> 
>> I am running a 4 node 0.7 RC1 cluster in datacenter "DC1".  "DC2" is planned 
>> for the future but currently has no nodes.  I am using Hector 0.7.0-20.
>> 
>> Here is my testing schema:
>> 
>> create keyspace GlobalCacheKeyspaceDC1RF4 with
>>replication_factor = 4 and
>>placement_strategy = 
>> 'org.apache.cassandra.locator.NetworkTopologyStrategy' and
>>strategy_options = [{DC1:4, DC2:0}];
>> 
>> use GlobalCacheKeyspaceDC1RF4;
>> 
>> create column family GlobalCacheCF with
>>comment = 'General purpose cache' and
>>column_type = 'Standard' and
>>comparator = 'BytesType' and
>>keys_cached = 20 and
>>rows_cached = 2000 and
>>gc_grace = 0 and
>>read_repair_chance = 0.0;
>> 
>> 
>> Here is my cassandra-topology.properties:
>> 
>> 10.5.64.15=DC1:R1
>> 10.5.64.28=DC1:R1
>> 10.5.64.56=DC1:R1
>> 10.5.64.39=DC1:R1
>> 
>> # default for unknown nodes
>> default=DC2:R1
>> 
>> 
>> Here is the exception I get when reading at a CL of local_quorum:
>> 
>> 1:44:14.176 [main] ERROR m.p.c.connection.HConnectionManager - Could not 
>> fullfill request on this host CassandraClient
>> 11:44:14.182 [main] ERROR m.p.c.connection.HConnectionManager - Exception:
>> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
>>at 
>> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:27)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:598)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:587)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:101)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:604)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65)
>>  [hector-0.7.0-20.jar:na]
>>at 
>> me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)
>>  [hector-0.7.0-20.jar:na]
>> 
>> 
>> There are no warnings or errors in cassandra.log.
>> 
>> Any guidance would be appreciated.
>> 
>> Thanks,
>> -Thor
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com



Re: 0.7RC1 local_quorum -> TimedOutException

2010-12-29 Thread Thor Carpenter
In an attempt to replicate this error on a simpler system I have come across a 
different error but seemingly related problem.  Now I am getting the following 
exception in system.log.

DEBUG [pool-1-thread-3] 2010-12-29 12:10:38,897 CassandraServer.java (line 362) 
insert
ERROR [pool-1-thread-3] 2010-12-29 12:10:38,906 Cassandra.java (line 2960) 
Internal error processing insert
java.lang.AssertionError
at 
org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:392)
at 
org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:417)
at 
org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:95)
at 
org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:99)
at 
org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1411)
at 
org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1394)
at 
org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:109)
at 
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:442)
at 
org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:379)
at 
org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2952)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Looking at TokenMetadata.java:392, the assert that is triggered is:
assert ring.size() > 0;

Which led me to run nodetool ring which outputs a seemingly correct config:
-sh-3.2$ nodetool -h localhost -p 8080 ring
Address Status State   LoadOwnsToken
10.5.64.26  Up Normal  47.16 KB100.00% 1


To reproduce:

* I have upgraded to 0.7RC3 so install the latest riptano 0.7rc3 rpm on a 
single node in "DC1".

* In cassandra.yaml set initial_token = 1

* cassandra-topology.properties:

10.5.64.26=DC1:R1
default=DC2:R1

* Schema loaded via cassandra-cli:

create keyspace KeyspaceDC1 with
replication_factor = 1 and
placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' 
and
strategy_options = [{DC1:1, DC2:0}];

use KeyspaceDC1;

create column family TestCF with
column_type = 'Standard' and
comparator = 'BytesType' and
keys_cached = 20 and
rows_cached = 2000 and
gc_grace = 0 and
read_repair_chance = 0.0;

* In cassandra-cli execute the following:

[defa...@unknown] use KeyspaceDC1;
Authenticated to keyspace: KeyspaceDC1
[defa...@keyspacedc1] set TestCF['some key']['some col'] = 'some value';
Internal error processing insert

* Check system.log where you should find the above assertion error.


Can anybody find a flaw in this setup?  Do others agree something looks buggy?

Thanks,
-Thor

On Dec 28, 2010, at 4:33 PM, Thor Carpenter wrote:

Third try for good luck.  Maybe this time I'll get it all in one email.  :-)

At time 15:55 I performed a write with CL = ONE, a subsequent read with CL = 
ONE and received correct results.  Similar results occur with any read CL other 
than LOCAL_QUORUM.

At time 15:58 I performed a write with CL = ONE, a subsequent read with CL = 
LOCAL_QUORUM and received a TimedOutException.

System.log and cassandra.log are from server dlc05-sea to corresponds with the 
hector error message "Closing client CassandraClient".  I 
haven't found anything in the logs that indicates the problem but maybe others 
with more discerning eyes can.

Please let me know if I can provide any additional information.

Thanks,
-Thor





[SOLVED] Re: 0.7RC1 local_quorum -> TimedOutException

2010-12-30 Thread Thor Carpenter
So, I found a solution, but I think there is a (trivial) bug.

The broken keyspace:

create keyspace KeyspaceDC1 with
replication_factor = 1 and
placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' 
and
strategy_options = [{DC1:1, DC2:0}];

The fixed keyspace:

create keyspace KeyspaceDC1 with
replication_factor = 1 and
placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' 
and
strategy_options = [{DC1:1}];

I'm not sure if the issue is that there are no nodes in DC2 or that I'm saying 
DC2 shouldn't get any replicas, or a combination of the two.  If I get time 
I'll try to poke around the code base.

-Thor

On Dec 29, 2010, at 12:38 PM, Thor Carpenter wrote:

In an attempt to replicate this error on a simpler system I have come across a 
different error but seemingly related problem.  Now I am getting the following 
exception in system.log.

DEBUG [pool-1-thread-3] 2010-12-29 12:10:38,897 CassandraServer.java (line 362) 
insert
ERROR [pool-1-thread-3] 2010-12-29 12:10:38,906 Cassandra.java (line 2960) 
Internal error processing insert
java.lang.AssertionError
at 
org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:392)
at 
org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:417)
at 
org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:95)
at 
org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:99)
at 
org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1411)
at 
org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1394)
at 
org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:109)
at 
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:442)
at 
org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:379)
at 
org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2952)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Looking at TokenMetadata.java:392, the assert that is triggered is:
assert ring.size() > 0;

Which led me to run nodetool ring which outputs a seemingly correct config:
-sh-3.2$ nodetool -h localhost -p 8080 ring
Address Status State   LoadOwnsToken
10.5.64.26  Up Normal  47.16 KB100.00% 1


To reproduce:

* I have upgraded to 0.7RC3 so install the latest riptano 0.7rc3 rpm on a 
single node in "DC1".

* In cassandra.yaml set initial_token = 1

* cassandra-topology.properties:

10.5.64.26=DC1:R1
default=DC2:R1

* Schema loaded via cassandra-cli:

create keyspace KeyspaceDC1 with
replication_factor = 1 and
placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' 
and
strategy_options = [{DC1:1, DC2:0}];

use KeyspaceDC1;

create column family TestCF with
column_type = 'Standard' and
comparator = 'BytesType' and
keys_cached = 20 and
rows_cached = 2000 and
gc_grace = 0 and
read_repair_chance = 0.0;

* In cassandra-cli execute the following:

[defa...@unknown] use KeyspaceDC1;
Authenticated to keyspace: KeyspaceDC1
[defa...@keyspacedc1] set TestCF['some key']['some col'] = 'some value';
Internal error processing insert

* Check system.log where you should find the above assertion error.


Can anybody find a flaw in this setup?  Do others agree something looks buggy?

Thanks,
-Thor

On Dec 28, 2010, at 4:33 PM, Thor Carpenter wrote:

Third try for good luck.  Maybe this time I'll get it all in one email.  :-)

At time 15:55 I performed a write with CL = ONE, a subsequent read with CL = 
ONE and received correct results.  Similar results occur with any read CL other 
than LOCAL_QUORUM.

At time 15:58 I performed a write with CL = ONE, a subsequent read with CL = 
LOCAL_QUORUM and received a TimedOutException.

System.log and cassandra.log are from server dlc05-sea to corresponds with the 
hector error message "Closing client CassandraClient".  I 
haven't found anything in the logs that indicates the problem but maybe others 
with more discerning eyes can.

Please let me know if I can provide any additional information.

Thanks,
-Thor