0.7RC1 local_quorum -> TimedOutException
Hi All, I am a bit stumped. When reading at a CL of local_quorum I get TimedOutExceptions however when reading at any other CL, including ALL, everything works as expected. Any write CL, including local_quorum, also works as expected. I am running a 4 node 0.7 RC1 cluster in datacenter "DC1". "DC2" is planned for the future but currently has no nodes. I am using Hector 0.7.0-20. Here is my testing schema: create keyspace GlobalCacheKeyspaceDC1RF4 with replication_factor = 4 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:4, DC2:0}]; use GlobalCacheKeyspaceDC1RF4; create column family GlobalCacheCF with comment = 'General purpose cache' and column_type = 'Standard' and comparator = 'BytesType' and keys_cached = 20 and rows_cached = 2000 and gc_grace = 0 and read_repair_chance = 0.0; Here is my cassandra-topology.properties: 10.5.64.15=DC1:R1 10.5.64.28=DC1:R1 10.5.64.56=DC1:R1 10.5.64.39=DC1:R1 # default for unknown nodes default=DC2:R1 Here is the exception I get when reading at a CL of local_quorum: 1:44:14.176 [main] ERROR m.p.c.connection.HConnectionManager - Could not fullfill request on this host CassandraClient 11:44:14.182 [main] ERROR m.p.c.connection.HConnectionManager - Exception: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException() at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:27) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:598) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:587) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:101) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:604) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65) [hector-0.7.0-20.jar:na] at me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51) [hector-0.7.0-20.jar:na] There are no warnings or errors in cassandra.log. Any guidance would be appreciated. Thanks, -Thor
Re: 0.7RC1 local_quorum -> TimedOutException
OK, attached are the logs. At time 15:55 I performed a write with CL = ONE, a subsequent read with CL = ONE and received correct results. Similar results occur with any read CL other than LOCAL_QUORUM. At time 15:58 I performed a write with CL = ONE, a subsequent read with CL = LOCAL_QUORUM and received a TimedOutException. System.log and cassandra.log are from server dlc05-sea to corresponds with the hector error message "Closing client CassandraClient". I haven't found anything in the logs that indicates the problem but maybe others with more discerning eyes can. Please let me know if I can provide any additional information. Thanks, -Thor On Dec 28, 2010, at 12:21 PM, Jonathan Ellis wrote: > Turn on debug logging to see what nodes the request gets sent to, and > which ones replied. > > [dev to BCC] > > On Tue, Dec 28, 2010 at 1:59 PM, Thor Carpenter > wrote: >> Hi All, >> >> I am a bit stumped. When reading at a CL of local_quorum I get >> TimedOutExceptions however when reading at any other CL, including ALL, >> everything works as expected. Any write CL, including local_quorum, also >> works as expected. >> >> I am running a 4 node 0.7 RC1 cluster in datacenter "DC1". "DC2" is planned >> for the future but currently has no nodes. I am using Hector 0.7.0-20. >> >> Here is my testing schema: >> >> create keyspace GlobalCacheKeyspaceDC1RF4 with >>replication_factor = 4 and >>placement_strategy = >> 'org.apache.cassandra.locator.NetworkTopologyStrategy' and >>strategy_options = [{DC1:4, DC2:0}]; >> >> use GlobalCacheKeyspaceDC1RF4; >> >> create column family GlobalCacheCF with >>comment = 'General purpose cache' and >>column_type = 'Standard' and >>comparator = 'BytesType' and >>keys_cached = 20 and >>rows_cached = 2000 and >>gc_grace = 0 and >>read_repair_chance = 0.0; >> >> >> Here is my cassandra-topology.properties: >> >> 10.5.64.15=DC1:R1 >> 10.5.64.28=DC1:R1 >> 10.5.64.56=DC1:R1 >> 10.5.64.39=DC1:R1 >> >> # default for unknown nodes >> default=DC2:R1 >> >> >> Here is the exception I get when reading at a CL of local_quorum: >> >> 1:44:14.176 [main] ERROR m.p.c.connection.HConnectionManager - Could not >> fullfill request on this host CassandraClient >> 11:44:14.182 [main] ERROR m.p.c.connection.HConnectionManager - Exception: >> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException() >>at >> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:27) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:598) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.service.KeyspaceServiceImpl$15.execute(KeyspaceServiceImpl.java:587) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:101) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:604) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65) >> [hector-0.7.0-20.jar:na] >>at >> me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51) >> [hector-0.7.0-20.jar:na] >> >> >> There are no warnings or errors in cassandra.log. >> >> Any guidance would be appreciated. >> >> Thanks, >> -Thor >> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com
Re: 0.7RC1 local_quorum -> TimedOutException
In an attempt to replicate this error on a simpler system I have come across a different error but seemingly related problem. Now I am getting the following exception in system.log. DEBUG [pool-1-thread-3] 2010-12-29 12:10:38,897 CassandraServer.java (line 362) insert ERROR [pool-1-thread-3] 2010-12-29 12:10:38,906 Cassandra.java (line 2960) Internal error processing insert java.lang.AssertionError at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:392) at org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:417) at org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:95) at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:99) at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1411) at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1394) at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:109) at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:442) at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:379) at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2952) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Looking at TokenMetadata.java:392, the assert that is triggered is: assert ring.size() > 0; Which led me to run nodetool ring which outputs a seemingly correct config: -sh-3.2$ nodetool -h localhost -p 8080 ring Address Status State LoadOwnsToken 10.5.64.26 Up Normal 47.16 KB100.00% 1 To reproduce: * I have upgraded to 0.7RC3 so install the latest riptano 0.7rc3 rpm on a single node in "DC1". * In cassandra.yaml set initial_token = 1 * cassandra-topology.properties: 10.5.64.26=DC1:R1 default=DC2:R1 * Schema loaded via cassandra-cli: create keyspace KeyspaceDC1 with replication_factor = 1 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1, DC2:0}]; use KeyspaceDC1; create column family TestCF with column_type = 'Standard' and comparator = 'BytesType' and keys_cached = 20 and rows_cached = 2000 and gc_grace = 0 and read_repair_chance = 0.0; * In cassandra-cli execute the following: [defa...@unknown] use KeyspaceDC1; Authenticated to keyspace: KeyspaceDC1 [defa...@keyspacedc1] set TestCF['some key']['some col'] = 'some value'; Internal error processing insert * Check system.log where you should find the above assertion error. Can anybody find a flaw in this setup? Do others agree something looks buggy? Thanks, -Thor On Dec 28, 2010, at 4:33 PM, Thor Carpenter wrote: Third try for good luck. Maybe this time I'll get it all in one email. :-) At time 15:55 I performed a write with CL = ONE, a subsequent read with CL = ONE and received correct results. Similar results occur with any read CL other than LOCAL_QUORUM. At time 15:58 I performed a write with CL = ONE, a subsequent read with CL = LOCAL_QUORUM and received a TimedOutException. System.log and cassandra.log are from server dlc05-sea to corresponds with the hector error message "Closing client CassandraClient". I haven't found anything in the logs that indicates the problem but maybe others with more discerning eyes can. Please let me know if I can provide any additional information. Thanks, -Thor
[SOLVED] Re: 0.7RC1 local_quorum -> TimedOutException
So, I found a solution, but I think there is a (trivial) bug. The broken keyspace: create keyspace KeyspaceDC1 with replication_factor = 1 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1, DC2:0}]; The fixed keyspace: create keyspace KeyspaceDC1 with replication_factor = 1 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1}]; I'm not sure if the issue is that there are no nodes in DC2 or that I'm saying DC2 shouldn't get any replicas, or a combination of the two. If I get time I'll try to poke around the code base. -Thor On Dec 29, 2010, at 12:38 PM, Thor Carpenter wrote: In an attempt to replicate this error on a simpler system I have come across a different error but seemingly related problem. Now I am getting the following exception in system.log. DEBUG [pool-1-thread-3] 2010-12-29 12:10:38,897 CassandraServer.java (line 362) insert ERROR [pool-1-thread-3] 2010-12-29 12:10:38,906 Cassandra.java (line 2960) Internal error processing insert java.lang.AssertionError at org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:392) at org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:417) at org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:95) at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:99) at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1411) at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1394) at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:109) at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:442) at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:379) at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:2952) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Looking at TokenMetadata.java:392, the assert that is triggered is: assert ring.size() > 0; Which led me to run nodetool ring which outputs a seemingly correct config: -sh-3.2$ nodetool -h localhost -p 8080 ring Address Status State LoadOwnsToken 10.5.64.26 Up Normal 47.16 KB100.00% 1 To reproduce: * I have upgraded to 0.7RC3 so install the latest riptano 0.7rc3 rpm on a single node in "DC1". * In cassandra.yaml set initial_token = 1 * cassandra-topology.properties: 10.5.64.26=DC1:R1 default=DC2:R1 * Schema loaded via cassandra-cli: create keyspace KeyspaceDC1 with replication_factor = 1 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1, DC2:0}]; use KeyspaceDC1; create column family TestCF with column_type = 'Standard' and comparator = 'BytesType' and keys_cached = 20 and rows_cached = 2000 and gc_grace = 0 and read_repair_chance = 0.0; * In cassandra-cli execute the following: [defa...@unknown] use KeyspaceDC1; Authenticated to keyspace: KeyspaceDC1 [defa...@keyspacedc1] set TestCF['some key']['some col'] = 'some value'; Internal error processing insert * Check system.log where you should find the above assertion error. Can anybody find a flaw in this setup? Do others agree something looks buggy? Thanks, -Thor On Dec 28, 2010, at 4:33 PM, Thor Carpenter wrote: Third try for good luck. Maybe this time I'll get it all in one email. :-) At time 15:55 I performed a write with CL = ONE, a subsequent read with CL = ONE and received correct results. Similar results occur with any read CL other than LOCAL_QUORUM. At time 15:58 I performed a write with CL = ONE, a subsequent read with CL = LOCAL_QUORUM and received a TimedOutException. System.log and cassandra.log are from server dlc05-sea to corresponds with the hector error message "Closing client CassandraClient". I haven't found anything in the logs that indicates the problem but maybe others with more discerning eyes can. Please let me know if I can provide any additional information. Thanks, -Thor