Re: smallest/largest UUIDs for LexicalUUIDType

2013-06-07 Thread John R. Frank
Follow-up question: it seems that range queries on the *second* field of a CompositeType(UUIDType(), UUIDType()) do not work. If I concatenate the two UUID.hex values into a 64-character string instead of a CompositeType of two UUIDs, then range queries work correctly. This is illustrated b
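A minimal stdlib sketch (no pycassa, no live cluster) of why the concatenated-hex keys described above support range queries: each UUID.hex is 32 lowercase hex characters, so two concatenated values compare lexicographically first by the leading UUID and then by the trailing one. The key layout and the start/finish sentinels are illustrative, not taken from the thread.

```python
import uuid

# Build keys as first.hex + second.hex: 32 + 32 = 64 hex characters,
# which sort byte-by-byte, i.e. by first UUID, then by second UUID.
first = uuid.uuid4()
seconds = sorted(uuid.uuid4() for _ in range(5))

keys = sorted(first.hex + s.hex for s in seconds)

# Range scan for "all pairs whose first UUID == first": bracket the
# 32-char prefix with the lowest and highest possible hex suffixes.
start = first.hex + "0" * 32
finish = first.hex + "f" * 32
matches = [k for k in keys if start <= k <= finish]

assert matches == keys  # every key shares the same first-UUID prefix
```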

Re: smallest/largest UUIDs for LexicalUUIDType

2013-06-06 Thread John R. Frank
I'll note that if you have the choice, you can use UUIDType rather than LexicalUUIDType. UUIDType fixes that behavior and uses a proper lexical comparison for non-type-1 UUIDs (the other behavior of UUIDType is that for type 1 UUIDs, it compares them by time first, i.e. it is equivalent to TimeUU
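A small stdlib illustration of the time-first comparison mentioned above: type 1 UUIDs embed a 60-bit timestamp (exposed in Python as uuid.UUID.time), so a UUID generated later always carries a larger embedded timestamp, even though its raw bytes need not sort later.

```python
import time
import uuid

# Generate two type 1 (time-based) UUIDs with a small gap between them.
a = uuid.uuid1()
time.sleep(0.01)
b = uuid.uuid1()

assert a.version == b.version == 1
# The 60-bit timestamp (100 ns units since 1582-10-15) orders them:
assert a.time < b.time
```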

smallest/largest UUIDs for LexicalUUIDType

2013-06-05 Thread John R. Frank
C*, I'm trying to use composite column names to organize 10**8 records. Each record has a unique pair of UUIDs. The first UUID is often repeated, so I want to use column_start and column_finish to find all the records that have a given UUID as the first UUID in the pair. I thought a simple
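For context on the "smallest/largest UUID" question in the subject, Python's stdlib can build the 128-bit boundary values directly; whether these actually sort first and last depends on the server-side comparator, which is what this thread is about.

```python
import uuid

# The two extreme 128-bit UUID values, candidates for range sentinels
# in the second slot of a (UUID, UUID) pair.
lowest = uuid.UUID(int=0)
highest = uuid.UUID(int=(1 << 128) - 1)

assert str(lowest) == "00000000-0000-0000-0000-000000000000"
assert str(highest) == "ffffffff-ffff-ffff-ffff-ffffffffffff"
assert lowest.bytes == b"\x00" * 16 and highest.bytes == b"\xff" * 16
```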

Re: consistency level for "create keyspace"?

2013-06-04 Thread John R. Frank
try/except all such failures and sleep it off? This is particularly cumbersome for writing tests that set up and tear down keyspaces repeatedly. jrf On Mon, 3 Jun 2013, John R. Frank wrote: C* When I create a keyspace with pycassa on a multi-node cluster, it takes some time before all the nodes

unable to delete

2013-06-03 Thread John R. Frank
C*, Is it considered normal for cassandra to experience this error: ERROR [NonPeriodicTasks:1] 2013-06-03 18:17:05,374 SSTableDeletingTask.java (line 72) Unable to delete /raid0/cassandra/data///--ic-19-Data.db (it will be removed on server restart; we'll also retry after GC) This is on the D

consistency level for "create keyspace"?

2013-06-03 Thread John R. Frank
C* When I create a keyspace with pycassa on a multi-node cluster, it takes some time before all the nodes know about the keyspace. So, if I do this: sm = SystemManager(random.choice(server_list)) sm.create_keyspace(keyspace, SIMPLE_STRATEGY, {'replication_factor': '1'}) sm.close()
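A hedged sketch of one way to handle the propagation delay described above: poll each node until it reports the keyspace, instead of a fixed sleep after create_keyspace. The `describe_on_node` callable is a stand-in for a real per-node schema check (e.g. a pycassa SystemManager pointed at that host); the fake below only simulates a slow node.

```python
import time

def wait_for_keyspace(nodes, describe_on_node, timeout=30.0, interval=0.5):
    """Poll every node until all of them report the keyspace, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if all(describe_on_node(node) for node in nodes):
            return True
        time.sleep(interval)
    return False

# Simulated cluster: node "c" only learns about the keyspace after a
# couple of polling rounds (7th describe call onward).
calls = {"n": 0}
def fake_describe(node):
    calls["n"] += 1
    return node != "c" or calls["n"] > 6

assert wait_for_keyspace(["a", "b", "c"], fake_describe, interval=0.05) is True
```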

Re: pycassa failures in large batch cycling

2013-05-17 Thread John R. Frank
IMHO you are going to have more success breaking up your workload to work with the current settings.  The buffers created by thrift are going to eat up the server-side memory. They grow dynamically but persist for the life of the connection.  Amen to that. Already refactoring our workload to

Re: pycassa failures in large batch cycling

2013-05-16 Thread John R. Frank
On Tue, 14 May 2013, aaron morton wrote: After several cycles, pycassa starts getting connection failures. Do you have the error stack? Are they TimedOutExceptions, socket timeouts, or something else? I figured out the problem here and made this ticket in jira: https://issues.apa

pycassa fails to write values larger than one-tenth of thrift_framed_transport_size_in_mb, which defaults to 15MB --> 1.5MB limit on values!?

2013-05-10 Thread John R. Frank
C* users, The simple code below demonstrates pycassa failing to write values containing more than one-tenth of thrift_framed_transport_size_in_mb. It writes a single-column row using a UUID key. For example, with the default of thrift_framed_transport_size_in_mb: 15 the code below
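The arithmetic behind the subject line can be sketched directly; the client-side guard below is illustrative (the 1/10 ceiling is the behavior reported in this thread, not a documented constant).

```python
# With the default thrift_framed_transport_size_in_mb = 15, writes over
# roughly one-tenth of the frame (1.5 MB) were observed to fail.
FRAME_MB = 15
MAX_VALUE_BYTES = FRAME_MB * 1024 * 1024 // 10  # observed ~1.5 MB ceiling

def fits_in_frame(value: bytes) -> bool:
    """Return True if a single-column value stays under the observed limit."""
    return len(value) <= MAX_VALUE_BYTES

assert MAX_VALUE_BYTES == 1572864  # 1.5 MB in bytes
assert fits_in_frame(b"x" * 1_000_000)
assert not fits_in_frame(b"x" * 2_000_000)
```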

pycassa failures in large batch cycling

2013-05-09 Thread John R. Frank
C* users, We have a process that loads a large batch of rows from Cassandra into many separate compute workers. The rows are one column wide and range in size from a couple KB to ~100 MB. After manipulating the data for a while, each compute worker writes the data back with *new* row keys com
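One common way to keep individual writes small with values this large is to split each value into fixed-size chunks stored under separate column names. This is a hedged sketch: the 1 MB chunk size and the naming scheme are illustrative choices, not something from the thread.

```python
CHUNK = 1024 * 1024  # 1 MB per column (illustrative)

def split_value(value: bytes, chunk=CHUNK):
    """Break a large value into ordered, fixed-size chunk columns."""
    return {"part-%05d" % i: value[off:off + chunk]
            for i, off in enumerate(range(0, len(value), chunk))}

def join_value(columns):
    """Reassemble the original value from its chunk columns."""
    return b"".join(columns[name] for name in sorted(columns))

blob = bytes(range(256)) * 40000  # ~10 MB of test data
cols = split_value(blob)

assert join_value(cols) == blob
assert all(len(v) <= CHUNK for v in cols.values())
```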

Re: RPM for cassandra with CQL 3.0.2?

2013-05-02 Thread John R. Frank
Cassandra 1.2.4 is the current release; even the current 1.2 branch still uses CQL 3.0.1.  You may need to use trunk to get CQL 3.0.2 (I've not looked).  Is there something specific you are looking for? We're using token() and were hitting an issue that we thought might be fixed by 3.0

RPM for cassandra with CQL 3.0.2?

2013-05-02 Thread John R. Frank
Is there a pre-built RPM for cassandra with CQL 3.0.2? It appears that the latest community RPM does not include it: http://rpm.datastax.com/community/noarch/cassandra12-1.2.4-1.noarch.rpm 2013-Apr-26 22:31:32 Am I missing something? jrf

Re: loading all rows from cassandra using multiple (python) clients in parallel

2013-04-24 Thread John R. Frank
On Wed, 24 Apr 2013, aaron morton wrote:  EDIT: works after switching to testing against the latest version of the cassandra database (doh!), and also updating the syntax per notes below: http://stackoverflow.com/questions/16137944/loading-all-rows-from-cassandra-using-multiple-python

loading all rows from cassandra using multiple (python) clients in parallel

2013-04-22 Thread John R. Frank
Cassandra Experts, I understand that when using Cassandra's recommended RandomPartitioner (or Murmur3Partitioner), it is not possible to do meaningful range queries on keys, because the rows are distributed around the cluster using the md5 hash of the key. These hashes are called "tokens."
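A simplified stdlib sketch of the token idea described above: the md5 digest of a row key, read as an integer, places the row on a ring, and a parallel full scan carves that ring into one contiguous range per client. The exact RandomPartitioner token math in Cassandra differs slightly; this only illustrates the partitioning scheme.

```python
import hashlib

RING = 2 ** 127  # RandomPartitioner's token space is [0, 2**127)

def token(key: bytes) -> int:
    """Simplified md5-based token for a row key (illustrative, not exact)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING

def ring_ranges(n_clients: int):
    """Split the token ring into n_clients contiguous [start, end) ranges."""
    step = RING // n_clients
    bounds = [i * step for i in range(n_clients)] + [RING]
    return list(zip(bounds, bounds[1:]))

ranges = ring_ranges(4)
t = token(b"some-row-key")

assert ranges[0][0] == 0 and ranges[-1][1] == RING
assert any(lo <= t < hi for lo, hi in ranges)  # every key lands in one range
```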