Re: Summarizing Timestamp datatype

2014-06-17 Thread DuyHai Doan
Hello Jason, If you want to check for the presence or absence of data for a day, you can add the date as a composite component to your partition key. Cassandra will then rely on the bloom filter and avoid hitting disk, for maximum performance. The only drawback of this modelling is that you need to provide t
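DuyHai's suggestion can be sketched as follows. This is a minimal illustration, not code from the thread: the table name, column names, and date format are all assumptions, and the CQL only appears here as text built by plain Python so the shape of the per-day queries is visible.

```python
from datetime import date, timedelta

# Hypothetical schema matching DuyHai's suggestion (names are illustrative):
#   CREATE TABLE events (
#       day text,          -- e.g. '2014-06-10', part of the partition key
#       ts timestamp,
#       payload text,
#       PRIMARY KEY ((day), ts)
#   );

def day_presence_queries(start: date, n_days: int):
    """Build one single-partition query per day. A day with no data returns
    no rows, and for an absent partition the bloom filter lets Cassandra
    skip disk entirely."""
    return [
        "SELECT day FROM events WHERE day = '%s' LIMIT 1"
        % (start + timedelta(days=i)).isoformat()
        for i in range(n_days)
    ]

queries = day_presence_queries(date(2014, 6, 10), 3)
```

The drawback DuyHai mentions follows directly from the model: every read must supply the full partition key, so the client has to enumerate the days it wants to probe.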

Re: Minimum Cluster size to accommodate a single node failure

2014-06-17 Thread Ben Bromhead
Yes your thinking is correct. This article from TLP sums it all up beautifully http://thelastpickle.com/blog/2011/06/13/Down-For-Me.html Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359 On 18 Jun 2014, at 4:18 pm, Prabath Abeysekara wrote: > Sorry, the title o

Re: Minimum Cluster size to accommodate a single node failure

2014-06-17 Thread Prabath Abeysekara
Sorry, the title of this thread has to be "*Minimum cluster size to survive a single node failure*". On Wed, Jun 18, 2014 at 11:38 AM, Prabath Abeysekara < prabathabeysek...@gmail.com> wrote: > Hi Everyone, > > First of all, apologies if the $subject was discussed previously in this > list befor

Minimum Cluster size to accommodate a single node failure

2014-06-17 Thread Prabath Abeysekara
Hi Everyone, First of all, apologies if the $subject was discussed previously on this list. I've already gone through quite a few email trails on this but still couldn't find a convincing answer, which is what made me raise the question again here. If my understanding is correc

Re: Problem with /etc/cassandra for cassandra 2.0.8

2014-06-17 Thread Michael Shuler
On 06/17/2014 05:09 PM, Donald Smith wrote: I installed a package version of cassandra via “sudo yum install cassandra20.noarch” into a clean host and got: cassandra20.noarch 2.0.8-2 @datastax That resulted in a problem: /etc/cassandra/ did not exist. So I did “sudo yum downgrade

Re: Summarizing Timestamp datatype

2014-06-17 Thread Jason Lewis
That's how my schema is built. So far, I'm pulling the data out over a range of 30 days. I want to see if I have data for every day; I'm just wondering if it's possible in CQL, as opposed to how I'm doing it now, in Python. On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael wrote: > If you can arrang
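The Python-side check Jason describes can be sketched in a few lines. This is a hedged illustration of the gap-finding step only; the thread does not show his actual code, and `row_timestamps` is assumed to be the per-row dates already extracted from the query results.

```python
from datetime import date, timedelta

def missing_days(start, n_days, row_timestamps):
    """Return the days in [start, start + n_days) that have no rows.
    `row_timestamps` is an iterable of datetime.date values pulled from
    the query results (a hypothetical shape; the thread omits the schema)."""
    window = {start + timedelta(days=i) for i in range(n_days)}
    return sorted(window - set(row_timestamps))

present = [date(2014, 6, 10), date(2014, 6, 12)]
gaps = missing_days(date(2014, 6, 10), 3, present)  # window covers the 10th-12th
```

With the date folded into the partition key, as suggested elsewhere in the thread, the same answer can come from one empty-result query per day instead of this client-side set difference.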

Re: Summarizing Timestamp datatype

2014-06-17 Thread Laing, Michael
If you can arrange to index your rows by: (day, timestamp), then you can select ranges as you wish. This works because day is the "partition key", arrived at by hash (really it's a hash key), whereas timestamp is the "clustering key" (really it is a range key), which is kept in sorted order both in memory and on disk. I
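Michael's hash-key/range-key distinction can be illustrated with a toy in-memory model. This is only a sketch of the storage shape he describes, assuming a dict standing in for the hash-located partitions and a sorted list standing in for the clustering order; it is not how Cassandra is implemented.

```python
from bisect import bisect_left, bisect_right

# Toy model: partition keys are located by hash (no ordering across
# partitions), while clustering keys are kept sorted inside each
# partition, which is what makes range scans on them cheap.
partitions = {}  # partition key -> sorted list of clustering keys

def insert(pk, ck):
    rows = partitions.setdefault(pk, [])
    rows.insert(bisect_left(rows, ck), ck)  # keep clustering keys sorted

def range_select(pk, lo, hi):
    """All clustering keys in [lo, hi] for one partition -- the shape of
    'WHERE day = ? AND ts >= ? AND ts <= ?' in CQL."""
    rows = partitions.get(pk, [])
    return rows[bisect_left(rows, lo):bisect_right(rows, hi)]

for ts in (5, 1, 9, 3):
    insert("2014-06-17", ts)
hits = range_select("2014-06-17", 2, 8)
```

The key property: equality on the hash key narrows the read to one partition, and the sorted clustering keys turn the range predicate into a contiguous slice.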

Summarizing Timestamp datatype

2014-06-17 Thread Jason Lewis
I have data stored with the timestamp datatype. Is it possible to use CQL to return results based on whether a row falls within the range of a given day? Ex. If I have 20 rows that occur on 2014-06-10, no rows for 2014-06-11, and 15 rows that occurred on 2014-06-12, I'd like to only return results that data exists

Problem with /etc/cassandra for cassandra 2.0.8

2014-06-17 Thread Donald Smith
I installed a package version of cassandra via "sudo yum install cassandra20.noarch" into a clean host and got: cassandra20.noarch 2.0.8-2 @datastax That resulted in a problem: /etc/cassandra/ did not exist. So I did "sudo yum downgrade cassandra20.noarch" and got version 2.0.7.

Re: Tweaking SizeTieredCompactionStrategy for heavy writes (47K files created)

2014-06-17 Thread Redmumba
I am using the SizeTieredCompactionStrategy, all with the default settings, with C* 2.0.7. I figured with a high compaction rate (999), it would be able to keep up--there are no major IO wait times on the hosts. Should I remove the threshold entirely (set it to 0)? On Tue, Jun 17, 2014 at 11:36 AM, Rober

Re: Tweaking SizeTieredCompactionStrategy for heavy writes (47K files created)

2014-06-17 Thread Robert Coli
On Tue, Jun 17, 2014 at 11:26 AM, Redmumba wrote: > Alright, that's perfectly reasonable--I'm not quite sure which settings > will affect the number of writes. I have set the compaction throughput in > the past to 999, but I'm not sure how that correlates to the _number_ of > files created--and

Re: Tweaking SizeTieredCompactionStrategy for heavy writes (47K files created)

2014-06-17 Thread Redmumba
Alright, that's perfectly reasonable--I'm not quite sure which settings will affect the number of writes. I have set the compaction throughput in the past to 999, but I'm not sure how that correlates to the _number_ of files created--and sans doing a major compaction, I'm not sure how to actually

Re: Tweaking SizeTieredCompactionStrategy for heavy writes (47K files created)

2014-06-17 Thread Robert Coli
On Tue, Jun 17, 2014 at 11:14 AM, Redmumba wrote: > I have a very write heavy workload, and noticed that the default settings > for min_ and max_compaction_threshold resulted in around 47k files in my > table directory. In general, files were fairly small (ranging in the > single digits of megab

Tweaking SizeTieredCompactionStrategy for heavy writes (47K files created)

2014-06-17 Thread Redmumba
I have a very write-heavy workload, and noticed that the default settings for min_ and max_compaction_threshold resulted in around 47k files in my table directory. In general, the files were fairly small (ranging from single-digit megabytes up to gigabytes). What is the best way to tweak these val
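For reference, the thresholds in question are set per table via the `compaction` map in CQL. The sketch below builds such an ALTER TABLE statement as text; the option names match the Cassandra 2.0 CQL compaction subproperties, but the keyspace/table names and the chosen values are illustrative only, not a recommendation from the thread.

```python
# Hedged sketch: building the ALTER TABLE statement that tunes the
# SizeTieredCompactionStrategy thresholds discussed in this thread.
def stcs_alter(keyspace, table, min_threshold=4, max_threshold=32):
    opts = ("{'class': 'SizeTieredCompactionStrategy', "
            "'min_threshold': '%d', 'max_threshold': '%d'}"
            % (min_threshold, max_threshold))
    return "ALTER TABLE %s.%s WITH compaction = %s;" % (keyspace, table, opts)

stmt = stcs_alter("myks", "events", min_threshold=8, max_threshold=64)
```

Note that these thresholds bound how many similarly-sized SSTables get merged per compaction round; they are distinct from `nodetool setcompactionthroughput`, which only rate-limits compaction IO and does not change how many files accumulate.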

Re: Questions about timestamp set at writetime

2014-06-17 Thread DuyHai Doan
Thank you Sylvain for the very clear explanations On Tue, Jun 17, 2014 at 2:44 PM, Sylvain Lebresne wrote: > > >> 1) Who is responsible for this micro-second timestamp ? The coordinator >> which receives the insert request or each replica which actually do persist >> the data ? >> > > The coo

Re: Questions about timestamp set at writetime

2014-06-17 Thread Sylvain Lebresne
> > 1) Who is responsible for this micro-second timestamp ? The coordinator > which receives the insert request or each replica which actually do persist > the data ? > The coordinator. > > 2) In a case of a batch insert (CQL3 batch, not batch mutation Thrift > API), if no user defined timestamp
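For scale, Sylvain's "micro-second timestamp" is a count of microseconds since the epoch. The snippet below is only a client-side approximation of that unit; the real value in this scenario is generated server-side by the coordinator, as he says.

```python
import time

def micros_now():
    """Microseconds since the epoch -- the unit and magnitude of the
    write timestamp the coordinator assigns when none is supplied."""
    return int(time.time() * 1_000_000)

ts = micros_now()
```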

Re: Questions about timestamp set at writetime

2014-06-17 Thread tommaso barbugli
That's going to be the timestamp for the data affected. What I meant is that you can't have different timestamps (insert x timestamp y; insert x' timestamp y') 2014-06-17 14:27 GMT+02:00 DuyHai Doan : > "that is not possible to define different timestamps within a batch" --> > It is possible : > h

Re: Questions about timestamp set at writetime

2014-06-17 Thread DuyHai Doan
"that is not possible to define different timestamps within a batch" --> It is possible : http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/batch_r.html On Tue, Jun 17, 2014 at 2:17 PM, tommaso barbugli wrote: > when inserting with a batch every row have the same timestamp; I also

Re: Questions about timestamp set at writetime

2014-06-17 Thread tommaso barbugli
When inserting with a batch, every row has the same timestamp; I also think (not 100% sure) that it is not possible to define different timestamps within a batch. Tommaso 2014-06-17 14:10 GMT+02:00 DuyHai Doan : > Hello all > > I know that at write time a timestamp is automatically generated by the >

Re: Read data from Cassandra ordered by writetime using cql.

2014-06-17 Thread Abhishek Mukherjee
Thanks I'll have a look. On 17 Jun 2014 17:42, "Abhishek Mukherjee" <4271...@gmail.com> wrote: > Thanks Jen. I'll r > On 17 Jun 2014 17:37, "Jens Rantil" wrote: > >> Hi Abhishek, >> >> You can't. You need to use a clustering key to keep track of your >> ordering. See >> http://www.datastax.com/do

Re: Read data from Cassandra ordered by writetime using cql.

2014-06-17 Thread Abhishek Mukherjee
Thanks Jen. I'll r On 17 Jun 2014 17:37, "Jens Rantil" wrote: > Hi Abhishek, > > You can't. You need to use a clustering key to keep track of your > ordering. See > http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__querying-compound-prim

Questions about timestamp set at writetime

2014-06-17 Thread DuyHai Doan
Hello all, I know that at write time a timestamp is automatically generated by the server and assigned to each column. My questions are: 1) Who is responsible for this micro-second timestamp ? The coordinator which receives the insert request or each replica which actually persists the data ?

Re: Read data from Cassandra ordered by writetime using cql.

2014-06-17 Thread Jens Rantil
Hi Abhishek, You can't. You need to use a clustering key to keep track of your ordering. See http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__querying-compound-primary-keys-and-sorting-results Cheers, Jens On Tue, Jun 17, 2014 at 1:48
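Jens's advice amounts to storing the write time yourself as a clustering column, so rows come back sorted, rather than trying to order by the internal WRITETIME(). A hedged sketch of what that could look like is below; the table, column, and bucket names are all illustrative assumptions, not from the thread.

```python
# Hypothetical schema implementing Jens's advice: the write time is an
# explicit clustering column, so Cassandra keeps rows in that order.
SCHEMA = """CREATE TABLE events_by_time (
    bucket text,                  -- partition key, e.g. one per day
    written_at timestamp,         -- clustering key: drives the ordering
    body text,
    PRIMARY KEY ((bucket), written_at)
) WITH CLUSTERING ORDER BY (written_at ASC);"""

def ordered_read(bucket):
    """Rows for one partition, oldest write first."""
    return ("SELECT written_at, body FROM events_by_time "
            "WHERE bucket = '%s' ORDER BY written_at ASC;" % bucket)

q = ordered_read("2014-06-17")
```

The design choice here is that ordering is a property of the data model, not of the query: the clustering key fixes the on-disk sort order, so the ORDER BY is essentially free within a partition.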

Re: Read data from Cassandra ordered by writetime using cql.

2014-06-17 Thread subhankar biswas
This question has been raised several times, but there is only one answer: you can't/shouldn't. On 17-Jun-2014, at 5:18 pm, Abhishek Mukherjee <4271...@gmail.com> wrote: > Hi Everyone, > > I am trying to read data from my Cassandra database in the order in which it > got written into the DB. There is a WRIT

Read data from Cassandra ordered by writetime using cql.

2014-06-17 Thread Abhishek Mukherjee
Hi Everyone, I am trying to read data from my Cassandra database in the order in which it got written to the DB. There is a WRITETIME function which gives me the write time for a column. How can I use this so that the data returned from my query gets ordered by write time? I am tryi