Hello Jason
If you want to check for the presence or absence of data for a given day, you can add
the date as a composite component of your partition key. Cassandra will then
rely on the bloom filter and avoid hitting disk, for maximum performance.
The only drawback of this modelling is that you need to provide the date in
every query, one query per day you want to check; a sketch follows.
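A minimal sketch of this modelling, assuming a hypothetical events table keyed
by a source id plus the day:

CREATE TABLE events (
    source_id text,
    day       text,        -- e.g. '2014-06-10', part of the partition key
    ts        timestamp,   -- clustering column, kept sorted within the partition
    payload   text,
    PRIMARY KEY ((source_id, day), ts)
);

-- Presence check for one day: a single-partition read; the bloom filter
-- lets Cassandra skip SSTables that cannot contain this partition.
SELECT ts FROM events WHERE source_id = 's1' AND day = '2014-06-11' LIMIT 1;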
Yes your thinking is correct.
This article from TLP sums it all up beautifully
http://thelastpickle.com/blog/2011/06/13/Down-For-Me.html
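As a hypothetical illustration of the arithmetic (my numbers, not the
article's): at replication factor 3, QUORUM needs 2 of 3 replicas, so a
3-node cluster at RF = 3 is the usual minimum that keeps QUORUM reads and
writes available through a single node failure.

CREATE KEYSPACE myks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};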
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
On 18 Jun 2014, at 4:18 pm, Prabath Abeysekara
wrote:
Sorry, the title of this thread has to be "*Minimum cluster size to survive
a single node failure*".
On Wed, Jun 18, 2014 at 11:38 AM, Prabath Abeysekara <
prabathabeysek...@gmail.com> wrote:
Hi Everyone,
First of all, apologies if the $subject was discussed in this list before.
I've already gone through quite a few email threads on this but still
couldn't find a convincing answer, which made me raise the question again
here.
If my understanding is correct, ...
That's how my schema is built. So far, I'm pulling the data out by a
range of 30 days. I want to see if I have data for every day; I'm just
wondering if it's possible in CQL, as opposed to how I'm doing it now,
in Python.
On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael
wrote:
If you can arrange to index your rows by:
(<date>, <timestamp>)
then you can select ranges as you wish.
This works because <date> is the "partition key", arrived at by
hash (really it's a hash key), whereas <timestamp> is the "clustering
key" (really it is a range key), which is kept in sorted order both in
memory and on disk. A sketch follows.
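A minimal sketch of that layout (table and column names are assumptions),
with a range select over one day's partition:

CREATE TABLE readings (
    day   text,
    ts    timestamp,
    value double,
    PRIMARY KEY ((day), ts)   -- day: hash/partition key; ts: range/clustering key
);

-- Single-partition range query; ts is stored in sorted order.
SELECT ts, value FROM readings
WHERE day = '2014-06-10'
  AND ts >= '2014-06-10 00:00:00' AND ts < '2014-06-11 00:00:00';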
I have data stored with the timestamp datatype. Is it possible to use
CQL to return results based on whether a row falls in the range of a day?
E.g. if I have 20 rows that occurred on 2014-06-10, no rows for 2014-06-11,
and 15 rows that occurred on 2014-06-12, I'd like to return results only
for the days where data exists.
I installed a package version of cassandra via "sudo yum install
cassandra20.noarch" into a clean host and got:
cassandra20.noarch 2.0.8-2 @datastax
That resulted in a problem: /etc/cassandra/ did not exist. So I did "sudo yum
downgrade cassandra20.noarch" and got version 2.0.7.
I am using the SizeTieredCompactionStrategy, all with the default settings,
with C* 2.0.7.
I figured that with a high compaction throughput (999 MB/s), it would be able
to keep up; there are no major I/O wait times on the hosts. Should I remove
the throughput cap entirely (set it to 0)?
On Tue, Jun 17, 2014 at 11:26 AM, Redmumba wrote:
Alright, that's perfectly reasonable--I'm not quite sure which settings
will affect the number of writes. I have set the compaction throughput in
the past to 999, but I'm not sure how that correlates to the _number_ of
files created--and, short of doing a major compaction, I'm not sure how to
actually ...
On Tue, Jun 17, 2014 at 11:14 AM, Redmumba wrote:
I have a very write-heavy workload, and noticed that the default settings
for min_ and max_compaction_threshold resulted in around 47k files in my
table directory. In general, files were fairly small (ranging from
single-digit megabytes to gigabytes).
What is the best way to tweak these values? A sketch of the knobs follows.
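For what it's worth, both thresholds can be adjusted per table from CQL;
a sketch using the default values (keyspace and table names are assumptions):

ALTER TABLE mykeyspace.mytable
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'min_threshold': 4,
                     'max_threshold': 32};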
Thank you, Sylvain, for the very clear explanations.
On Tue, Jun 17, 2014 at 2:44 PM, Sylvain Lebresne
wrote:
> 1) Who is responsible for this microsecond timestamp? The coordinator
> which receives the insert request, or each replica which actually persists
> the data?

The coordinator.

> 2) In the case of a batch insert (a CQL3 batch, not the Thrift batch
> mutation API), if no user-defined timestamp ...
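For reference, a minimal sketch of a client overriding the
coordinator-generated timestamp (keyspace, table, and values are
assumptions):

INSERT INTO ks.t (k, v) VALUES (1, 'a')
  USING TIMESTAMP 1403000000000000;  -- microseconds since epoch, supplied by the client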
That's going to be the timestamp for the data affected.
What I meant is that you can't have different timestamps (insert x timestamp
y; insert x' timestamp y').
2014-06-17 14:27 GMT+02:00 DuyHai Doan :
> "that is not possible to define different timestamps within a batch" -->
> It is possible :
> h
"that is not possible to define different timestamps within a batch" --> It
is possible :
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/batch_r.html
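A minimal sketch of the per-statement timestamps described on the page above
(keyspace, table, and values are assumptions):

BEGIN BATCH
  INSERT INTO ks.t (k, v) VALUES (1, 'a') USING TIMESTAMP 1403000000000000;
  INSERT INTO ks.t (k, v) VALUES (2, 'b') USING TIMESTAMP 1403000000000001;
APPLY BATCH;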
On Tue, Jun 17, 2014 at 2:17 PM, tommaso barbugli
wrote:
When inserting with a batch, every row has the same timestamp; I also think
(though I'm not 100% sure) that it is not possible to define different
timestamps within a batch.
Tommaso
Thanks, I'll have a look.
On 17 Jun 2014 17:42, "Abhishek Mukherjee" <4271...@gmail.com> wrote:
Thanks, Jens. I'll r...
Hello all,
I know that at write time a timestamp is automatically generated by the
server and assigned to each column.
My questions are:
1) Who is responsible for this microsecond timestamp? The coordinator
which receives the insert request, or each replica which actually persists
the data?
Hi Abhishek,
You can't. You need to use a clustering key to keep track of your ordering.
See
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__querying-compound-primary-keys-and-sorting-results
Cheers,
Jens
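A sketch of that clustering-key approach (all names are assumptions): a
timeuuid clustering column encodes the insertion time, so rows are kept in
insertion order within each partition and no sort on write time is needed
at read time.

CREATE TABLE ks.events_by_arrival (
    stream   text,
    inserted timeuuid,   -- encodes insertion time; clustering keeps it sorted
    data     text,
    PRIMARY KEY ((stream), inserted)
);

-- Rows come back in insertion order within the partition; dateOf()
-- extracts the wall-clock time from the timeuuid.
SELECT dateOf(inserted), data FROM ks.events_by_arrival WHERE stream = 's1';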
This question has been raised several times, but there is only one answer:
you can't (and shouldn't).
On 17-Jun-2014, at 5:18 pm, Abhishek Mukherjee <4271...@gmail.com> wrote:
Hi Everyone,
I am trying to read data from my Cassandra database in the order in which
it got written to the DB. There is a WRITETIME function which gives me
the write time for a column. How can I use this so that the data returned
from my query gets ordered by write time?
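For what it's worth, WRITETIME can be selected alongside a column, but it
cannot appear in ORDER BY; a sketch (names are assumptions):

SELECT k, v, WRITETIME(v) FROM ks.t WHERE k = 1;
-- returns the microsecond write timestamp of column v next to its value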