Re: Timeseries analysis using Cassandra and partition by date period

2015-04-06 Thread Serega Sheypak
Thank you, we'll take a look at that tool. 2015-04-06 12:30 GMT+02:00 Srinivasa T N : > Comparison to OpenTSDB HBase > > For one we do not use id’s for strings. The string data (metric names and > tags) are written to row keys and the appropriate indexes. Because > Cassandra has much wider rows there

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-06 Thread Srinivasa T N
Comparison to OpenTSDB HBase For one we do not use id’s for strings. The string data (metric names and tags) are written to row keys and the appropriate indexes. Because Cassandra has much wider rows there are far fewer keys written to the database. The space saved by using id’s is minor and by n

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-06 Thread Serega Sheypak
Thanks, is it a kind of OpenTSDB? 2015-04-05 18:28 GMT+02:00 Kevin Burton : > > Hi, I switched from HBase to Cassandra and am trying to find a solution > for timeseries analysis on top of Cassandra. > > Depending on what you’re looking for, you might want to check out KairosDB. > > 0.95 beta2 just s

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-05 Thread Kevin Burton
> Hi, I switched from HBase to Cassandra and am trying to find a solution for timeseries analysis on top of Cassandra. Depending on what you’re looking for, you might want to check out KairosDB. 0.95 beta2 just shipped yesterday as well so you have good timing. https://github.com/kairosdb/kairosdb

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
Okay, so bucketing by day/week/month is a matter of capacity planning and of the actual questions I want to ask. As a conclusion: I have a table events CREATE TABLE user_plans ( id timeuuid, user_id timeuuid, event_ts timestamp, event_type int, some_other_attr text, PRIMARY KEY (user_id, ends) )

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Jack Krupansky
It sounds like your time bucket should be a month, but it depends on the amount of data per user per day and your main query range. Within the partition you can then query for a range of days. Yes, all of the rows within a partition are stored on one physical node as well as the replica nodes. --

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
>non-equal relation on a partition key is not supported Ok, can I generate a select query: select some_attributes from events where ymd = 20150101 or ymd = 20150102 or ymd = 20150103 ... or ymd = 20150331 > The partition key determines which node can satisfy the query So you mean that all rows with the same *(y
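
A note on the query above: CQL has no OR in the WHERE clause, so a list of days is normally written as an IN relation on the partition key. The following is a minimal sketch only, assuming that ymd is the single partition key column and is stored as a yyyymmdd integer; the helper names ymd_values and build_in_query are hypothetical and not part of any driver.

```python
from datetime import date, timedelta
from typing import List


def ymd_values(start: date, end: date) -> List[int]:
    """Return every day from start to end (inclusive) as a yyyymmdd integer."""
    days = (end - start).days
    return [
        int((start + timedelta(days=i)).strftime("%Y%m%d"))
        for i in range(days + 1)
    ]


def build_in_query(start: date, end: date) -> str:
    """Build a CQL string with an IN relation over the day buckets.

    Assumption: table `events` and column `ymd` are as named in the thread.
    Real code would normally bind the values through a prepared statement
    instead of formatting them into the string.
    """
    values = ", ".join(str(v) for v in ymd_values(start, end))
    return f"SELECT some_attributes FROM events WHERE ymd IN ({values})"


if __name__ == "__main__":
    print(build_in_query(date(2015, 1, 1), date(2015, 1, 3)))
```

A three-month window produces 90 partition keys in one statement, and the coordinator has to reach a replica for each of them, so a long IN list is likely to be expensive.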

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Jack Krupansky
Unfortunately, a non-equal relation on a partition key is not supported. You would need to bucket by some larger unit, like a month, and then use the date/time as a clustering column for the row key. Then you could query within the partition. The partition key determines which node can satisfy the
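
The month-bucket idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a schema taken from the thread: the table name events_by_month, the month column (a yyyymm integer), and the choice to put user_id in the partition key are all hypothetical. The helper lists the buckets a time window touches, so each one can be queried with an equality on the partition key and a range on the clustering column.

```python
from datetime import date
from typing import List

# Hypothetical schema: one partition per (user, month), rows ordered by time.
SCHEMA = """
CREATE TABLE events_by_month (
    user_id timeuuid,
    month int,            -- bucket, e.g. 201501
    event_ts timestamp,   -- clustering column
    event_type int,
    some_other_attr text,
    PRIMARY KEY ((user_id, month), event_ts)
)
"""

# Equality on the partition key, range on the clustering column.
QUERY = (
    "SELECT event_ts, event_type FROM events_by_month "
    "WHERE user_id = ? AND month = ? AND event_ts >= ? AND event_ts < ?"
)


def month_buckets(start: date, end: date) -> List[int]:
    """Return the yyyymm buckets covered by the inclusive range start..end."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


if __name__ == "__main__":
    # One query per bucket; a 1 to 6 month window means at most 7 buckets.
    for bucket in month_buckets(date(2015, 1, 10), date(2015, 3, 31)):
        print(bucket, QUERY)
```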

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
Hi, we plan to have 10^8 users, and each user could generate 10 events per day. So we have: 10^9 records per day, 3*10^10 records per month. Our time window for analysis could be from 1 to 6 months. Right now the PK is PRIMARY KEY (user_id, ends) where endts is the exact ts of the event. So you suggest this approac
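
The volume figures can be checked with a few lines of arithmetic. This is a rough sketch that takes the stated inputs (10^8 users, 10 events per user per day) at face value and assumes a 30-day month; the per-partition number further assumes a partition per user per month, which is one of the bucketing options discussed in the thread.

```python
USERS = 10**8
EVENTS_PER_USER_PER_DAY = 10
DAYS_PER_MONTH = 30          # assumption: a flat 30-day month
MAX_WINDOW_MONTHS = 6        # longest analysis window mentioned

# Cluster-wide volume.
events_per_day = USERS * EVENTS_PER_USER_PER_DAY
events_per_month = events_per_day * DAYS_PER_MONTH

# Rows one user contributes, which is the partition size under
# a (user_id, month) partition key.
rows_per_user_month = EVENTS_PER_USER_PER_DAY * DAYS_PER_MONTH
rows_per_user_window = rows_per_user_month * MAX_WINDOW_MONTHS

if __name__ == "__main__":
    print("events per day:", events_per_day)
    print("events per month:", events_per_month)
    print("rows per user per month:", rows_per_user_month)
    print("rows per user over 6 months:", rows_per_user_window)
```

Under these assumptions a per-user monthly partition holds only a few hundred rows, while the cluster as a whole receives on the order of a billion rows per day.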

Re: Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Jack Krupansky
It depends on the actual number of events per user, but simply bucketing the partition key can give you the same effect - clustering rows by time range. A composite partition key could be comprised of the user name and the date. It also depends on the data rate - is it many events per day or just

Timeseries analysis using Cassandra and partition by date period

2015-04-04 Thread Serega Sheypak
Hi, I switched from HBase to Cassandra and am trying to find a solution for timeseries analysis on top of Cassandra. I have an entity named "Event". "Event" has attributes: user_id - the user who triggered the event event_ts - when the event happened event_type - type of event some_other_attr - some other attrs we