Thanks for the update. Good to know that TWCS gives you more stability.
On Wed, Feb 8, 2017 at 6:20 PM, John Sanda wrote:
I wanted to provide a quick update. I was able to patch one of the
environments that is hitting the tombstone problem. It has been running
TWCS for five days now, and things are stable so far. I also had a patch to
the application code to implement date partitioning ready to go, but I
wanted to see
In theory, you're right and Cassandra should possibly skip reading cells
having time < 50. But that's the theory; in practice Cassandra reads chunks of
xxx kilobytes worth of data (I don't remember the exact value of xxx, maybe
64k or far less) so you may end up reading tombstones.
On Sun, Jan 29, 2017
Check out our post on how to use TWCS before 3.0.
http://thelastpickle.com/blog/2017/01/10/twcs-part2.html
On Sun, Jan 29, 2017 at 11:20 AM John Sanda wrote:
> It was with STCS. It was on a 2.x version before TWCS was available.
>
> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote:
>
> Did y
Thanks for the clarification. Let's say I have a partition in an SSTable
where the values of time range from 100 down to 10 and everything < 50 is
expired. If I do a query with time < 100 and time >= 50, are there
scenarios in which Cassandra will have to read cells where time < 50? In
particular I am w
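For reference, the slice being asked about would look something like this (the id
and the timeuuid bounds are made-up stand-ins for the abstract 50/100 values):

SELECT value FROM metrics
WHERE id = 'host-42'
  AND time >= minTimeuuid('2017-01-22 00:00+0000')   -- "time >= 50"
  AND time <  minTimeuuid('2017-01-29 00:00+0000');  -- "time < 100"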
"Should the data be sorted by my time column regardless of the compaction
strategy" --> It does
What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
compacted together with a new chunk of SSTABLE-2 containing fresh data so
in the new resulting SSTable will contain tombstones AND
>
> Since STCS does not sort data based on timestamp, your wide partition may
> span over multiple SSTables and inside each SSTable, old data (+
> tombstones) may sit on the same partition as newer data.
Should the data be sorted by my time column regardless of the compaction
strategy? I didn't t
Ok so give it a try with TWCS. Since STCS does not sort data based on
timestamp, your wide partition may span over multiple SSTables and inside
each SSTable, old data (+ tombstones) may sit on the same partition as
newer data.
When reading by slice, even if you request for fresh data, Cassandra ha
It was with STCS. It was on a 2.x version before TWCS was available.
On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote:
> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>
> If you're using DTCS, beware of its weird behavior and tricky
> configuration.
>
> On Sun, Jan
Did you get this overwhelming tombstone behavior with STCS or with TWCS?
If you're using DTCS, beware of its weird behavior and tricky configuration.
On Sun, Jan 29, 2017 at 3:52 PM, John Sanda wrote:
> Your partitioning key is text. If you have multiple entries per id you are
>> likely hitti
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit to
Your partitioning key is text. If you have multiple entries per id you are
likely hitting older cells that have expired. Descending only affects how
the data is stored on disk; if you have to read the whole partition to find
whichever time you are querying for, you could potentially hit tombstones i
> STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy
> for this type of workload.
> On Sat, Jan 28, 2017 at 8:30 AM John Sanda wrote:
>
>> I have a time series data model that is basically:
>>
>> CREATE TABLE metrics (
>> id text,
>
On Sat, Jan 28, 2017 at 8:30 AM John Sanda wrote:
>
>> I have a time series data model that is basically:
>>
>> CREATE TABLE metrics (
>> id text,
>> time timeuuid,
>> value double,
>> PRIMARY KEY (id, time)
>> ) WITH CLUSTERING ORDER BY (time DESC);
Since you didn't specify a compaction strategy I'm guessing you're using
STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy
for this type of workload.
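For what it's worth, the switch would be something along these lines (a sketch only:
the one-day window is an assumption to be tuned against the TTL, and on a 2.x cluster
you would need the backported TWCS jar, whose fully qualified class name differs):

ALTER TABLE metrics
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
}
AND default_time_to_live = 604800;   -- match this to the TTL used on writes (7 days here)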
On Sat, Jan 28, 2017 at 8:30 AM John Sanda wrote:
> I have a time series data model that is basi
On Sat, Jan 28, 2017 at 5:30 PM, John Sanda wrote:
> I have a time series data model that is basically:
>
> CREATE TABLE metrics (
> id text,
> time timeuuid,
> value double,
> PRIMARY KEY (id, time)
> ) WITH CLUSTERING ORDER BY (time DESC);
>
> I do append-only writes,
I have a time series data model that is basically:
CREATE TABLE metrics (
id text,
time timeuuid,
value double,
PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC);
I do append-only writes, no deletes, and use a TTL of seven days. Data
points are written every seconds
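For illustration, a typical write looks something like this (the id and value are
made up; 604800 seconds = 7 days):

INSERT INTO metrics (id, time, value)
VALUES ('host-42', now(), 42.5)
USING TTL 604800;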
On 20 October 2016 at 09:29, wxn...@zjqunshuo.com
wrote:
> I do need to align the time windows to a day bucket to prevent one row
> from becoming too big, and event_time is a timestamp since the unix epoch. If I
> use bigint as the type of event_time, can I do the queries you mentioned?
Yes.
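For example, something like this should work (a sketch only: the partition key is
assumed to be (deviceid, day bucket), and the column names here are made up):

SELECT * FROM eventdata
WHERE deviceid = 186628
  AND day = 20160928
  AND event_time > 1474992002005;   -- bigint, milliseconds since the unix epoch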
Kurt Greaves
k...@ins
(sample row from the eventdata table; last column is speed)
186628 | 20160928 | 1474992002005 | 48 | 30.343443 | 120.087514 | 41
-Simon Wu
From: kurt Greaves
Date: 2016-10-20 16:23
To: user
Subject: Re: time series data model
Ah didn't pick up on that but looks like he's storing JSON within posit
If event_time is a timestamp since the unix epoch you may want to 1. use the
built-in timestamp type, and 2. order by event_time DESC. Point 2 applies if you
want to do queries such as "select * from eventdata where ... and
event_time > x" (i.e. get the latest events).
Other than that your model seems workable.
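To make those two suggestions concrete, a sketch (column names and types are guesses,
since the original CREATE TABLE is cut off in this thread; lat/long are shown as
separate columns, as also suggested elsewhere in the thread):

CREATE TABLE cargts.eventdata (
    deviceid   int,
    day        int,          -- day bucket, e.g. 20161020
    event_time timestamp,    -- built-in timestamp type instead of bigint
    latitude   double,
    longitude  double,
    speed      int,
    PRIMARY KEY ((deviceid, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

SELECT * FROM cargts.eventdata
WHERE deviceid = 186628 AND day = 20161020
  AND event_time > '2016-10-20 08:00:00+0000';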
Ah, I didn't pick up on that, but it looks like he's storing JSON within position.
Is there any strong reason for this or, as Vladimir mentioned, can you store
the fields under "position" in separate columns?
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
On 20 October 2016 at 08:17, Vladimir Yudov
Hi Simon,
Why is position text and not float? Text takes much more space.
Also, speed and heading can be calculated from the latest positions, so you
may not need to store them at all. If you really need them in the database you can
save them as floats, or compose a single float value like speed.heading: 41.173 (or
Hi All,
I'm trying to migrate my time series data, which is GPS traces, from MySQL to C*.
I want a wide row to hold one day of data. I designed the data model as below.
Please help me check whether there is any problem. Any suggestion is appreciated.
Table Model:
CREATE TABLE cargts.eventdata (
deviceid
This will work. I have tried both; each gave a unique one-day bucket.
I just realized that if I sync all clients to one time zone then the date will remain
the same for all.
A one-zone date will give a materialized view of the row.
On Mon, Apr 30, 2012 at 11:43 PM, samal wrote:
> hhmm. I will try both. thanks
>
>
> On Mon, Apr
hhmm. I will try both. thanks
On Mon, Apr 30, 2012 at 11:29 PM, Tyler Hobbs wrote:
> Err, sorry, I should have said ts - (ts % 86400). Integer division does
> something similar.
>
>
> On Mon, Apr 30, 2012 at 12:39 PM, samal wrote:
>
>> thanks I didn't noticed.
>> run script for 5 minutes => d
Err, sorry, I should have said ts - (ts % 86400). Integer division does
something similar.
On Mon, Apr 30, 2012 at 12:39 PM, samal wrote:
> thanks I didn't noticed.
> run script for 5 minutes => divide seems to produce result ,modulo is
> still changing. If divide is ok will do the trick.
> I
Thanks, I hadn't noticed.
Ran the script for 5 minutes => divide seems to produce a stable result, modulo is
still changing. If divide is OK it will do the trick.
I will run this script on the Singapore, East coast, and New Delhi
servers all night today.
==
unix => 133580698
getTime() returns the number of milliseconds since the epoch, not the
number of seconds: http://www.w3schools.com/jsref/jsref_gettime.asp
If you divide that number by 1000, it should work.
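Worked through with a made-up timestamp:

getTime()           => 1335790800000   (milliseconds; 2012-04-30 13:00 UTC)
/ 1000              => 1335790800      (seconds)
ts - (ts % 86400)   => 1335744000      (the 2012-04-30 00:00 UTC day bucket)

Every write during the following 24 hours produces that same bucket value, regardless
of the client's timezone.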
On Mon, Apr 30, 2012 at 11:28 AM, samal wrote:
> I did it with node.js but it is changing after some inter
I did it with node.js but it is changing after some interval.
setInterval(function () {
    var d = new Date().getTime();
    console.log("== ");
    console.log("unix => ", d);
    var i = parseInt(d);
    console.log("Divide i/86400 => ", i / 86400);
    console.log("Modulo i%86400 => ", i % 86400);
}, 1000); // interval length assumed; the original snippet was cut off here
Correct, that's exactly what I'm saying.
On Mon, Apr 30, 2012 at 10:37 AM, samal wrote:
> thanks tyler for reply.
>
> are you saying user1uuid_*{ts%86400}* would lead to unique day bucket
> which will be timezone {NZ to US} independent? I will try.
>
>
> On Mon, Apr 30, 2012 at 8:25 PM, Tyler H
Thanks Tyler for the reply.
Are you saying user1uuid_*{ts%86400}* would lead to a unique day bucket
which will be timezone-independent {NZ to US}? I will try.
On Mon, Apr 30, 2012 at 8:25 PM, Tyler Hobbs wrote:
> Don't use dates or datestamps as the buckets for your row keys, use a unix
> timestamp
Don't use dates or datestamps as the buckets for your row keys, use a unix
timestamp modulo whatever size you want your bucket to be instead.
Timestamps don't involve time zones or any of that nonsense.
So, instead of having keys like "user1uuid_30042012", the second half would
be replaced with the cur
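With a made-up timestamp (and the ts - (ts % 86400) form from the follow-up above),
a key would look like:

"user1uuid_" + (1335790800 - (1335790800 % 86400))  =>  "user1uuid_1335744000"

i.e. the same key for every client during that UTC day.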
Hello List,
I need suggestions/recommendations on time series data.
I have a requirement where users belong to different timezones and they can
subscribe to a global group.
When users in a specific timezone send an update to the group it is available to
every user in the other timezones.
I am using GroupSubscr
This is actually fairly similar to how we store metrics at Cloudkick.
The post below has a much more in-depth explanation of some of that:
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
So we store each natural point in the NumericArchive table.
Our keys look like:
.
Anyway
On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin
wrote:
JB> Am 14.04.2010 15:22, schrieb Ted Zlatanov:
>> On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin"
>> wrote:
>>
JB> The metrics are stored together with a timestamp. The queries we want to
JB> perform are:
JB> * The last
Am 14.04.2010 15:22, schrieb Ted Zlatanov:
On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin"
wrote:
JB> The metrics are stored together with a timestamp. The queries we want to
JB> perform are:
JB> * The last value of a specific metric of a device
JB> * The values of a specific m
Hi Jean-Pierre,
I'm investigating using Cassandra for a very similar use case, maybe
we can chat and compare notes sometime. But basically, I think you
want to pull the metric name into the row key and use simple CF
instead of SCF. So, your example:
"my_server_1": {
"cpu_usage": {
James,
I'm a big fan of Cassandra, but have you looked at
http://en.wikipedia.org/wiki/RRDtool ?
It is natively built for this type of problem.
Alex
On Wed, Apr 14, 2010 at 9:02 AM, Jean-Pierre Bergamin wrote:
> Hello everyone
>
> We are currently evaluating a new DB system (replacing MySQL) to st
On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin"
wrote:
JB> The metrics are stored together with a timestamp. The queries we want to
JB> perform are:
JB> * The last value of a specific metric of a device
JB> * The values of a specific metric of a device between two timestamps t1 and
First of all, I am a newbie to NoSQL. I'll try to write down my opinions as
a reference:
If I were you, I would use 2 column families:
1. CF, key is devices
2. CF, key is timeuuid
What do you think about that?
Mike
On Wed, Apr 14, 2010 at 3:02 PM, Jean-Pierre Bergamin wrote:
> Hello everyone
>
> We are
Hello everyone
We are currently evaluating a new DB system (replacing MySQL) to store
massive amounts of time-series data. The data are various metrics from
various network and IT devices and systems. Metrics could be, for example, CPU usage
of the server "xy" in percent, memory usage of server "xy" in MB,