Re: User click count

2014-12-29 Thread Ajay
Thanks Janne, Alain and Eric. Now say I go with counters (hourly, daily, monthly) and also store UUID as below: user Id : /mm/dd as row key and dynamic columns for each click with column key as timestamp and value as empty. Periodically count the columns and rows and correct the counters. Now

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
The kind of query language I'm thinking of is closer to Datalog, which is what Datomic uses. It's a personal bias, but I find it easier and cleaner to express joins, subqueries and correlated subqueries in a LISP-like/datalog like syntax than SQL. Since CQL is modeled/inspired by SQL, it inherits

Re: Internal pagination in secondary index queries

2014-12-29 Thread Jonathan Haddad
Secondary indexes are there for convenience, not performance. If you're looking for something performant, you'll need to maintain your own indexes. On Mon Dec 29 2014 at 3:22:58 PM Sam Klock wrote: > Hi folks, > > Perhaps this is a question better addressed to the Cassandra developers > direct

Re: CQL3 vs Thrift

2014-12-29 Thread Eric Stevens
So while not exactly the same, this seems like a good analogy for suggesting a third interface to fix problems with existing interfaces: http://xkcd.com/927/ Even if the CQL parsing code in Cassandra is subpar (I haven't studied it), that's not an especially compelling case to suggest replacing th

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
> Perf is better, correctness seems less so. I value latter more than > former. Yeah no doubt. Especially in CASSANDRA-6285 i see some scary stuff went down. But there are no outstanding bugs that we know of, are there? (CASSANDRA-6815 remains just a wrap up of how options are to be presente

Internal pagination in secondary index queries

2014-12-29 Thread Sam Klock
Hi folks, Perhaps this is a question better addressed to the Cassandra developers directly, but I thought I'd ask it here first. We've recently been benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and we've noticed that when the number of items in an index reaches beyond so

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 2:03 PM, mck wrote: > We saw an improvement when we switched to HSHA, particularly for our > offline (hadoop/spark) nodes. > Sorry i don't have the data anymore to support that statement, although > i can say that improvement paled in comparison to cross_node_timeout > whi

Re: Nodes Dying in 2.1.2

2014-12-29 Thread Robert Coli
Might be https://issues.apache.org/jira/browse/CASSANDRA-8061 or one of the linked/duplicate tickets. =Rob On Mon, Dec 29, 2014 at 1:40 PM, Robert Coli wrote: > On Wed, Dec 24, 2014 at 9:41 AM, Phil Burress > wrote: > >> Just upgraded our cluster from 2.1.1 to 2.1.2 and our nodes keep dying. >

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
> > Should I stick to 2048 or try > > with something closer to 128 or even something else ? 2048 worked fine for us. > > About HSHA, > > I anti-recommend hsha, serious apparently unresolved problems exist with > it. We saw an improvement when we switched to HSHA, particularly for our offline

Re: Changing replication factor of Cassandra cluster

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 1:40 PM, Pranay Agarwal wrote: > I want to understand what is the best way to increase/change the replica > factor of the cassandra cluster? My priority is consistency and probably I > am tolerant about some down time of the cluster. Is it totally weird to try > changing r

Re: Node down during move

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 12:29 AM, Jiri Horky wrote: > just a follow up. We've seen this behavior multiple times now. It seems > that the receiving node loses connectivity to the cluster and thus > thinks that it is the sole online node, whereas the rest of the cluster > thinks that it is the only

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 2:29 AM, Alain RODRIGUEZ wrote: > Sorry about the gravedigging, but what would be a good start value to tune > "rpc_max_threads" ? > Depends on whether you prefer that clients get a slow thread or none. > I mean, default is unlimited, the value commented is 2048. Native

Re: Nodes Dying in 2.1.2

2014-12-29 Thread Robert Coli
On Wed, Dec 24, 2014 at 9:41 AM, Phil Burress wrote: > Just upgraded our cluster from 2.1.1 to 2.1.2 and our nodes keep dying. > The kernel is killing the process due to out of memory: > https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ Appears to only occur during comp

Re: Changing replication factor of Cassandra cluster

2014-12-29 Thread Pranay Agarwal
Thanks Ryan. I want to understand what is the best way to increase/change the replica factor of the cassandra cluster? My priority is consistency and probably I am tolerant about some down time of the cluster. Is it totally weird to try changing replica later or are there people doing it for produ

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
In my biased opinion, something else should replace CQL, and it needs a proper rewrite on the server side. I've studied the code and, having written query parsers and planners, what is there today isn't going to work long term. Whatever replaces both thrift and CQL needs to provide 100% of the featu

Re: CQL3 vs Thrift

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin wrote: > > I'm biased in favor of using both thrift and CQL3, though many people on the > list probably think I'm crazy. > I don't think you're "crazy" but I do think you will ultimately face the deprecation of thrift. Briefly, I disbelieve the idea tha

Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Alain RODRIGUEZ
I made an error in the topic title. We are indeed going to do it (that's why I made the mistake), but I am speaking of 1.2 --> 2.0 here, and we will start with this before going to 2.1, since we want to do it as a rolling upgrade. Thanks for your enlightening pointer about this vanished "pressure val

Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Jason Wee
What you are asking may be answered at the code level, which is pretty deep stuff, at least from a user's (like my) point of view. But to quote Jonathan in CASSANDRA-3534, Then you will be able to say "use X amount of memory for memtables, Y amount for the cache (and monitor Z amount for the bloom filters)" whi

Re: Repair/Compaction Completion Confirmation

2014-12-29 Thread Alain RODRIGUEZ
I noticed (and reported) a bug that made me drop this tool --> https://github.com/BrianGallew/cassandra_range_repair/issues/16 Might this be related somehow ? C*heers Alain 2014-11-21 13:30 GMT+01:00 Paulo Ricardo Motta Gomes < paulo.mo...@chaordicsystems.com>: > Hey guys, > > Just reviving th

Re: Why a cluster don't start after cassandra.yaml range_timeout parameter change ?

2014-12-29 Thread Alain RODRIGUEZ
Did you solve this issue? I guess nobody answered you because this is very weird. I also guess you made some mistake in the configuration. Anyway, let me know if you managed to get out of the mess somehow or if you still need help. C*heers 2014-12-03 15:57 GMT+01:00 Castelain, Alain : > >

Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Alain RODRIGUEZ
Thanks for the pointer Jason, Yet, I thought that cache and memtables went off-heap only in version 2.1 and not 2.0 ("As of Cassandra 2.0, there are two major pieces of the storage engine that still depend on the JVM heap: memtables and the key cache." --> http://www.datastax.com/dev/blog/off-heap

Re: User click count

2014-12-29 Thread Eric Stevens
> If the counters get incorrect, it couldn't be corrected You'd have to store something that allowed you to correct it. For example, the TimeUUID approach to keep true counts, which are slow to read but accurate, and a background process that trues up your counter columns periodically. On Mon, De
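The true-up idea described in this reply can be sketched roughly as follows. This is only an in-memory illustration, not code from the thread: the dictionaries `raw_clicks` and `daily_counter` stand in for two Cassandra tables, and the function names, the day-bucket key, and the event ids are all assumptions made for the example. Since Cassandra counters only accept increments and decrements, the correction is applied as a delta rather than as an overwrite.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-ins for two Cassandra tables (hypothetical names).
raw_clicks = defaultdict(set)     # (user_id, "YYYY-MM-DD") -> set of event ids
daily_counter = defaultdict(int)  # (user_id, "YYYY-MM-DD") -> fast count that may drift

def record_click(user_id, event_id, ts):
    """Write the raw event and bump the counter for the same day bucket."""
    key = (user_id, ts.strftime("%Y-%m-%d"))
    raw_clicks[key].add(event_id)  # idempotent: a retried write adds nothing
    daily_counter[key] += 1        # not idempotent: a retried write counts twice

def true_up():
    """Compare each counter with the raw event count and apply the difference.

    Returns the list of keys that needed a correction.
    """
    corrected = []
    for key, events in raw_clicks.items():
        delta = len(events) - daily_counter[key]
        if delta != 0:
            daily_counter[key] += delta  # counters take deltas, not absolute values
            corrected.append(key)
    return corrected
```

A retried write shows the drift and the repair: recording event `"e1"` twice and `"e2"` once leaves the counter at 3 while the raw set holds 2 events, and one call to `true_up()` brings the counter back to 2.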

Re: User click count

2014-12-29 Thread Ajay
Thanks for the clarification. In my case, Cassandra is the only storage. If the counters get incorrect, it couldn't be corrected. If we store raw data for that, we may as well go with that approach. But the granularity has to be at the seconds level, as more than one user can click the same link. So the data

Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Jason Wee
https://issues.apache.org/jira/browse/CASSANDRA-3534 On Mon, Dec 29, 2014 at 6:58 PM, Alain RODRIGUEZ wrote: > Hi guys, > > I am looking at added and dropped options in Cassandra between 1.2.18 and > 2.0.11 and this makes me wonder: > > Why has the index_interval option been removed from cassandr

Re: Best practice for sorting on frequent updated column?

2014-12-29 Thread Eric Stevens
This is a bit difficult. Depending on your access patterns and data volume, I'd be inclined to keep a separate table with a (count, foreign_key) clustering key. Then do a client-side join to read the data back in the order you're looking for. That will at least make the heavily updated table hav
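A rough sketch of the separate ordering table and the client-side join suggested here, under stated assumptions: the two in-memory structures stand in for Cassandra tables, a sorted list of `(count, item_id)` pairs plays the role of the `(count, foreign_key)` clustering key, and every name below is hypothetical rather than taken from the thread.

```python
import bisect

# In-memory stand-ins for two tables; all names here are hypothetical.
items = {}     # main table: item_id -> {"name": ..., "count": ...}
by_count = []  # ordering table: sorted (count, item_id) pairs

def set_count(item_id, new_count):
    """Update an item's count and move its entry in the ordering table."""
    row = items[item_id]
    old_count = row.get("count")
    if old_count is not None:
        by_count.remove((old_count, item_id))  # drop the stale entry
    bisect.insort(by_count, (new_count, item_id))
    row["count"] = new_count

def top(n):
    """Read ordering entries highest-first, then join to the main table."""
    highest_first = by_count[::-1][:n]
    return [(count, items[item_id]["name"]) for count, item_id in highest_first]
```

Each update removes the old `(count, item_id)` entry before inserting the new one, which mirrors the delete-then-insert needed when a clustering column value changes. The read path only touches the main table once per returned key.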

Re: User click count

2014-12-29 Thread Alain RODRIGUEZ
Hi Ajay, Here is a good explanation you might want to read. http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters We have used counters for 3 years now, starting with C* 0.8, and we are happy with them. Limits I can see in both ways are: Count

Re: User click count

2014-12-29 Thread Ajay
Hi, So you mean to say counters are not accurate? (It is highly likely that multiple parallel threads will try to increment the counter as users click the links). Thanks Ajay On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen wrote: > > Hi! > > It’s really a tradeoff between accurate and fast and

Re: User click count

2014-12-29 Thread Janne Jalkanen
Hi! It’s really a tradeoff between accurate and fast and your read access patterns; if you need it to be fairly fast, use counters by all means, but accept the fact that they will (especially in older versions of cassandra or adverse network conditions) drift off from the true click count. If

User click count

2014-12-29 Thread Ajay
Hi, Is it better to use a counter for the user click count than to create a new row per click (keyed as user id : timestamp) and count the rows? Basically we want to track the user clicks and use the same for hourly/daily/monthly reports. Thanks Ajay

diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Alain RODRIGUEZ
Hi guys, I am looking at added and dropped options in Cassandra between 1.2.18 and 2.0.11 and this makes me wonder: Why has the index_interval option been removed from cassandra.yaml ? I know we can also define it on a per table basis, yet, this global option was quite useful to tune memory usage.

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Alain RODRIGUEZ
Hi, Sorry about the gravedigging, but what would be a good start value to tune " rpc_max_threads" ? I mean, default is unlimited, the value commented is 2048. Native protocol seems to only allow 128 simultaneous threads. Should I stick to 2048 or try with something closer to 128 or even something