Re: Lot of GC on two nodes out of 7

2016-03-01 Thread Jeff Jirsa
Compaction falling behind will likely cause additional work on reads (more sstables to merge), but I’d be surprised if it manifested in super long GC. When you say twice as many sstables, how many is that? In cfstats, does anything stand out? Is max row size on those nodes larger than on othe

Lot of GC on two nodes out of 7

2016-03-01 Thread Anishek Agarwal
Hello, we have a cassandra cluster of 7 nodes, all of them have the same JVM GC configurations, all our writes / reads use the TokenAware Policy wrapping a DCAware policy. All nodes are part of same Datacenter. We are seeing that two nodes are having high GC collection times. Then mostly seem to

Re: List of List

2016-03-01 Thread Sandeep Kalra
Thanks Everyone. I am not using thrift. I am reading CQL and understanding to use it. Best Regards, Sandeep Kalra On Tue, Mar 1, 2016 at 9:51 PM, Dani Traphagen wrote: > Hey Sandeep, > > It's good to understand why using Thrift isn't a good idea so I'll help > with that. You'll mostly hear pe

Re: Commit log size vs memtable total size

2016-03-01 Thread Vlad
Tyler, thanks for the explanation! So a commit log segment can contain both data from flushed table A and non-flushed table B. How is it replayed on startup? Does C* skip portions belonging to table A that were already written to SSTable? Regards, Vlad On Tuesday, March 1, 2016 11:37 PM, Tyler Hobbs

Re: List of List

2016-03-01 Thread Dani Traphagen
Hey Sandeep, It's good to understand why using Thrift isn't a good idea so I'll help with that. You'll mostly hear people say RUN AWAY FROM THRIFT WITH THE MIGHTY STRIDE OF A GAZELLE. The reason why is that it's old and not supported. You'll end up with a broken pile of parts and you definitely do

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
It is the total table count, across all key spaces. Memory is memory. -- Jack Krupansky On Tue, Mar 1, 2016 at 6:26 PM, Brian Sam-Bodden wrote: > Eric, > Is the keyspace as a multitenancy solution as bad as the many tables > pattern? Is the memory overhead of keyspaces as heavy as that of tab

DATA replication from Oracle DB to Cassandra

2016-03-01 Thread anil_ah
Hi, I want to run a Spark job to do incremental sync from Oracle to Cassandra; the job interval could be one minute. We are looking for real-time replication with a latency of 1 or 2 min. Please advise what would be the best approach: 1) oracle db -> spark sql -> spark -> cassandra. 2) oracle db -> sqoop -> cass

Broken links in Apache Cassandra home page

2016-03-01 Thread ANG ANG
Reference: http://stackoverflow.com/questions/35712166/broken-links-in-apache-cassandra-home-page/35724686#35724686 The following links are broken in the Apache Cassandra Home/Welcome page: 1. "materialized views": http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views

Fwd: DATA replication from Oracle DB to Cassandra

2016-03-01 Thread anil_ah
Original message From: anil_ah Date: 03/02/2016 9:11 am (GMT+08:00) To: User Cassandra Subject: DATA replication from Oracle DB to Cassandra Hi, I want to run a Spark job to do incremental sync from Oracle to Cassandra; the job interval could be one minute. We are look

Re: Querying on index

2016-03-01 Thread Jonathan Haddad
That feels like a serious bug. Definitely file a JIRA with as many details as possible. https://issues.apache.org/jira/browse/CASSANDRA/ On Tue, Mar 1, 2016 at 4:38 PM Rakesh Kumar wrote: > Looks like Bloom filter size was the issue. Once I disabled it, the query > returns rows correctly, bu

Re: Querying on index

2016-03-01 Thread Rakesh Kumar
Looks like Bloom filter size was the issue. Once I disabled it, the query returned rows correctly, but it was terribly slow (expected, since it will hit the SSTable every time). -Original Message- From: Rakesh Kumar To: user Sent: Tue, Mar 1, 2016 4:57 pm Subject: Re: Querying on index

Re: Snitch for AWS EC2 nondefaultVPC

2016-03-01 Thread Robert Coli
On Tue, Mar 1, 2016 at 12:12 PM, Arun Sandu wrote: > All our nodes are launched in AWS EC2 VPC (private). We have 2 > datacenters(1 us-east , 1- asiapacific) and all communication is through > private IP's and don't have any public IPs. What is the recommended snitch > to be used? We currently ha

Re: List of List

2016-03-01 Thread Robert Coli
On Tue, Mar 1, 2016 at 3:23 PM, Jonathan Haddad wrote: > Thrift is deprecated, and will be removed in Cassandra 4.0 Don't do any > new development with it. > +infinity this. =Rob

Re: Practical limit on number of column families

2016-03-01 Thread Brian Sam-Bodden
Eric, Is the keyspace as a multitenancy solution as bad as the many tables pattern? Is the memory overhead of keyspaces as heavy as that of tables? Cheers, Brian On Tuesday, March 1, 2016, Eric Stevens wrote: > It's definitely not true for every use case of a large number of tables, > but for

Re: List of List

2016-03-01 Thread Jonathan Haddad
Thrift is deprecated, and will be removed in Cassandra 4.0 Don't do any new development with it. What video says to use thrift? On Tue, Mar 1, 2016 at 2:29 PM Sandeep Kalra wrote: > I am in very early stage , so, I can change. Infact, the videos you > pointed also says to do so... > > > Best R

Re: List of List

2016-03-01 Thread Sandeep Kalra
I am in very early stage , so, I can change. Infact, the videos you pointed also says to do so... Best Regards, Sandeep Kalra On Tue, Mar 1, 2016 at 3:58 PM, Jack Krupansky wrote: > Thrift? Hah! Sorry, I can't help you if you are going that route. I > recommend CQL - only. > > -- Jack Krupans

Re: List of List

2016-03-01 Thread Jack Krupansky
Thrift? Hah! Sorry, I can't help you if you are going that route. I recommend CQL - only. -- Jack Krupansky On Tue, Mar 1, 2016 at 4:47 PM, Sandeep Kalra wrote: > The way I was planning is to give a restful interface to lookup details of > a question, and then user must get complete list of ans

Re: Commit log size vs memtable total size

2016-03-01 Thread Jack Krupansky
It would be nice to get this info into the doc or at least a blog post. -- Jack Krupansky On Tue, Mar 1, 2016 at 4:37 PM, Tyler Hobbs wrote: > > On Tue, Mar 1, 2016 at 6:13 AM, Vlad wrote: > >> So commit log can't keep more than memtable size, why is difference in >> commit log and memtables s

Re: Querying on index

2016-03-01 Thread Rakesh Kumar
At this time no one else is using this table. So the data is static. -Original Message- From: Rakesh Kumar To: user Sent: Tue, Mar 1, 2016 4:54 pm Subject: Querying on index Cassandra: 3.3 On my test system I create a table create table eventinput ( event_id varchar , event_class_c

Querying on index

2016-03-01 Thread Rakesh Kumar
Cassandra: 3.3 On my test system I create a table create table eventinput ( event_id varchar , event_class_cd int , event_ts timestamp , client_id varchar , event_message text , primary key ((client_id,event_id),event_ts) ) I created an index on client_id create in

Re: Isolation for atomic batch on the same partition key

2016-03-01 Thread Tyler Hobbs
On Mon, Feb 22, 2016 at 3:58 PM, Yawei Li wrote: > > 1. If an atomic batch (logged batch) contains a bunch of row mutations > and all of them have the same partition key, can I assume all those changes > have the same isolation as the row-level isolation? According to the post > here http://www.

Re: IF NOT EXISTS with multiple static columns confusion

2016-03-01 Thread Tyler Hobbs
What version of Cassandra are you using? I just tested this out against trunk and got reasonable behavior: cqlsh:ks1> CREATE TABLE test (k int, s1 int static, s2 int static, c int, v int, PRIMARY KEY (k, c)); cqlsh:ks1> INSERT INTO test (k, c, v) VALUES (0, 0, 0); cqlsh:ks1> UPDATE test SET s1 =

Re: List of List

2016-03-01 Thread Sandeep Kalra
The way I was planning is to give a restful interface to lookup details of a question, and then user must get complete list of answers and its comments. I am using thrift interface and node-js to serve it. Search on questions are using subject tag and/or its content, Best Regards, Sandeep Kalra

Re: Commit log size vs memtable total size

2016-03-01 Thread Tyler Hobbs
On Tue, Mar 1, 2016 at 6:13 AM, Vlad wrote: > So commit log can't keep more than memtable size, why is difference in > commit log and memtables sizes? In order to purge a commitlog segment, *all* memtables that contain data from that segment must be flushed to disk. Suppose you have two tables
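The purge rule described in this reply (a segment can only be dropped once every table with data in it has been flushed) can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class and method names (Segment, CommitLogModel, flush) are hypothetical, and real Cassandra tracks replay positions per table rather than just table names.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One commit log segment; records which tables still have unflushed data in it."""
    name: str
    dirty_tables: set = field(default_factory=set)


class CommitLogModel:
    """Toy model only (hypothetical names); not how Cassandra is implemented."""

    def __init__(self):
        self.segments = []

    def new_segment(self, name):
        seg = Segment(name)
        self.segments.append(seg)
        return seg

    def write(self, table):
        # Every write is appended to the newest (active) segment.
        self.segments[-1].dirty_tables.add(table)

    def flush(self, table):
        # Flushing a memtable marks that table clean in every segment,
        # then drops non-active segments that no table depends on any more.
        for seg in self.segments:
            seg.dirty_tables.discard(table)
        active = self.segments[-1]
        self.segments = [s for s in self.segments if s.dirty_tables or s is active]

    def segment_names(self):
        return [s.name for s in self.segments]


log = CommitLogModel()
log.new_segment("seg-1")
log.write("A")  # frequently written table
log.write("B")  # rarely written table
log.new_segment("seg-2")
log.write("A")

log.flush("A")  # A's memtable goes to an SSTable
after_flushing_a = log.segment_names()  # seg-1 survives: B is still dirty in it

log.flush("B")
after_flushing_b = log.segment_names()  # seg-1 can now be purged
```

In this model a single rarely flushed table is enough to pin an old segment, which is one way the commit log on disk can grow well beyond the total memtable size.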

Re: List of List

2016-03-01 Thread Jack Krupansky
Okay, so a very large number of questions, each with a very modest number of answers (generally under 5), each with a modest number of comments (generally under 5). Now we're back to the issue of how you wish to query and access the data. -- Jack Krupansky On Tue, Mar 1, 2016 at 12:39 PM, Sandee

Re: Snitch for AWS EC2 nondefaultVPC

2016-03-01 Thread Asher Newcomer
Hi Arun, This distinction has been a can of worms for me also - and I'm not sure my understanding is entirely correct. I use GossipingPropertyFileSnitch for my multi-region setup, which seems to be more flexible than the Ec2 snitches. The Ec2 snitches should work also, but their behavior is more

RE: Snitch for AWS EC2 nondefaultVPC

2016-03-01 Thread Jun Wu
I've worked on some experiments with AWS EC2. According to the doc you provided and from my own experience, Ec2MultiRegionSnitch should be the right setting as you have 2 different datacenters. In cassandra.yaml: change seeds to public address list, change listen and rpc address to private addr

Re: Cassandra Ussages

2016-03-01 Thread Andrés Ivaldi
Hello Jack, what do you mean by "the map datatype with string key values effectively gives you extensible columns"? Regards On Tue, Mar 1, 2016 at 1:34 PM, Jack Krupansky wrote: > OLAP using Cassandra and Spark: > > http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandr

Re: Cassandra Ussages

2016-03-01 Thread Andrés Ivaldi
Thanks all for the tips, Mainly we are replacing an OLAP cube, but our engine works fine with RDBMS directly so with the low latency of cassandra it could work nice (extensibility of this is what worries me). We will give a try to Cassandra + Spark Thanks again!! On Tue, Mar 1, 2016 at 2:59 PM, J

Snitch for AWS EC2 nondefaultVPC

2016-03-01 Thread Arun Sandu
Hi all, All our nodes are launched in AWS EC2 VPC (private). We have 2 datacenters(1 us-east , 1- asiapacific) and all communication is through private IP's and don't have any public IPs. What is the recommended snitch to be used? We currently have GossipingPropertyFileSnitch. 1. If Ec2MultiRegio

Re: Practical limit on number of column families

2016-03-01 Thread Eric Stevens
It's definitely not true for every use case of a large number of tables, but for many uses where you'd be tempted to do that, adding whatever would have driven your table naming instead as a column in your partition key on a smaller number of tables will meet your needs. This is especially true if

Re: List of List

2016-03-01 Thread Sandeep Kalra
Thanks a lot. I have started with the videos too..I will get back if I see any problem. Best Regards, Sandeep Kalra On Tue, Mar 1, 2016 at 12:36 PM, Jonathan Haddad wrote: > I'd do something like this: > > CREATE TABLE questions ( > question_id timeuuid primary key, > question text >

Re: Checking replication status

2016-03-01 Thread Bryan Cheng
Hi Jeremy, For more insight into the hint system, these two blog posts are great resources: http://www.datastax.com/dev/blog/modern-hinted-handoff, and http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery . For timeframes, that's going to differ bas

Re: List of List

2016-03-01 Thread Jonathan Haddad
I'd do something like this: CREATE TABLE questions ( question_id timeuuid primary key, question text ); CREATE TABLE answers ( question_id timeuuid, answer_id timeuuid, answer text, primary key(question_id, answer_id) ); CREATE TABLE comments ( answer_id timeuuid,

Re: Consistent read timeouts for bursts of reads

2016-03-01 Thread Carlos Alonso
We have had similar issues sometimes. Usually the problem was that the failing queries were reading the same partition as another query that was still running, and that partition was too big. The fact that it is reading the same partition is why your query works upon retry. The fact that the partition (or the r

Re: Cassandra Ussages

2016-03-01 Thread Jack Krupansky
I would spin it as Cassandra being the right choice where your primary need is OLTP, with a secondary need for analytics. IOW, where you would otherwise need to use two separate databases for the same data. -- Jack Krupansky On Tue, Mar 1, 2016 at 12:40 PM, Jonathan Haddad wrote: > Spark &

Re: List of List

2016-03-01 Thread Sandeep Kalra
I do not have a limit on the number of Answers or their comments. Assume it to be a clone of StackOverflow. Best Regards, Sandeep Kalra On Tue, Mar 1, 2016 at 11:29 AM, Jack Krupansky wrote: > Clustering columns are your friends. > > But the first question is how you need to query the data. Queries

IOException: MkDirs Failed to Create in Spark

2016-03-01 Thread Anuj Wadehra
Hi We are using Spark with Cassandra. While using rdd.saveAsTextFile("/tmp/dr"), we are getting following error when we run the application with root access. Spark is able to create two level of directories but fails after that with Exception: 16/03/01 22:59:48 WARN TaskSetManager: Lost task

Re: Cassandra Ussages

2016-03-01 Thread Jonathan Haddad
Spark & Cassandra work just fine together, but, as I said, Cassandra is *primarily* used for OLTP. If your main use case is analytics, I would use something that's built for analytics. If 90%+ of your queries are going to be 1-10ms & customer facing, then you're good to go. If you're building so

Re: List of List

2016-03-01 Thread Jonathan Haddad
You probably want to watch some intro videos on Datastax Academy. https://academy.datastax.com/ I suggest the intro video to get some basics down: https://academy.datastax.com/courses/ds101-introduction-cassandra and then core concepts, a pretty thorough intro: https://academy.datastax.com/courses/ds2

Re: List of List

2016-03-01 Thread Jack Krupansky
Clustering columns are your friends. But the first question is how you need to query the data. Queries drive data models in Cassandra. What is the cardinality of this data - how many answers per question and how many comments per answer? -- Jack Krupansky On Tue, Mar 1, 2016 at 12:23 PM, Sande

List of List

2016-03-01 Thread Sandeep Kalra
Hi all. I am a beginner in Cassandra. I am working on a Q&A project where I have to maintain a list of lists for objects. For example, a Question can have a list of Answers, and each Answer can then have a list of Comments. -- As of now I have 3 tables. Questions, Answers, and Comments. I have stored UID of

Re: Cassandra Ussages

2016-03-01 Thread Jack Krupansky
OLAP using Cassandra and Spark: http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandra-and-spark What is the cardinality of your cube dimensions? Obviously any multi-dimensional data must be flattened. Cassandra tables have fixed named columns, but... the map datatype w

Re: Cassandra Ussages

2016-03-01 Thread Andrés Ivaldi
Jonathan, thanks for the link. I believe it may be good as the data store part, because it is fast for I/O and handles time series; analytics could be done with Apache Ignite and/or Apache Spark. What worries me is that it looks very complex to create the structure for each fact table and then extend regar

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
I don't think Cassandra was "purposefully developed" for some target number of tables - there is no evidence of any such an explicit intent. Instead, it would be fair to say that Cassandra was "not purposefully developed" with a goal of supporting "large numbers of tables." Sometimes features and c

Re: Disable writing to debug.log

2016-03-01 Thread Michael Mior
There are instructions given in /etc/cassandra/logback.xml. Looking later in the file, you'll see the following: <queueSize>1024</queueSize> <discardingThreshold>0</discardingThreshold> <includeCallerData>true</includeCallerData> Commenting out this section will disable writing to debug.log. -- Michael Mior mm...@uwaterloo.ca 2016-03-01 10:43 GMT-05:00 Rakesh Kumar : > V

Disable writing to debug.log

2016-03-01 Thread Rakesh Kumar
Version: Cassandra 3.3 Can anyone tell on how to disable writing to debug.log. thanks.

Re: Practical limit on number of column families

2016-03-01 Thread Vlad
> If your Jira search fu is strong enough And it is! ) > you should be able to find it yourself And I did! ) I see that this issue originates from a problem with Java GC's design, but according to the date it was Java 6 time. Now we have J8 with a new GC mechanism. Does this problem still exist with J8? Any ch

Re: Practical limit on number of column families

2016-03-01 Thread Fernando Jimenez
Hi Jack Being purposefully developed to only handle up to “a few hundred” tables is reason enough. I accept that, and likely a use case with many tables was never really considered. But I would still like to understand the design choices made so perhaps we gain some confidence level in this upp

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
I'll defer to one of the senior committers as to whether they want that information disseminated any further than it already is. It was intentionally not documented since it is not recommended. If your Jira search fu is strong enough you should be able to find it yourself, but again, its use is str

Re: Practical limit on number of column families

2016-03-01 Thread Vlad
Hi Jack, > you can reduce the overhead per table an undocumented Jira Can you please point to this Jira number? > it is strongly not recommended What are the consequences of this (besides performance degradation, if any)? Thanks. On Tuesday, March 1, 2016 7:23 AM, Jack Krupansky wrote: 3

Re: Practical limit on number of column families

2016-03-01 Thread Jack Krupansky
I don't think there are any "reasons behind it." It is simply empirical experience - as reported here. Cassandra scales in two dimensions - number of rows per node and number of nodes. If some source of information led you to believe otherwise, please point out the source so that we can endeavor t

Commit log size vs memtable total size

2016-03-01 Thread Vlad
Hi, there are the following parameters in cassandra.yaml: memtable_total_space_in_mb (1/4 of heap, e.g. 512MB) - Specifies the total memory used for all memtables on a node. commitlog_total_space_in_mb (8GB) - Total space used for commit logs. If the used space goes above this value, Cassandra rounds u

Re: Practical limit on number of column families

2016-03-01 Thread Fernando Jimenez
Hi Tommaso It’s not that I _need_ a large number of tables. This approach maps easily to the problem we are trying to solve, but it’s becoming clear it’s not the right approach. At the moment I’m trying to understand the limitations in Cassandra regarding number of Tables and the reasons behin

Re: Practical limit on number of column families

2016-03-01 Thread tommaso barbugli
Hi Fernando, I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was a real pain in terms of operations. Repairs were terribly slow, boot of C* slowed down and in general tracking table metrics becomes bit more work. Why do you need this high number of tables? Tommaso On Tue, Ma

Re: Practical limit on number of column families

2016-03-01 Thread Fernando Jimenez
Hi Jack By entry I mean row Apologies for the “obsolete terminology”. When I first looked at Cassandra it was still on CQL2, and now that I’m looking at it again I’ve defaulted to the terms I already knew. I will bear it in mind and call them tables from now on. Is there any documentation abou