Compaction falling behind will likely cause additional work on reads (more
sstables to merge), but I’d be surprised if it manifested in super long GC.
When you say twice as many sstables, how many is that?
In cfstats, does anything stand out? Is max row size on those nodes larger than
on othe
Hello,
we have a cassandra cluster of 7 nodes, all of them have the same JVM GC
configurations, all our writes / reads use the TokenAware Policy wrapping
a DCAware policy. All nodes are part of same Datacenter.
We are seeing that two nodes have high GC collection times. They
mostly seem to
Thanks, everyone. I am not using Thrift. I am reading up on CQL and learning
how to use it.
Best Regards,
Sandeep Kalra
On Tue, Mar 1, 2016 at 9:51 PM, Dani Traphagen
wrote:
> Hey Sandeep,
>
> It's good to understand why using Thrift isn't a good idea so I'll help
> with that. You'll mostly hear pe
Tyler, thanks for explanation!
So a commit log segment can contain both data from flushed table A and non-flushed
table B. How is it replayed on startup? Does C* skip the portions belonging to
table A that were already written to an SSTable?
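One way to picture the question is the toy replay filter below. It is only a sketch under a stated assumption: that each table remembers the commit log position up to which its data has already been flushed. The names `replay` and `flushed_up_to` are hypothetical and are not Cassandra APIs.

```python
def replay(segment, flushed_up_to):
    """Return the mutations that still need to be re-applied on startup.

    segment:        list of (position, table, mutation) in write order
    flushed_up_to:  table -> highest position already persisted in an SSTable
                    (hypothetical bookkeeping, assumed for this sketch)
    """
    return [
        (position, table, mutation)
        for position, table, mutation in segment
        # Tables with no recorded flush position replay everything.
        if position > flushed_up_to.get(table, -1)
    ]


# Table A was flushed after its last write; table B was never flushed.
segment = [(0, "A", "a0"), (1, "B", "b0"), (2, "A", "a1")]
to_replay = replay(segment, {"A": 2})
```

In this model only table B's mutation is re-applied, because both of table A's positions are at or below its flushed position.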
Regards, Vlad
On Tuesday, March 1, 2016 11:37 PM, Tyler Hobbs
Hey Sandeep,
It's good to understand why using Thrift isn't a good idea so I'll help
with that. You'll mostly hear people say RUN AWAY FROM THRIFT WITH THE
MIGHTY STRIDE OF A GAZELLE. The reason why is that it's old and not
supported. You'll end up with a broken pile of parts and you definitely
do
It is the total table count, across all key spaces. Memory is memory.
-- Jack Krupansky
On Tue, Mar 1, 2016 at 6:26 PM, Brian Sam-Bodden
wrote:
> Eric,
> Is the keyspace as a multitenancy solution as bad as the many tables
> pattern? Is the memory overhead of keyspaces as heavy as that of tab
Hi, I want to run a Spark job to do an incremental sync from Oracle to
Cassandra; the job interval could be one minute. We are looking for real-time
replication with a latency of 1 or 2 minutes.
Please advise what would be the best approach:
1) Oracle DB -> Spark SQL -> Spark -> Cassandra
2) Oracle DB -> Sqoop -> cass
Reference:
http://stackoverflow.com/questions/35712166/broken-links-in-apache-cassandra-home-page/35724686#35724686
The following links are broken in the Apache Cassandra Home/Welcome page:
1. "materialized views":
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
Original message
From: anil_ah
Date: 03/02/2016 9:11 am (GMT+08:00)
To: User Cassandra
Subject: DATA replication from Oracle DB to Cassandra
Hi, I want to run a Spark job to do an incremental sync from Oracle to
Cassandra; the job interval could be one minute. We are look
That feels like a serious bug. Definitely file a JIRA with as many details
as possible. https://issues.apache.org/jira/browse/CASSANDRA/
On Tue, Mar 1, 2016 at 4:38 PM Rakesh Kumar wrote:
> Looks like Bloom filter size was the issue. Once I disabled it, the query
> returns rows correctly, bu
Looks like Bloom filter size was the issue. Once I disabled it, the query
returns rows correctly, but it was terribly slow (expected, since it will hit
an SSTable every time).
-Original Message-
From: Rakesh Kumar
To: user
Sent: Tue, Mar 1, 2016 4:57 pm
Subject: Re: Querying on index
On Tue, Mar 1, 2016 at 12:12 PM, Arun Sandu wrote:
> All our nodes are launched in an AWS EC2 VPC (private). We have 2
> datacenters (1 us-east, 1 asia-pacific) and all communication is through
> private IPs; we don't have any public IPs. What is the recommended snitch
> to be used? We currently ha
On Tue, Mar 1, 2016 at 3:23 PM, Jonathan Haddad wrote:
> Thrift is deprecated, and will be removed in Cassandra 4.0. Don't do any
> new development with it.
>
+infinity this.
=Rob
Eric,
Is the keyspace as a multitenancy solution as bad as the many tables
pattern? Is the memory overhead of keyspaces as heavy as that of tables?
Cheers,
Brian
On Tuesday, March 1, 2016, Eric Stevens wrote:
> It's definitely not true for every use case of a large number of tables,
> but for
Thrift is deprecated, and will be removed in Cassandra 4.0. Don't do any
new development with it.
What video says to use thrift?
On Tue, Mar 1, 2016 at 2:29 PM Sandeep Kalra
wrote:
> I am at a very early stage, so I can change. In fact, the videos you
> pointed to also say to do so...
>
>
> Best R
I am at a very early stage, so I can change. In fact, the videos you pointed
to also say to do so...
Best Regards,
Sandeep Kalra
On Tue, Mar 1, 2016 at 3:58 PM, Jack Krupansky
wrote:
> Thrift? Hah! Sorry, I can't help you if you are going that route. I
> recommend CQL - only.
>
> -- Jack Krupans
Thrift? Hah! Sorry, I can't help you if you are going that route. I
recommend CQL - only.
-- Jack Krupansky
On Tue, Mar 1, 2016 at 4:47 PM, Sandeep Kalra
wrote:
> The way I was planning is to provide a RESTful interface to look up the details of
> a question, and then the user must get the complete list of ans
It would be nice to get this info into the doc or at least a blog post.
-- Jack Krupansky
On Tue, Mar 1, 2016 at 4:37 PM, Tyler Hobbs wrote:
>
> On Tue, Mar 1, 2016 at 6:13 AM, Vlad wrote:
>
>> So the commit log can't keep more than the memtable size; why is there a difference in
>> commit log and memtables s
At this time no one else is using this table. So the data is static.
-Original Message-
From: Rakesh Kumar
To: user
Sent: Tue, Mar 1, 2016 4:54 pm
Subject: Querying on index
Cassandra: 3.3
On my test system I create a table
create table eventinput(
event_id varchar, event_class_c
Cassandra: 3.3
On my test system I create a table
CREATE TABLE eventinput (
    event_id varchar,
    event_class_cd int,
    event_ts timestamp,
    client_id varchar,
    event_message text,
    PRIMARY KEY ((client_id, event_id), event_ts)
);
I created an index on client_id
create in
On Mon, Feb 22, 2016 at 3:58 PM, Yawei Li wrote:
>
> 1. If an atomic batch (logged batch) contains a bunch of row mutations
> and all of them have the same partition key, can I assume all those changes
> have the same isolation as the row-level isolation? According to the post
> here http://www.
What version of Cassandra are you using? I just tested this out against
trunk and got reasonable behavior:
cqlsh:ks1> CREATE TABLE test (k int, s1 int static, s2 int static, c int, v
int, PRIMARY KEY (k, c));
cqlsh:ks1> INSERT INTO test (k, c, v) VALUES (0, 0, 0);
cqlsh:ks1> UPDATE test SET s1 =
The way I was planning is to provide a RESTful interface to look up the details of
a question, and then the user must get the complete list of answers and their
comments. I am using the Thrift interface and Node.js to serve it. Search on
questions uses the subject tag and/or its content,
Best Regards,
Sandeep Kalra
On Tue, Mar 1, 2016 at 6:13 AM, Vlad wrote:
> So the commit log can't keep more than the memtable size; why is there a
> difference between the commit log and memtable sizes?
In order to purge a commitlog segment, *all* memtables that contain data
from that segment must be flushed to disk.
Suppose you have two tables
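The purge rule described above can be sketched with a small toy model. This is an illustration only, not Cassandra's implementation: the class names `Segment` and `CommitLogModel` are hypothetical, and treating the newest segment as never purgeable is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One commit log segment and the tables with unflushed data in it."""
    segment_id: int
    dirty_tables: set = field(default_factory=set)


class CommitLogModel:
    """Toy model: a segment is purgeable only when no table is dirty in it."""

    def __init__(self):
        self.segments = []

    def new_segment(self):
        seg = Segment(segment_id=len(self.segments))
        self.segments.append(seg)
        return seg

    def write(self, table):
        # A write goes to the newest segment and marks the table dirty there.
        if not self.segments:
            self.new_segment()
        self.segments[-1].dirty_tables.add(table)

    def flush(self, table):
        # Flushing a table's memtable clears its dirty mark in every segment.
        for seg in self.segments:
            seg.dirty_tables.discard(table)

    def purgeable(self):
        # Assumption: the newest segment is still active, so it is skipped.
        return [s.segment_id for s in self.segments[:-1] if not s.dirty_tables]


log = CommitLogModel()
log.new_segment()
log.write("A")
log.write("B")
log.new_segment()
log.write("A")

log.flush("A")
after_flush_a = log.purgeable()  # segment 0 still holds unflushed data for B
log.flush("B")
after_flush_b = log.purgeable()  # segment 0 no longer has any dirty table
```

In this model, flushing table A alone frees nothing, because segment 0 also contains writes for table B; the segment becomes purgeable only after B is flushed too.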
Okay, so a very large number of questions, each with a very modest number
of answers (generally under 5), each with a modest number of comments
(generally under 5).
Now we're back to the issue of how you wish to query and access the data.
-- Jack Krupansky
On Tue, Mar 1, 2016 at 12:39 PM, Sandee
Hi Arun,
This distinction has been a can of worms for me also - and I'm not sure my
understanding is entirely correct.
I use GossipingPropertyFileSnitch for my multi-region setup, which seems to
be more flexible than the Ec2 snitches. The Ec2 snitches should work also,
but their behavior is more
I've worked on some experiments with AWS EC2. According to the doc you provided
and from my own experience, Ec2MultiRegionSnitch should be the right setting
as you have 2 different datacenters.
In cassandra.yaml: change seeds to public address list, change listen and rpc
address to private addr
Hello Jack
What do you mean by "the map datatype with string key values effectively
gives you extensible columns"?
Regards
On Tue, Mar 1, 2016 at 1:34 PM, Jack Krupansky
wrote:
> OLAP using Cassandra and Spark:
>
> http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandr
Thanks all for the tips,
Mainly we are replacing an OLAP cube, but our engine works fine with an RDBMS
directly, so with the low latency of Cassandra it could work nicely
(the extensibility of this is what worries me).
We will give a try to Cassandra + Spark
Thanks again!!
On Tue, Mar 1, 2016 at 2:59 PM, J
Hi all,
All our nodes are launched in an AWS EC2 VPC (private). We have 2
datacenters (1 us-east, 1 asia-pacific) and all communication is through
private IPs; we don't have any public IPs. What is the recommended snitch
to be used? We currently have GossipingPropertyFileSnitch.
1. If Ec2MultiRegio
It's definitely not true for every use case of a large number of tables,
but for many uses where you'd be tempted to do that, adding whatever would
have driven your table naming instead as a column in your partition key on
a smaller number of tables will meet your needs. This is especially true
if
Thanks a lot.
I have started with the videos too. I will get back if I see any problem.
Best Regards,
Sandeep Kalra
On Tue, Mar 1, 2016 at 12:36 PM, Jonathan Haddad wrote:
> I'd do something like this:
>
> CREATE TABLE questions (
> question_id timeuuid primary key,
> question text
>
Hi Jeremy,
For more insight into the hint system, these two blog posts are great
resources: http://www.datastax.com/dev/blog/modern-hinted-handoff, and
http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery
.
For timeframes, that's going to differ bas
I'd do something like this:
CREATE TABLE questions (
    question_id timeuuid PRIMARY KEY,
    question text
);
CREATE TABLE answers (
    question_id timeuuid,
    answer_id timeuuid,
    answer text,
    PRIMARY KEY (question_id, answer_id)
);
CREATE TABLE comments (
    answer_id timeuuid,
We have had similar issues sometimes.
Usually the problem was that the failing queries were reading the same
partition as another query that was still running, and that partition was too big.
The fact that it is reading the same partition is why your query works upon
retry. The fact that the partition (or the r
I would spin it as Cassandra being the right choice where your primary need
is OLTP, with a secondary need for analytics. IOW, where you would
otherwise need to use two separate databases for the same data.
-- Jack Krupansky
On Tue, Mar 1, 2016 at 12:40 PM, Jonathan Haddad wrote:
> Spark &
I have no limit on the number of answers or their comments. Assume it to be
a clone of StackOverflow.
Best Regards,
Sandeep Kalra
On Tue, Mar 1, 2016 at 11:29 AM, Jack Krupansky
wrote:
> Clustering columns are your friends.
>
> But the first question is how you need to query the data. Queries
Hi
We are using Spark with Cassandra. While using rdd.saveAsTextFile("/tmp/dr"),
we get the following error when we run the application with root access.
Spark is able to create two levels of directories but fails after that with an
exception:
16/03/01 22:59:48 WARN TaskSetManager: Lost task
Spark & Cassandra work just fine together, but, as I said, Cassandra is
*primarily* used for OLTP. If your main use case is analytics, I would use
something that's built for analytics. If 90%+ of your queries are going to
be 1-10ms & customer facing, then you're good to go. If you're building
so
You probably want to watch some intro videos on Datastax Academy.
https://academy.datastax.com/
I suggest the intro video to get some basics down:
https://academy.datastax.com/courses/ds101-introduction-cassandra
and then core concepts, a pretty thorough intro:
https://academy.datastax.com/courses/ds2
Clustering columns are your friends.
But the first question is how you need to query the data. Queries drive
data models in Cassandra.
What is the cardinality of this data - how many answers per question and
how many comments per answer?
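To make the role of clustering columns concrete, here is a toy in-memory model of a table with a partition key and one clustering column. It is only a sketch: `ClusteredTable` is a hypothetical name, the integer clustering keys stand in for a real column type, and nothing here uses a Cassandra driver.

```python
from bisect import insort
from collections import defaultdict


class ClusteredTable:
    """Toy model of a table with PRIMARY KEY (partition_key, clustering_key)."""

    def __init__(self):
        # partition key -> list of (clustering key, value), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        # Rows in a partition stay ordered by the clustering key.
        insort(self.partitions[partition_key], (clustering_key, value))

    def select(self, partition_key):
        # One partition lookup returns every row, already in clustering order.
        return [value for _, value in self.partitions.get(partition_key, [])]


answers = ClusteredTable()
answers.insert("q1", 2, "second answer")
answers.insert("q1", 1, "first answer")
answers.insert("q2", 1, "answer to another question")

q1_answers = answers.select("q1")
```

The query "all answers for one question" touches a single partition and needs no sorting at read time, which is the sense in which the query shape drives the key layout.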
-- Jack Krupansky
On Tue, Mar 1, 2016 at 12:23 PM, Sande
Hi all.
I am a beginner in Cassandra.
I am working on a Q&A project where I have to maintain a list of lists of
objects.
E.g., a Question can have a list of Answers, and each Answer can then have a
list of Comments.
--
As of now I have 3 tables: Questions, Answers, and Comments. I have stored the
UID of
OLAP using Cassandra and Spark:
http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandra-and-spark
What is the cardinality of your cube dimensions? Obviously any
multi-dimensional data must be flattened.
Cassandra tables have fixed named columns, but... the map datatype w
Jonathan, thanks for the link.
I believe it may be a good fit for the data store part, because it is fast for I/O
and handles time series; the analytics could be done with Apache Ignite and/or
Apache Spark.
What worries me is that it looks very complex to create the structure for each
fact table and then extend
regar
I don't think Cassandra was "purposefully developed" for some target number
of tables - there is no evidence of any such explicit intent. Instead,
it would be fair to say that Cassandra was "not purposefully developed"
with a goal of supporting "large numbers of tables." Sometimes features and
c
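There are instructions given in /etc/cassandra/logback.xml.
Looking later in the file, you'll see the following:
  <appender name="ASYNCDEBUGLOG" class="ch.qos.logback.classic.AsyncAppender">
    <queueSize>1024</queueSize>
    <discardingThreshold>0</discardingThreshold>
    <includeCallerData>true</includeCallerData>
    <appender-ref ref="DEBUGLOG" />
  </appender>
Commenting out this section will disable writing to debug.log.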
--
Michael Mior
mm...@uwaterloo.ca
2016-03-01 10:43 GMT-05:00 Rakesh Kumar :
> V
Version: Cassandra 3.3
Can anyone tell me how to disable writing to debug.log?
thanks.
> If your Jira search fu is strong enough
And it is! :)
> you should be able to find it yourself
And I did! :)
I see that this issue originates from a problem with the Java GC's design, but
judging by the date, that was in the Java 6 era. Now we have Java 8 with a new GC mechanism.
Does this problem still exist with Java 8? Any ch
Hi Jack
Being purposefully developed to only handle up to “a few hundred” tables is
reason enough. I accept that, and likely a use case with many tables was never
really considered. But I would still like to understand the design choices made
so perhaps we gain some confidence level in this upp
I'll defer to one of the senior committers as to whether they want that
information disseminated any further than it already is. It was
intentionally not documented since it is not recommended. If your Jira
search fu is strong enough you should be able to find it yourself, but
again, its use is str
Hi Jack,
> you can reduce the overhead per table an undocumented Jira
Can you please point to this Jira number?
> it is strongly not recommended
What are the consequences of this (besides performance degradation, if any)?
Thanks.
On Tuesday, March 1, 2016 7:23 AM, Jack Krupansky
wrote:
3
I don't think there are any "reasons behind it." It is simply empirical
experience - as reported here.
Cassandra scales in two dimensions - number of rows per node and number of
nodes. If some source of information led you to believe otherwise, please
point out the source so that we can endeavor t
Hi, there are the following parameters in cassandra.yaml:
memtable_total_space_in_mb (1/4 of heap, e.g. 512MB)- Specifies the total
memory used for all memtables on a node.
commitlog_total_space_in_mb (8GB) - Total space used for commit logs. If the
used space goes above this value, Cassandra rounds u
Hi Tommaso
It’s not that I _need_ a large number of tables. This approach maps easily to
the problem we are trying to solve, but it’s becoming clear it’s not the right
approach.
At the moment I’m trying to understand the limitations in Cassandra regarding
number of Tables and the reasons behin
Hi Fernando,
I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was a
real pain in terms of operations. Repairs were terribly slow, boot of C*
slowed down, and in general tracking table metrics became a bit more work.
Why do you need this high number of tables?
Tommaso
On Tue, Ma
Hi Jack
By entry I mean row
Apologies for the “obsolete terminology”. When I first looked at Cassandra it
was still on CQL2, and now that I’m looking at it again I’ve defaulted to the
terms I already knew. I will bear it in mind and call them tables from now on.
Is there any documentation abou