Hi,
I'm always interested in such benchmark experiments, because the
databases evolve so fast that the race is always open and there is a
lot of motion in the field.
And of course I asked myself the same question. And I think that this
publication is unreliable, for four reasons (from reading very fast,
Hi Or,
I did some sort of this a while ago. If your machines do have a free
disk slot - just put another disk there and use it as another
data_file_directory.
If not - as in my case:
- grab a USB dock for disks
- put the new one in there, plug in, format, mount to /mnt etc.
- I did an onlin
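For the archive, this is roughly what the extra data directory looks like in
cassandra.yaml once the new disk is mounted (paths below are examples; a
restart is needed to pick the change up):

    # cassandra.yaml (excerpt) -- example paths, adjust to your mount point
    data_file_directories:
        - /var/lib/cassandra/data
        - /mnt/newdisk/cassandra/data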
I just read this benchmark PDF; does anyone have an opinion about it?
I think it's not fair to Cassandra.
URL: http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
http://msrg.utoronto.ca/papers/NoSQLBenchmark
All,
We have a Cassandra cluster which seems to be struggling a bit. I have one node
which crashes continually, and others which crash sporadically. When they crash
it's with a "JVM couldn't allocate memory" error, even though there's plenty of
memory available. I suspect it's because of one table which is very big.
do you have to replace those disks? can you simply add new disks to those
nodes and configure C* to use JBOD?
On Dec 18, 2014 10:18 AM, "Or Sher" wrote:
> Hi all,
>
> We have a situation where some of our nodes have smaller disks and we
> would like to align all nodes by replacing the smaller dis
Thanks Ken. Any other use cases where counters are used apart from Rainbird?
Rajath Subramanyam
On Thu, Dec 18, 2014 at 5:12 PM, Ken Hancock
wrote:
>
> Here's one from Twitter...
>
>
> http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-20
Here's one from Twitter...
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
On Thu, Dec 18, 2014 at 6:08 PM, Rajath Subramanyam
wrote:
>
> Hi Folks,
>
> Have any of you come across blogs that describe how companies in the
> industry are using Cassandra coun
Hi Folks,
Have any of you come across blogs that describe how companies in the
industry are using Cassandra counters practically?
Thanks in advance.
Regards,
Rajath
Rajath Subramanyam
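Not a blog post, but for the archive: a minimal sketch of the classic
page-view counter pattern, run here through cqlsh (keyspace, table, and
column names are all made up):

    cqlsh -e "CREATE TABLE myks.page_views (url text, day text, views counter, PRIMARY KEY (url, day));"
    cqlsh -e "UPDATE myks.page_views SET views = views + 1 WHERE url = '/home' AND day = '2014-12-18';"
    cqlsh -e "SELECT views FROM myks.page_views WHERE url = '/home' AND day = '2014-12-18';"

Note that counter columns only support increment/decrement via UPDATE; you
can't INSERT into a counter table.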
This topic comes up quite a bit. Enough, in fact, that I've done a 1-hour
webinar on the topic. I cover how the JVM GC works and things you need to
consider when tuning it for Cassandra.
https://www.youtube.com/watch?v=7B_w6YDYSwA
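If it helps while digging in, a minimal sketch of turning on GC logging in
conf/cassandra-env.sh (JDK 7-era HotSpot flags; the log path is just an
example):

    # conf/cassandra-env.sh (excerpt) -- flags assume JDK 7, path is an example
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

The tenuring distribution in particular shows whether objects are being
promoted to the old gen prematurely.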
With your specific problem - full GC not reducing the old gen -
V
On Dec 4, 2014 11:14 PM, "Philo Yang" wrote:
> Hi,all
>
> I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have trouble with full
> GC: sometimes one or two nodes run a full GC more than once per minute,
> taking over 10 seconds each time; then the node becomes unreachable
> and the
On Tue, Dec 16, 2014 at 12:38 AM, Jonas Borgström
wrote:
>
> That said, I've done some testing and it appears to be possible to
> perform an in place conversion as long as all nodes contain all data (3
> nodes and replication factor 3 for example) like this:
I would expect this to work, but to s
On Wed, Dec 17, 2014 at 7:04 PM, Kevin Burton wrote:
>
> I’m trying to figure out the best way to bootstrap our nodes.
>
> I *think* I want our nodes to be manually bootstrapped. This way an admin
> has to explicitly bring up the node in the cluster and I don’t have to
> worry about a script acci
On Mon, Dec 15, 2014 at 12:41 AM, Mathijs Vogelzang
wrote:
>
> Would it be possible to trigger a manual partial compaction, to first
> compact 4x 256 tables? Could this be added to nodetool if it doesn't exist
> already?
>
JMX call forceUserDefinedCompaction.
=Rob
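For the archive, a rough sketch of driving that call with jmxterm (the
argument format of forceUserDefinedCompaction differs between versions, so
check the operation signature on your build; host, port, and SSTable names
below are examples):

    java -jar jmxterm.jar -l localhost:7199
    $> bean org.apache.cassandra.db:type=CompactionManager
    $> run forceUserDefinedCompaction mykeyspace mytable-jb-1-Data.db,mytable-jb-2-Data.db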
I'd consider solving your root problem of "people are starting and stopping
servers in prod accidentally" instead of making Cassandra more difficult to
manage operationally.
On Thu Dec 18 2014 at 4:04:34 AM Ryan Svihla wrote:
> why auto_bootstrap=false? The documentation even suggests the opposi
Hi Or,
You don't have another machine on the network that would temporarily be
able to host your /var/lib/cassandra content? That way you would simply be
scp'ing the files temporarily to another machine and copying them back when
done. You obviously want to do a repair afterwards just in case, but th
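Roughly the sequence I'd expect, as a sketch only (hostnames and paths are
examples; verify on one node before rolling through the cluster):

    nodetool drain
    service cassandra stop
    rsync -a /var/lib/cassandra/ otherhost:/spare/cassandra-node1/
    # swap in the bigger disk, mkfs, mount it at /var/lib/cassandra
    rsync -a otherhost:/spare/cassandra-node1/ /var/lib/cassandra/
    service cassandra start
    nodetool repair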
Hi all,
We have a situation where some of our nodes have smaller disks and we would
like to align all nodes by replacing the smaller disks with bigger ones
without replacing nodes.
We don't have enough space to put the data on the / disk and copy it back to
the bigger disks, so we would like to rebuild the n
@Colin -
I bounce back and forth on classifying Storm and Spark as stream processing
frameworks. Clearly they are marketed as stream processing frameworks, and
they can process data streams. Even with the commercial stream processing
products, expressing joins with some of the products is a bit "qui
Almost every stream processing system I know of offers joins out of the box
and has done so for years.
Even open source offerings like Esper have offered joins for years.
What hasn't are systems like Storm, Spark, etc., which I don't really
classify as stream processors anyway.
--
Colin Clark
Hi Peter,
You are right. The idea is to query the data directly from NoSQL, in our
case via Spark SQL on Spark (as Spark largely supports
Mongo/Cassandra/HBase/Hadoop). As you said, the business users still need
to query using Spark SQL. We are already using NoSQL BI tools like Pentaho
(which also
By data warehouse, what kind do you mean?
Is it the traditional warehouse where people create multi-dimensional cubes?
Or is it the newer class of UI tools that make it easier for users to
explore data, where the warehouse is "mostly" a denormalized (i.e. flattened)
format of the OLTP data?
Or is it a comb
In the interest of knowledge sharing on the general topic of stream
processing: the domain is quite old and there's a lot of existing
literature.
Within this space there are several important factors which many products
don't address:
temporal windows (sliding windows, discrete windows, dynamic w
Thanks Ryan and Peter for the suggestions.
Our requirement (an e-commerce company) at a higher level is to build a
data warehouse as a platform or service (for different product teams to
consume), as below:
Datawarehouse as a platform/service
|
Spark SQL
My mistake on Storm, and I'm certain there are a number of use cases where
you're right that Spark isn't the right answer, but I'd argue you're treating
it like Spark 0.5 feature-set-wise instead of Spark 1.1.
As for filtering before persistence... this is the common use case for Spark
Streaming and I've h
Hi,
I am occasionally seeing:
WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line
225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see
tombstone_warn_threshold). 5001 columns was requested,
slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-],
delInfo=
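For reference, the thresholds that warning refers to live in cassandra.yaml
(the values below are the 2.x-era defaults; verify against your version):

    # cassandra.yaml (excerpt)
    tombstone_warn_threshold: 1000
    tombstone_failure_threshold: 100000

Raising them only papers over the symptom, though; the usual fix is a data
model change so a single slice doesn't have to scan that many tombstones.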
For the record, I think Spark is good and I'm glad we have options.
My point wasn't to badmouth Spark. I'm not comparing Spark to Storm at
all, so I think there's some confusion here. I'm thinking of Esper,
StreamBase, and other stream processing products. My point is to think
about the problems
I'll decline to continue the commentary on Spark, as again this probably
belongs on another list, other than to say: micro-batches are an intentional
design tradeoff that has notable benefits for the same use cases you're
referring to, and while you may disagree with those tradeoffs, it's a
bit
Some of the most common use cases in stream processing are sliding
windows based on time or count. Based on my understanding of Spark's
architecture and Spark Streaming, it does not provide the same
functionality. One can fake it by setting Spark Streaming to really small
micro-batches, but t
Since Ajay is already using Spark, the Spark Cassandra Connector really gets
them where they want to be pretty easily:
https://github.com/datastax/spark-cassandra-connector (joins, etc.).
As far as Spark Streaming having "basic support", I'd challenge that
assertion (namely Storm has a number of probl
That depends on what you mean by real-time analytics.
For things like continuous data streams, neither is an appropriate platform
for doing analytics. They're good for storing the results (aka the output) of
the streaming analytics. I would suggest that before you decide Cassandra vs.
HBase, you first figure out
I'd dispute the claim of higher read latency than HBase. I'm not sure what
experience you have with both, and that may have been true at one point,
but with Leveled Compaction Strategy and proper JVM tuning I'm not sure
how this is true; it would at least be comparable. I've worked with buffer
cached
Why auto_bootstrap=false? The documentation even suggests the opposite. If
you don't auto_bootstrap, the node will take queries before it has copies of
all the data, and you'll get the wrong answer (it'd be not unlike using CL
ONE when you've got a bunch of dropped mutations on a single node in the
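For reference, auto_bootstrap isn't even present in the shipped
cassandra.yaml; it defaults to true, and you'd have to add the line yourself
to turn it off:

    # cassandra.yaml -- the default (absent) behaviour is equivalent to:
    auto_bootstrap: true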
I'm not sure that'll work with that many version moves in the middle,
upgrades are to my knowledge only tested between specific steps, namely
from 1.2.9 to the latest 2.0.x
http://www.datastax.com/documentation/upgrade/doc/upgrade/cassandra/upgradeC_c.html
Specifically:
Cassandra 2.0.x restrictio
Many thanks for information Dennis and Karl.
I don’t think I can test until Monday, but I will let you know what (hopefully)
works.
Regards
Nigel
From: d...@aegisco.com [mailto:d...@aegisco.com]
Sent: 17 December 2014 22:31
To: user@cassandra.apache.org
Subject: Re: Cassandra metrics & Graphite
Hi,
While curious about the new incremental repairs, I updated our cluster to C*
version 2.1.2 via the Debian apt repository. Everything went quite well,
but trying to start the tools sstablemetadata and sstablerepairedset
led to the following error:
root@a01:/home/ifjke# sstablerepairedset