Here's a snitch we use for this situation - it uses a property file if one
exists, but falls back to EC2 autodiscovery if the file is missing.
https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java
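The gist of the fallback logic, as a rough sketch (the real implementation is at the link above; the class name here is made up, and this assumes the 1.2-era snitch classes with cassandra-rackdc.properties on the classpath):

    import org.apache.cassandra.locator.Ec2Snitch;
    import org.apache.cassandra.locator.GossipingPropertyFileSnitch;
    import org.apache.cassandra.locator.IEndpointSnitch;

    public final class SnitchSelector {
        // Prefer the property file when it can be found; otherwise fall
        // back to EC2 metadata autodiscovery.
        public static IEndpointSnitch select() throws Exception {
            boolean hasProps = SnitchSelector.class.getClassLoader()
                    .getResource("cassandra-rackdc.properties") != null;
            return hasProps ? new GossipingPropertyFileSnitch() : new Ec2Snitch();
        }
    }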
The latest consensus around the web for running Cassandra on EC2 seems to
be "use the new SSD instances." I've not seen any mention of the elephant in
the room - using the new SSD instances significantly raises the cluster
cost per TB. With Cassandra's strength being linear scalability to many
terabytes…
We are building a historical timeseries database for stocks and futures,
with trade prices aggregated into daily bars (open, high, low, close values
for the day). The latest bar for each instrument needs to be updated as new
trades arrive on the realtime data feeds. Depending on the trading volume…
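Whatever the rate, the aggregation step itself is simple; a minimal sketch of folding live trades into a daily bar (the class and fields are illustrative, not our actual code):

    // Illustrative only: fold each incoming trade price into the
    // current day's open/high/low/close values.
    public class DailyBar {
        private double open, high, low, close;
        private boolean empty = true;

        public void applyTrade(double price) {
            if (empty) {
                open = price;
                high = price;
                low = price;
                empty = false;
            } else {
                high = Math.max(high, price);
                low = Math.min(low, price);
            }
            close = price; // most recent trade so far today
        }
    }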
If your nodes are not actually evenly distributed across physical racks for
redundancy, don't use multiple racks.
On Tue, Aug 5, 2014 at 10:57 AM, DE VITO Dominique <
dominique.dev...@thalesgroup.com> wrote:
> First, thanks for your answer.
>
> > This is incorrect. Network Topology w/ Vnodes will…
>> …a dashboard that shows how long it
>> takes for data to get sent across various DCs.
>>
>
> The brute force method described downthread by Jeremy Jongsma gives you
> something like the monitoring you're looking for, but I continue to believe
> it's probably a bad idea to try to design a system in this way.
>
> =Rob
>
>
The brute force way would be:
1) Make client connections to a node in each datacenter from your
monitoring tool.
2) Periodically write a row to one datacenter (at whatever consistency
level your application typically uses.)
3) Immediately query the other datacenter nodes for the same row key,
polling until it appears; the elapsed time approximates your cross-DC
replication latency.
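A bare-bones version of that loop with the DataStax Java driver (the contact points, "monitor" keyspace, and canary table are hypothetical, and real monitoring would need timeouts and error handling):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ReplicationLatencyProbe {
        public static void main(String[] args) throws Exception {
            // One session per datacenter (hypothetical contact points).
            Session dc1 = Cluster.builder().addContactPoint("dc1-node").build().connect("monitor");
            Session dc2 = Cluster.builder().addContactPoint("dc2-node").build().connect("monitor");

            long id = System.currentTimeMillis();
            dc1.execute("INSERT INTO canary (id) VALUES (?)", id);

            long start = System.currentTimeMillis();
            // Poll the other datacenter until the row is visible there.
            while (dc2.execute("SELECT id FROM canary WHERE id = ?", id).one() == null)
                Thread.sleep(10);
            System.out.println("cross-DC latency ~" + (System.currentTimeMillis() - start) + "ms");
        }
    }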
Yes, and all nodes have had at least two more scheduled repairs since then.
On Jul 30, 2014 1:47 AM, "Or Sher" wrote:
> Did you run a repair after changing the replication factor for system_auth?
>
>
> On Tue, Jul 29, 2014 at 5:48 PM, Jeremy Jongsma
> wrote:
>
>>
On Tue, Jul 22, 2014 at 8:53 AM, Jeremy Jongsma wrote:
> Verified all clocks are in sync.
>
>
> On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon wrote:
>
>> Could you perhaps check your NTP?
>>
>>
>> On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma
>> wrote:
We also run a nightly "nodetool snapshot" on all nodes, and use duplicity
to sync the snapshot to S3, keeping 7 days' worth of backups.
Since duplicity tracks incremental changes this gives you the benefit of
point-in-time snapshots without duplicating sstables that are common across
multiple backups.
My experience is similar to Nicholas'. Basic usage was easy to get a handle
on, but the advanced tuning/tweaking info is scattered EVERYWHERE around
the web, mostly on personal blogs. It feels like it took way too long to
become confident enough in my understanding of Cassandra that I trust our
deployment.
Verified all clocks are in sync.
On Mon, Jul 21, 2014 at 10:03 PM, Rahul Menon wrote:
> Could you perhaps check your NTP?
>
>
> On Tue, Jul 22, 2014 at 3:35 AM, Jeremy Jongsma
> wrote:
>
>> I routinely get this exception from cqlsh…
I routinely get this exception from cqlsh on one of my clusters:
cql.cassandra.ttypes.AuthenticationException:
AuthenticationException(why='org.apache.cassandra.exceptions.ReadTimeoutException:
Operation timed out - received only 2 responses.')
The system_auth keyspace is set to replicate X times…
>
>
>
> I've seen a lot of deployments, and I think you captured the scenarios and
> reasoning quite well. You can apply other nuances and details to #2 (e.g.
> segment based on SLA or topology), but I agree with all of your reasoning.
>
> -Tupshin
> -Global Field Strategy
Do you prefer purpose-specific Cassandra clusters that support a single
application's data set, or a single Cassandra cluster that contains column
families for many applications? I realize there is no ideal answer for
every situation, but what have your experiences been in this area for
cluster planning?
You'd be better off using external indexing (ElasticSearch or Solr);
Cassandra isn't really designed for this sort of querying.
On Jun 24, 2014 3:09 AM, "Mike Carter" wrote:
> Hello!
>
>
> I'm a beginner in C* and I'm struggling with it quite a bit.
>
> I’d like to measure the performance of some Cassandra…
Use a ByteBuffer value type with your own serialization (we use protobuf
for complex value structures).
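We use protobuf, but to illustrate the general approach, here is a sketch of a hand-rolled tag-prefixed encoding that packs a mixed-type list into a single ByteBuffer value (class name and tag values are made up):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.List;

    public class MixedListCodec {
        // Tags: 0 = long, 1 = double, 2 = boolean, 3 = UTF-8 string (fallback).
        public static ByteBuffer encode(List<Object> values) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (Object v : values) {
                if (v instanceof Long) { out.writeByte(0); out.writeLong((Long) v); }
                else if (v instanceof Double) { out.writeByte(1); out.writeDouble((Double) v); }
                else if (v instanceof Boolean) { out.writeByte(2); out.writeBoolean((Boolean) v); }
                else {
                    byte[] s = v.toString().getBytes("UTF-8");
                    out.writeByte(3); out.writeInt(s.length); out.write(s);
                }
            }
            return ByteBuffer.wrap(buf.toByteArray());
        }
    }

A matching decode() would read each tag byte and switch on it; protobuf just gives you that plus schema evolution for free.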
On Jun 24, 2014 5:30 AM, "Tuukka Mustonen"
wrote:
> Hello,
>
> I need to store a list of mixed types in Cassandra. The list may contain
> numbers, strings and booleans. So I would need something…
>>> …good performance in Cassandra...
>>> I wish there was a simpler way to query in batches. Opening a large
>>> amount of connections and sending 1 message at a time seems bad to me, as
>>> sometimes you want to work with small rows.
>>> It's no surprise Cassandra…
…(where the other jars are). bin/cassandra.in.sh
> does this:
>
> for jar in "$CASSANDRA_HOME"/lib/*.jar; do
> CLASSPATH="$CLASSPATH:$jar"
> done
>
>
>
> On Fri, Jun 20, 2014 at 12:58 PM, Jeremy Jongsma
> wrote:
>
>> Where do I add my custom snitch…
Where do I add my custom snitch JAR to the Cassandra classpath so I can use
it?
> …message to each host.
> This would decrease resource usage, am I wrong?
>
> []s
>
>
> 2014-06-20 12:12 GMT-03:00 Jeremy Jongsma :
>
> I've found that if you have any amount of latency between your client and
>> nodes, and you are executing a large batch of queries, you'll…
I've found that if you have any amount of latency between your client and
nodes, and you are executing a large batch of queries, you'll usually want
to send them together to one node unless execution time is of no concern.
The tradeoff is resource usage on the connected node vs. time to complete
all the queries.
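As a rough sketch of that middle ground (assumes Java 8; the chunk size and fetchChunk callback are placeholders for your client's actual query call, e.g. an Astyanax key slice per chunk):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;
    import java.util.function.Consumer;

    public class ChunkedFetch {
        // Fetch a large key set as fixed-size chunks submitted in parallel,
        // so no single coordinator node absorbs the whole request.
        public static <K> List<Future<?>> fetchAll(List<K> keys, int chunkSize,
                ExecutorService pool, Consumer<List<K>> fetchChunk) {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < keys.size(); i += chunkSize) {
                List<K> chunk = keys.subList(i, Math.min(i + chunkSize, keys.size()));
                futures.add(pool.submit(() -> fetchChunk.accept(chunk)));
            }
            return futures;
        }
    }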
One option is to add new nodes, and do a node repair/cleanup on everything.
That will at least reduce your per-node data size.
On Wed, Jun 18, 2014 at 11:01 AM, Brian Tarbox
wrote:
> I'm running on AWS m2.2xlarge instances using the ~800 gig
> ephemeral/attached disk for my data directory. My…
That will not necessarily scale, and I wouldn't recommend it - your "backup
node" will need as much disk space as an entire replica of the cluster
data. For a cluster with a couple of nodes that may be OK, for dozens of
nodes, probably not. You also lose the ability to restore individual nodes
- the…
Good to know, thanks Peter. I am worried about client-to-node latency if I
have to do 20,000 individual queries, but that makes it clearer that at
least batching in smaller sizes is a good idea.
On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford
wrote:
> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma wrote:
On Wed, Jun 11, 2014 at 9:33 AM, Jeremy Jongsma wrote:
> I'm using Astyanax with a query like this:
>
> clusterContext
> .getClient()
> .getKeyspace("instruments")
> .prepareQuery(INSTRUMENTS_CF)
> .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)…
…'} AND
compression={'sstable_compression': 'SnappyCompressor'};
On Tue, Jun 10, 2014 at 6:35 PM, Laing, Michael
wrote:
> Perhaps if you described both the schema and the query in more detail, we
> could help... e.g. did the query have an IN clause with 2 keys?
I didn't explain clearly - I'm not requesting 2 unknown keys (resulting
in a full scan), I'm requesting 2 specific rows by key.
On Jun 10, 2014 6:02 PM, "DuyHai Doan" wrote:
> Hello Jeremy
>
> Basically what you are doing is asking Cassandra to do a distributed full
> scan on all the partitions…
I ran an application today that attempted to fetch 20,000+ unique row keys
in one query against a set of completely empty column families. On a 4-node
cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
heap), every single node immediately ran out of memory and became
unresponsive.
I'm in the process of migrating data over to cassandra for several of our
apps, and a few of the schemas use secondary indexes. Four times in the
last couple months I've run into a corrupted sstable belonging to a
secondary index, but have never seen this on any other sstables. When it
happens, any…
A dead node is still allocated key ranges, and Cassandra will wait for it
to come back online rather than redistributing its data. It needs to be
decommissioned or replaced by a new node for it to be truly dead as far as
the cluster is concerned.
On Tue, Jun 3, 2014 at 11:12 AM, Prem Yadav wrote:
I wouldn't recommend doing this before regular backups for the simple
reason that for large data sets it will take a long time to run, and
will require that your node backup schedule be properly staggered (you
should never be running repair on all nodes at the same time.) Backups
should be treated…
It appears that only adding the CA certificate to the truststore is
sufficient for this.
On Thu, May 22, 2014 at 10:05 AM, Jeremy Jongsma
wrote:
> The docs say that each node needs every other node's certificate in its
> local truststore:
>
>
> http://www.datastax.com/doc
The docs say that each node needs every other node's certificate in its
local truststore:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html
This seems like a bit of a headache for adding nodes to a cluster. How do
others deal with this?
1) If I add…