RE: Data Modelling Help

2015-04-29 Thread Donald Smith
Secondary indexes are inefficient and, as far as I know, deprecated. Unless you store many thousands of emails for a long time (which I recommend against), just use a single table with the partition key being the userid and the timestamp being the clustering (column) key, as in your schema.
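The single-table layout suggested above can be mimicked in plain Python to see why it serves per-user, time-ordered reads well. This is only a sketch under assumptions: `Mailbox`, `add`, and `between` are hypothetical names, and an in-memory dict stands in for partitions; it does not reflect how Cassandra stores data internally.

```python
import bisect
from collections import defaultdict


class Mailbox:
    """Toy model: partition key = userid, clustering key = timestamp."""

    def __init__(self):
        # One sorted list of (timestamp, email) rows per userid,
        # loosely analogous to one partition per user.
        self._partitions = defaultdict(list)

    def add(self, userid, timestamp, email):
        # Keep rows ordered by the clustering key within the partition.
        bisect.insort(self._partitions[userid], (timestamp, email))

    def between(self, userid, start, end):
        """Return one user's emails with start <= timestamp < end."""
        rows = self._partitions.get(userid, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return [email for _, email in rows[lo:hi]]


box = Mailbox()
box.add("alice", 30, "c")
box.add("alice", 10, "a")
box.add("alice", 20, "b")
box.add("bob", 15, "x")
print(box.between("alice", 10, 30))
```

A time-range read touches only one user's rows and needs no secondary index, which is the point of keying on userid and clustering on timestamp.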

Cassandra hanging in IntervalTree.comparePoints() and in CompactionController.maxPurgeableTimestamp()

2015-04-29 Thread Donald Smith
We deployed a brand new 13 node 2.1.4 C* cluster and used sstableloader to stream about 500GB into cassandra. The streaming took less than a day but afterwards pending compactions do not decrease. The Cassandra nodes (which have about 500 pending compactions each) seem to spend most of their t

Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-28 Thread Donald Smith
Using 2.1.4, tables in our data/ directory are showing up as our_table-147a2090ed4211e480153bc81e542ebd/ instead of as our_table/. Why would that happen? We're also seeing lagging compactions and high cpu usage. Thanks, Don

Questions about bootstrapping and compactions during bootstrapping

2014-12-16 Thread Donald Smith
Looking at the output of "nodetool netstats" I see that the bootstrapping nodes are pulling from only two of the nine nodes currently in the datacenter. That surprises me: I'd think the vnodes it pulls from would be randomly spread across the existing nodes. We're using Cassandra 2.0.11 with 256

RE: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
then do a write, with the current timestamp. The record in disk will have a timestamp greater than the one in the memtable. On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith <donald.sm...@audiencescience.com> wrote: Question about the read path in cassandra. If a partition/row is

Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
Question about the read path in cassandra. If a partition/row is in the Memtable and is being actively written to by other clients, will a READ of that partition also have to hit SStables on disk (or in the page cache)? Or can it be serviced entirely from the Memtable? If you select all colu

RE: stream_throughput_outbound_megabits_per_sec

2014-10-22 Thread Donald Smith
2014 4:05 AM To: user@cassandra.apache.org Subject: Re: stream_throughput_outbound_megabits_per_sec On Thu, Oct 16, 2014 at 1:54 AM, Donald Smith <donald.sm...@audiencescience.com> wrote: stream_throughput_outbound_megabits_per_sec is the timeout per operation on the streami

stream_throughput_outbound_megabits_per_sec

2014-10-15 Thread Donald Smith
stream_throughput_outbound_megabits_per_sec is the timeout per operation on the streaming socket. The docs recommend not to have it too low (because a timeout causes streaming to restart from the beginning). But the default 0 never times out. What's a reasonable value? Does it stream an en

Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

2014-10-15 Thread Donald Smith
Sent: Wednesday, October 15, 2014 1:54 PM To: user@cassandra.apache.org Subject: Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges? On Tue, Oct 14, 2014 at 4:52 PM, Donald Smith <donald.sm...@audiencescience.com>

Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

2014-10-14 Thread Donald Smith
Suppose I create a new DC with 25 nodes. I have their IPs in cassandra-topology.properties. Twenty-three of the nodes start up, but two of the nodes fail to start. If I start replicating (via "nodetool rebuild") without those two nodes, then when those 2 nodes enter the DC the distribution o

timeout for port 7000 on stateful firewall? streaming_socket_timeout_in_ms?

2014-09-29 Thread Donald Smith
We have a stateful firewall between data centers for port 7000 (inter-cluster). How long should the idle timeout be for the connections on the firewall? Similarly what's appropriate for streaming_socket_timeout_in_ms in cassandra.yaml? The defaul

Experience with multihoming cassandra?

2014-09-25 Thread Donald Smith
We have large boxes with 256G of RAM and SSDs. From iostat, top, and sar we think the system has excess capacity. Anyone have recommendations about multihoming cassandra on such a node (connecting it to multiple IPs and running multiple cassandras

RE: Would warnings about overlapping SStables explain high pending compactions?

2014-09-25 Thread Donald Smith
version are you on? Do you have pending compactions and no ongoing compactions? /Marcus On Wed, Sep 24, 2014 at 11:35 PM, Donald Smith <donald.sm...@audiencescience.com> wrote: On one of our nodes we have lots of pending compactions (499). In the past we’ve seen pending compactions

Would warnings about overlapping SStables explain high pending compactions?

2014-09-24 Thread Donald Smith
On one of our nodes we have lots of pending compactions (499). In the past we've seen pending compactions go up to 2400 and all the way back down again. Investigating, I saw warnings such as the following in the logs about overlapping SStables and about needing to run "nodetool scrub" on a ta

Adjusting readahead for SSD disk seeks

2014-09-24 Thread Donald Smith
We're using cassandra as a key-value store; our values are small. So we're thinking we don't need much disk readahead (e.g., "blockdev -getra /dev/sda"). We're using SSDs. When cassandra does disk seeks to satisfy read requests does it typically have to read in the entire SStable into memory

Is there harm from having all the nodes in the seed list?

2014-09-23 Thread Donald Smith
Is there any harm from having all the nodes listed in the seeds list in cassandra.yaml? Donald A. Smith | Senior Software Engineer

dropped MUTATION messages on remote DC when using cross_node_timeout: true

2014-09-21 Thread Donald Smith
We have C* 2.0.9 running on three DCs, with ntpd running to synchronize time. On our local DC we set *cross_node_timeout: true* in cassandra.yaml without problem. But when we did it in a remote DC we got lots of messages like INFO [ScheduledTasks:1] 2014-09-21 21:26:45,191 MessagingService.java

Is it wise to increase native_transport_max_threads if we have lots of CQL clients?

2014-09-19 Thread Donald Smith
If we have hundreds of CQL clients (for C* 2.0.9), should we increase native_transport_max_threads in cassandra.yaml from the default (128) to the number of clients? If we don't do that, I presume requests will queue up, resulting in higher latency. What's a reasonable max value for increas

Trying to understand cassandra gc logs

2014-09-15 Thread Donald Smith
I understand that cassandra uses ParNew GC for New Gen and CMS for Old Gen (tenured). I'm trying to interpret in the logs when a Full GC happens and what kind of Full GC is used. It never says "Full GC" or anything like that. But I see that whenever there's a line like 2014-09-15T18:04:1

Rebuilding a cassandra seed node with the same tokens and same IP address

2014-08-29 Thread Donald Smith
One of our nodes is getting an increasing number of pending compactions due, we think, to https://issues.apache.org/jira/browse/CASSANDRA-7145 , which is fixed in future version 2.0.11 . (We had the same error a month ago, but at that time we were in pre-production and could just clean the di

RE: How often are JMX Cassandra metrics reset?

2014-08-29 Thread Donald Smith
s the exp. decaying one which is weighted for the last 5 minutes. http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.2.0/com/yammer/metrics/core/Timer.java?av=f Chris Lohfink On Aug 28, 2014, at 5:39 PM, Donald Smith <donald.sm...@audiencescience.com>

RE: How often are JMX Cassandra metrics reset?

2014-08-28 Thread Donald Smith
MX as well as doing some of its own aggregation depending on the rollup size. On Thu, Aug 28, 2014 at 12:36 PM, Robert Coli <rc...@eventbrite.com> wrote: On Thu, Aug 28, 2014 at 9:27 AM, Donald Smith <donald.sm...@audiencescience.com> wrote: And yet OpsCenter show

RE: How often are JMX Cassandra metrics reset?

2014-08-28 Thread Donald Smith
metrics reset? On Wed, Aug 27, 2014 at 12:38 PM, Donald Smith <donald.sm...@audiencescience.com> wrote: I’m using JMX to retrieve Cassandra metrics. I notice that Max and Count are cumulative and aren’t reset. How often are the stats for Mean, 99thPercentile, etc. reset back t

How often are JMX Cassandra metrics reset?

2014-08-27 Thread Donald Smith
I'm using JMX to retrieve Cassandra metrics. I notice that Max and Count are cumulative and aren't reset. How often are the stats for Mean, 99thPercentile, etc. reset back to zero? For example, 99thPercentile shows as 1.5 mls. Over how many minutes? ClientRequest/Read/Latency: Latency

RE: adding more nodes into the cluster

2014-08-01 Thread Donald Smith
According to datastax’s documentation at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html “By default, this setting [auto_bootstrap] is true and not listed in the cassandra.yaml file.” But http://wiki.apache.org/cassandra/StorageConfigurati

Problem with /etc/cassandra for cassandra 2.0.8

2014-06-17 Thread Donald Smith
I installed a package version of cassandra via "sudo yum install cassandra20.noarch" into a clean host and got: cassandra20.noarch 2.0.8-2 @datastax That resulted in a problem: /etc/cassandra/ did not exist. So I did "sudo yum downgrade cassandra20.noarch" and got version 2.0.7.

RE: Schema disagreement errors

2014-05-13 Thread Donald Smith
I too have noticed that after doing “nodetool flush” (or “nodetool drain”), the commit logs are still there. I think they’re NEW (empty) commit logs, but I may be wrong. Anyone know? Don From: Gaurav Sehgal [mailto:gsehg...@gmail.com] Sent: Monday, May 12, 2014 12:31 PM To: user@cassandra.apach

RE: Cassandra data retention policy

2014-04-28 Thread Donald Smith
CQL lets you specify a default TTL per column family/table: and default_time_to_live=86400 . From: Redmumba [mailto:redmu...@gmail.com] Sent: Monday, April 28, 2014 12:51 PM To: user@cassandra.apache.org Subject: Re: Cassandra data retention policy Have you looked into using a TTL? You can set
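The value 86400 above is one day expressed in seconds. A minimal sketch of deriving such a value from a retention period follows; the helper names and the table name `reports` are hypothetical, and the generated statement assumes the standard `ALTER TABLE ... WITH default_time_to_live` form.

```python
from datetime import timedelta


def ttl_seconds(retention: timedelta) -> int:
    """Convert a retention period to whole seconds for default_time_to_live."""
    return int(retention.total_seconds())


def with_default_ttl(table: str, retention: timedelta) -> str:
    """Build a CQL statement that sets a table's default TTL (illustrative only)."""
    return f"ALTER TABLE {table} WITH default_time_to_live = {ttl_seconds(retention)};"


# "reports" is a made-up table name used only for this example.
print(with_default_ttl("reports", timedelta(days=1)))
```

For a one-day retention this yields a TTL of 86400 seconds, matching the figure quoted in the message.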

RE: Lots of commitlog files

2014-04-14 Thread Donald Smith
ntime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode) but our commit log files are each 32MB in size. Is this indicative of a bug? Shouldn't they be 1024MB in size? Don From: Donald Smith Sent: Monday, April 14, 2014 12:04 PM

Lots of commitlog files

2014-04-14 Thread Donald Smith
1. With cassandra 2.0.6, we have 547G of files in /var/lib/commitlog/. I started a "nodetool flush" 65 minutes ago; it's still running. The 17536 commitlog files have been created in the last 3 days. (The node has 2.1T of sstables data in /var/lib/cassandra/data/. This is in staging, not pro
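As a quick sanity check on the figures quoted above, the average commitlog file size and creation rate can be worked out directly. This sketch assumes that "547G" means GiB and that the files were created evenly over exactly 3 days; both are assumptions, not statements from the thread.

```python
def average_file_mb(total_gib: float, file_count: int) -> float:
    """Average file size in MiB, given a total size in GiB."""
    return total_gib * 1024 / file_count


def files_per_hour(file_count: int, days: float) -> float:
    """Average number of files created per hour over the given period."""
    return file_count / (days * 24)


avg_mb = average_file_mb(547, 17536)
rate = files_per_hour(17536, 3)
print(f"{avg_mb:.1f} MB per file, {rate:.0f} files per hour")
```

The average comes out just under 32 MB per file, which agrees with the 32MB file size reported in the follow-up message on this thread.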

Setting gc_grace_seconds to zero and skipping "nodetool repair" (was RE: Timeseries with TTL)

2014-04-07 Thread Donald Smith
This statement is significant: “BTW if you never delete and only ttl your values at a constant value, you can set gc=0 and forget about periodic repair of the table, saving some space, IO, CPU, and an operational step.” Setting gc_grace_seconds to zero has the effect of not storing hinted handof

Question about rpms from datastax

2014-03-27 Thread Donald Smith
On http://rpm.riptano.com/community/noarch/ what's the difference between cassandra20-2.0.6-1.noarch.rpm and dsc20-2.0.6-1.noarch.rpm ? Thanks, Don Donal

RE: Question about how compaction and partition keys interact

2014-03-26 Thread Donald Smith
lustering keys? Hope that helps. Jonathan Jonathan Lacefield Solutions Architect, DataStax

RE: memory usage spikes

2014-03-26 Thread Donald Smith
Prem, Did you follow the instructions at http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k And did you install jna-3.2.7.jar into /usr/share/java, as per http://www.datastax.com/documentation/cassandra/2.0/mobile/c

Question about how compaction and partition keys interact

2014-03-26 Thread Donald Smith
In CQL we need to decide between using ((customer_id,type),date) as the CQL primary key for a reporting table, versus ((customer_id,date),type). We store reports for every day. If we use (customer_id,type) as the partition key (physical key), then we have a WIDE ROW where each date's data is s
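The trade-off between the two candidate keys can be made concrete by counting how rows group into partitions under each choice. The sketch below uses made-up data (2 customers, 3 report types, 30 days) and a hypothetical helper `partition_sizes`; it only illustrates partition count and width, not compaction behaviour.

```python
from collections import defaultdict
from datetime import date, timedelta


def partition_sizes(rows, partition_key):
    """Count rows per partition for a given partition-key function."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[partition_key(row)] += 1
    return dict(sizes)


# Hypothetical data set: (customer_id, type, date) tuples.
start = date(2014, 3, 1)
rows = [
    (customer, rtype, start + timedelta(days=d))
    for customer in ("c1", "c2")
    for rtype in ("clicks", "views", "spend")
    for d in range(30)
]

# ((customer_id, type), date): few partitions, each growing by one row per day.
by_type = partition_sizes(rows, lambda r: (r[0], r[1]))
# ((customer_id, date), type): a new partition per customer per day, each small.
by_date = partition_sizes(rows, lambda r: (r[0], r[2]))

print(len(by_type), max(by_type.values()))
print(len(by_date), max(by_date.values()))
```

With this data the first key gives 6 partitions of 30 rows each (wide rows that keep growing), while the second gives 60 partitions of 3 rows each (many narrow partitions that stop changing once the day is over).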

nodetool scrub throws exception FileAlreadyExistsException

2014-03-26 Thread Donald Smith
% time nodetool scrub -s as_reports data_report_info_2011 xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k Exception in thread "main" FSWriteError in /mnt/cas

Speed of sstableloader

2014-03-11 Thread Donald Smith
I tested bulk loading in cassandra with CQLSSTableWriter and sstableloader. It turns out that writing 1 million rows with sstableloader took over twice as long as inserting regularly with batch CQL statements from Java (cassandra-driver-core, version 2.0.0). Specifically, the call to sstable

RE: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Donald Smith
You may need to do "chown -R cassandra /var/lib/cassandra /var/log/cassandra" . Don From: user 01 [mailto:user...@gmail.com] Sent: Monday, March 10, 2014 10:23 AM To: user@cassandra.apache.org Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra" $ sudo su -

RE: replication_factor: ?

2014-03-07 Thread Donald Smith
Robert, please elaborate why you say "To make best use of Cassandra, my minimum recommendation is usually RF=3, N=6." I surmise that with any less than 6 nodes, you'd likely perform better with a sequential/single-node solution. You need at least six nodes to overcome the overheads from concur

RE: Supported Cassandra version for CentOS 5.5

2014-02-26 Thread Donald Smith
Cassandra 1.2.12 on CentOS 5.10. Was running 1.1.15 previously without any issues as well. -Arindam From: Donald Smith [mailto:donald.sm...@audiencescience.com] Sent: Tuesday, February 25, 2014 3:40 PM To: user@cassandra.apache.org Subject: RE: Supported Cas

RE: Supported Cassandra version for CentOS 5.5

2014-02-25 Thread Donald Smith
I was unable to get cassandra working with CentOS 5.X . I needed to use CentOS 6.2 or 6.4. Don From: Hari Rajendhran Sent: Tuesday, February 25, 2014 2:34 AM To: user@cassandra.apache.org Subject: Supported Cassandra version for CentOS 5.5 Hi, Currently i am

Corrupted Index File exceptions in 2.0.5

2014-02-18 Thread Donald Smith
We're getting exceptions like the one below using cassandra 2.0.5. A google search turns up nothing about these except the source code. Anyone have any insight? ERROR [CompactionExecutor:188] 2014-02-12 04:15:53,232 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecuto

RE: Dangers of "sudo swapoff --all"

2014-02-13 Thread Donald Smith
I meant to say: Doing "sudo swapon -a" on that node fixed the problem. From: Donald Smith [mailto:donald.sm...@audiencescience.com] Sent: Thursday, February 13, 2014 2:57 PM To: 'user@cassandra.apache.org' Subject: Dangers of "sudo swapoff --all" I fo

Dangers of "sudo swapoff --all"

2014-02-13 Thread Donald Smith
I followed the recommendations at http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installRecommendSettings.html and did: $ sudo swapoff --all on each of the cassandra servers in my test cluster. I noticed, though, that sometimes the cassandra server and

RE: Warning about copying and pasting from datastax configuration page: weird characters in config

2014-02-11 Thread Donald Smith
02/11/2014 04:50 PM, Donald Smith wrote: > In > http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installRecommendSettings.html > it says: Just curious.. why are you using the mobile site on a desktop, instead of the main page? [0] -- Michael [0] http://ww

Warning about copying and pasting from datastax configuration page: weird characters in config

2014-02-11 Thread Donald Smith
In http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installRecommendSettings.html it says: Packaged installs: Ensure that the following settings are included in the /etc/security/limits.d/cassandra.conf file: cassandra - memlock unlimited cassandra - nofile 10 c

RE: Question about local reads with multiple data centers

2014-01-30 Thread Donald Smith
ly it is the same for the other datastax drivers. Best wishes, Duncan. On 30/01/14 02:07, Donald Smith wrote: > We have two datacenters, DC1 and DC2 in our test cluster. Our *write* > process uses a connection string with just the two hosts in DC1. Our *read* > process uses

Question about local reads with multiple data centers

2014-01-29 Thread Donald Smith
We have two datacenters, DC1 and DC2 in our test cluster. Our write process uses a connection string with just the two hosts in DC1. Our read process uses a connection string just with the two hosts in DC2. We use a PropertyFileSnitch and a property file that 'DC1':2, 'DC2':1 between data cen

RE: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Donald Smith
Last week I made a feature request to apache cassandra along these lines: https://issues.apache.org/jira/browse/CASSANDRA-6611 Don From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Monday, January 27, 2014 4:05 PM To: user@cassandra.apache.org Subject: Re: No deletes - is periodic repa

Benchmarks of cassandra replication across data centers?

2014-01-24 Thread Donald Smith
Does anyone know of any good benchmark data about cassandra replication across data centers? I'm aware of the articles below. This article http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html from netflix is about benchmarking Cassandra scalability using AWS. It shows

Possible optimization: avoid creating tombstones for TTLed columns if updates to TTLs are disallowed

2014-01-21 Thread Donald Smith
I'm aware of https://issues.apache.org/jira/browse/CASSANDRA-4917, which optimizes tombstone creation for TTLed columns: "We only need to ensure that ExpiringColumn and tombstone together live as long as gc_grace. If the ExpiringColumn's TTL>=gc_grace_seconds then we can create an already gcable
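One way to read the quoted rule from CASSANDRA-4917 is as simple arithmetic: the time a value lives as an expiring column plus the time its tombstone lives should add up to at least gc_grace_seconds. The sketch below encodes that reading; the function name is hypothetical and this is an interpretation of the quoted sentence, not Cassandra's actual implementation.

```python
def tombstone_lifetime_after_expiry(ttl: int, gc_grace_seconds: int) -> int:
    """Seconds a tombstone still has to be kept once a TTLed column expires.

    Assumes the column and its tombstone together must live for at least
    gc_grace_seconds, per the sentence quoted from the ticket.
    """
    return max(0, gc_grace_seconds - ttl)


# TTL shorter than gc_grace: the tombstone covers the remainder.
print(tombstone_lifetime_after_expiry(ttl=86400, gc_grace_seconds=864000))
# TTL at least gc_grace: the tombstone is already collectable on creation.
print(tombstone_lifetime_after_expiry(ttl=864000, gc_grace_seconds=864000))
```

Under this reading, any table whose TTL is at least gc_grace_seconds never needs to keep tombstones around after expiry.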