commitlog replay missing data

2011-07-11 Thread Jeffrey Wang
Hey all, Recently upgraded to 0.8.1 and noticed what seems to be missing data after a commitlog replay on a single-node cluster. I start the node, insert a bunch of stuff (~600MB), stop it, and restart it. There are log messages pertaining to the commitlog replay and no errors, but some of the

RE: hinted handoff sleeping

2011-06-23 Thread Jeffrey Wang
Subject: Re: hinted handoff sleeping. On Thu, Jun 23, 2011 at 2:55 PM, Jeffrey Wang wrote: > Hey all, > We’re running a slightly patched version of 0.7.3 on a cluster of 5 nodes. > I’ve been noticing a number of messages in our logs which look like this > (after a node goe

hinted handoff sleeping

2011-06-23 Thread Jeffrey Wang
Hey all, We're running a slightly patched version of 0.7.3 on a cluster of 5 nodes. I've been noticing a number of messages in our logs which look like this (after a node goes "down" and comes back up, usually just due to a GC): 2011-06-23 14:46:35,381 INFO [HintedHandoff:1] org.apache.cass

multiple clusters communicating

2011-06-06 Thread Jeffrey Wang
Hey all, We're seeing a strange issue with two completely separate clusters (0.7.3) on the same subnet (X.X.X.146 through X.X.X.150): one has 3 machines (146-148) and the other has 2 machines (149-150). Both of them are seeded with the respective machines in their cluster, yet when we run them they end up go

RE: pig + hadoop

2011-04-19 Thread Jeffrey Wang
Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for a while before I added that. -Jeffrey From: pob [mailto:peterob...@gmail.com] Sent: Tuesday, April 19, 2011 6:42 PM To: user@cassandra.apache.org Subject: Re: pig + hadoop Hey Aaron, I read it, and all 3 env variabl

RE: DatabaseDescriptor.defsVersion

2011-04-15 Thread Jeffrey Wang
Done: https://issues.apache.org/jira/browse/CASSANDRA-2490 -Jeffrey -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, April 15, 2011 7:39 PM To: user@cassandra.apache.org Cc: Jeffrey Wang Subject: Re: DatabaseDescriptor.defsVersion I think you found a bug

DatabaseDescriptor.defsVersion

2011-04-15 Thread Jeffrey Wang
Hey all, I've been seeing a very rare issue with schema change conflicts on 0.7.3 (I am serializing all schema changes to a single Cassandra node and waiting for them to finish before continuing). Occasionally a node in the cluster will never report the correct schema, and I think it may have t

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
-----Original Message----- From: Jeffrey Wang [mailto:jw...@palantir.com] Sent: Friday, March 25, 2011 11:42 AM To: user@cassandra.apache.org Subject: RE: pig counting question I don't think it's Pig running out of memory, but rather Cassandra itself (the data doesn't even make it to Pig). get_range_sli

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
pig spilling to disk instead of growing in memory. The pig model is that you can have huge bags that don't kill you on memory but they are just slower because they spill to disk. What is the schema that you impose when you load the data? On Mar 24, 2011, at 3:57 PM, Jeffrey Wang wrote:

RE: pig counting question

2011-03-24 Thread Jeffrey Wang
pig, like so: rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096); or whatever value you wish. Give that a try and see if it gives you more of what you're looking for. On Mar 24, 2011, at 1:16 PM, Jeffrey Wang wrote: > Hey all, > > I'm tryi

pig counting question

2011-03-24 Thread Jeffrey Wang
Hey all, I'm trying to run a very simple Pig script against my Cassandra cluster (5 nodes, 0.7.3). I've gotten it all set up and working, but the script is giving me some strange results. Here is my script: rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(); rowct = FOREAC

insert during forced compaction

2011-03-16 Thread Jeffrey Wang
Hey all, I'm running 0.7.0 on a cluster of 5 machines. When I create a new column family after I run nodetool compact (but before it finishes), I see the error below. Seems like StorageService.getValidColumnFamilies() should make a copy of the set of column families in the case where cfNames.le
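The excerpt is cut off before the error itself, so what follows is only a sketch under an assumption: that the failure is a ConcurrentModificationException from iterating a shared set of column families while a new one is being created. It illustrates the fix proposed above, handing callers a copy instead of the live set. `ColumnFamilyRegistry`, `CopyBeforeIterateDemo` and their methods are hypothetical stand-ins, not Cassandra's `StorageService` code.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a node's set of column families (NOT Cassandra's StorageService).
class ColumnFamilyRegistry {
    private final Set<String> columnFamilies = new HashSet<>();

    synchronized void addColumnFamily(String name) {
        columnFamilies.add(name);
    }

    // Unsafe: exposes the live set, so creating a CF while a caller is still iterating
    // (e.g. a long-running compaction) can fail with ConcurrentModificationException.
    Set<String> liveView() {
        return columnFamilies;
    }

    // Safer: callers iterate a copy taken under the same lock that writers use.
    synchronized List<String> snapshot() {
        return new ArrayList<>(columnFamilies);
    }
}

class CopyBeforeIterateDemo {
    // Iterates the given view and creates a new CF for each one seen, mimicking a
    // schema change that lands mid-iteration. Returns true if the iteration failed.
    static boolean failsWhileIterating(ColumnFamilyRegistry registry, Iterable<String> view) {
        try {
            for (String cf : view) {
                registry.addColumnFamily(cf + "_new");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ColumnFamilyRegistry registry = new ColumnFamilyRegistry();
        registry.addColumnFamily("cf1");
        registry.addColumnFamily("cf2");
        System.out.println("live view fails: " + failsWhileIterating(registry, registry.liveView()));
        System.out.println("snapshot fails: " + failsWhileIterating(registry, registry.snapshot()));
    }
}
```

The copy costs one allocation per call, which is usually negligible next to a compaction; the single-threaded demo only reproduces the failure mode deterministically, while the real case would involve two threads.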

RE: running all unit tests

2011-03-15 Thread Jeffrey Wang
ron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, March 15, 2011 1:26 AM To: user@cassandra.apache.org Subject: Re: running all unit tests There is a test target in the build script. Aron On 15 Mar 2011, at 17:29, Jeffrey Wang wrote: Hey all, We're applying some patches to our own bran

running all unit tests

2011-03-14 Thread Jeffrey Wang
Hey all, We're applying some patches to our own branch of Cassandra, and we are wondering if there is a good way to run all the unit tests. Just having JUnit run all the test classes seems to result in a lot of errors that are hard to fix, so I'm hoping there's an easy way to do this. Thanks!

get_range_slices perf

2011-03-13 Thread Jeffrey Wang
Hey all, I'm trying to get a list of all the rows from a column family using get_range_slices retrieving no actual columns. I expected this operation to be pretty quick, but it seems to take a while (5-node 0.7.0 cluster takes 20 min to page through 60k keys 1000 at a time). It's not completely
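For paging through keys 1000 at a time as described above, a common approach with range slices is to reuse the last key of one page as the start key of the next. Because the start key is inclusive, every page after the first repeats one row that has to be skipped. A minimal sketch of that loop follows; `FakeKeyRangeSource` is a hypothetical in-memory stand-in for the Thrift call, and its plain string ordering stands in for the partitioner's row order. Fetching no columns per row is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a column family's row keys (NOT the Thrift client).
class FakeKeyRangeSource {
    private final NavigableSet<String> keys = new TreeSet<>();

    FakeKeyRangeSource(List<String> rowKeys) {
        keys.addAll(rowKeys);
    }

    // Returns up to `count` keys >= startKey; the start key is INCLUSIVE.
    List<String> fetchPage(String startKey, int count) {
        List<String> page = new ArrayList<>(count);
        for (String k : keys.tailSet(startKey, true)) {
            if (page.size() == count) break;
            page.add(k);
        }
        return page;
    }
}

class KeyPager {
    static List<String> allKeys(FakeKeyRangeSource source, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be >= 2 because each later page repeats one key");
        }
        List<String> result = new ArrayList<>();
        String start = ""; // empty start key = beginning of the range
        boolean firstPage = true;
        while (true) {
            List<String> page = source.fetchPage(start, pageSize);
            // Every page after the first begins with the previous page's last key; skip it.
            List<String> fresh = firstPage ? page : page.subList(Math.min(1, page.size()), page.size());
            result.addAll(fresh);
            if (page.size() < pageSize) break; // short page: range exhausted
            start = page.get(page.size() - 1);
            firstPage = false;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) {
            rowKeys.add(String.format("row%05d", i));
        }
        List<String> all = allKeys(new FakeKeyRangeSource(rowKeys), 1_000);
        System.out.println("keys paged: " + all.size());
    }
}
```

With a page size of 1000, each page after the first yields 999 new keys, so the number of round trips is roughly the key count divided by the page size.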

RE: understanding tombstones

2011-03-09 Thread Jeffrey Wang
Yup. https://issues.apache.org/jira/browse/CASSANDRA-2305 -Jeffrey -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, March 09, 2011 6:19 PM To: user@cassandra.apache.org Subject: Re: understanding tombstones On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang

understanding tombstones

2011-03-09 Thread Jeffrey Wang
Hey all, I was wondering if this is the expected behavior of deletes (0.7.0). Let's say I have a 1-node cluster with a single CF which has gc_grace_seconds = 0. The following sequence of operations happens (in the given order): insert row X with timestamp T delete row X with timestamp T+1 force
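The question is cut off, so this does not try to reproduce the full sequence; it only sketches the timestamp rules the sequence relies on, under simplifying assumptions: a single cell instead of a whole row, and tombstones winning timestamp ties. A delete at T+1 shadows the insert at T; once a compaction with gc_grace_seconds = 0 has purged the tombstone, a write stamped T that arrives later has nothing left to shadow it. `Cell` and `TombstoneDemo` are hypothetical names, not Cassandra classes.

```java
// Hypothetical single-cell model of last-write-wins reconciliation.
// Real rows hold many columns plus row-level deletions; this keeps only one cell.
final class Cell {
    final long timestamp;
    final boolean isTombstone;
    final String value; // null for tombstones

    private Cell(long timestamp, boolean isTombstone, String value) {
        this.timestamp = timestamp;
        this.isTombstone = isTombstone;
        this.value = value;
    }

    static Cell live(long timestamp, String value) {
        return new Cell(timestamp, false, value);
    }

    static Cell tombstone(long timestamp) {
        return new Cell(timestamp, true, null);
    }

    boolean isVisible() {
        return !isTombstone;
    }

    // Higher timestamp wins. On a tie this sketch lets the tombstone win (an assumption).
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        return a.isTombstone ? a : b;
    }
}

class TombstoneDemo {
    // Applies a write to what is stored for the row; null means nothing is stored,
    // e.g. after a compaction with gc_grace_seconds = 0 has purged the tombstone.
    static Cell apply(Cell stored, Cell incoming) {
        return stored == null ? incoming : Cell.reconcile(stored, incoming);
    }

    public static void main(String[] args) {
        long t = 1_000L;
        Cell row = apply(null, Cell.live(t, "x")); // insert row X at T
        row = apply(row, Cell.tombstone(t + 1));    // delete row X at T+1
        System.out.println("visible after delete: " + row.isVisible());

        // While the tombstone is kept, a late write stamped T is still shadowed.
        System.out.println("late write, tombstone kept: " + apply(row, Cell.live(t, "x")).isVisible());

        // Once the tombstone is purged, nothing shadows that same write.
        System.out.println("late write, tombstone purged: " + apply(null, Cell.live(t, "x")).isVisible());
    }
}
```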

when do snapshots go away?

2011-03-07 Thread Jeffrey Wang
Hi all, When I drop a column family, it creates a snapshot. When does the snapshot go away and free up the disk space? I was able to run nodetool clearsnapshot to get rid of them, but will they go away themselves? (Also, is there a purpose to keeping a snapshot around?) -Jeffrey

RE: memtable_flush_after_mins setting not working

2011-02-25 Thread Jeffrey Wang
I just noticed this thread. Does this mean that (assuming the same setup of an empty keyspace and CFs added later) if I have a CF that I write to for some time, but not enough to hit the flush limits, it will never get flushed until the server is restarted? I believe this is causing commit logs

dropped mutations, UnavailableException, and long GC

2011-02-24 Thread Jeffrey Wang
Hey all, Our setup is 5 machines running Cassandra 0.7.0, each with 24GB of heap and 1.5TB of disk, collocated in a DC. We're doing bulk imports from each of the nodes with RF = 2 and write consistency ANY (write perf is very important). The behavior we're seeing is this: - Nodes often se

RE: rolling window of data

2011-02-03 Thread Jeffrey Wang
it delete exactly half of it or is there stuff that might go on under the covers that makes this not work as you might expect? -Jeffrey -----Original Message----- From: Jeffrey Wang [mailto:jw...@palantir.com] Sent: Thursday, February 03, 2011 3:03 PM To: user@cassandra.apache.org Subject: R

RE: rolling window of data

2011-02-03 Thread Jeffrey Wang
Thanks for the response, but unfortunately a TTL is not enough for us. We would like to be able to dynamically control the window in case there is an unusually large amount of data or something so we don't run out of disk space. One question I have in particular is: if I use the timestamp of my

RE: rolling window of data

2011-02-02 Thread Jeffrey Wang
user@cassandra.apache.org Subject: Re: rolling window of data This project may provide some inspiration for you https://github.com/thobbs/logsandra Not sure if it has a rolling window, if you find out let me know :) Aaron On 03 Feb, 2011, at 06:08 PM, Jeffrey Wang wrote: Hi, We're trying to use Cassan

rolling window of data

2011-02-02 Thread Jeffrey Wang
Hi, We're trying to use Cassandra 0.7 to store a rolling window of log data (e.g. last 90 days). We use the timestamp of the log entries as the column names so we can do time range queries. Everything seems to be working fine, but it's not clear if there is an efficient way to delete data that
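With log timestamps as column names, working out what has fallen outside the window is a range query on the column names: everything older than `now - window`. The sketch below assumes, hypothetically, that column names are epoch milliseconds and uses a sorted map as a stand-in for one row; it only selects the expired column names and does not issue any Cassandra delete. Taking the window length as a parameter is one way to get the dynamically adjustable window asked about in the follow-up. `RollingWindow`, `expiredColumns` and `day` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

class RollingWindow {
    // Column names are assumed to be epoch-millisecond timestamps (hypothetical layout).
    // Returns the names strictly older than (nowMillis - windowDays), i.e. delete candidates.
    static List<Long> expiredColumns(NavigableMap<Long, String> row, long nowMillis, int windowDays) {
        long cutoff = nowMillis - TimeUnit.DAYS.toMillis(windowDays);
        // headMap(cutoff, false) is a view of the keys < cutoff, in ascending order.
        return new ArrayList<>(row.headMap(cutoff, false).keySet());
    }

    static long day(int n) {
        return TimeUnit.DAYS.toMillis(n);
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> row = new TreeMap<>();
        for (int d : new int[] {5, 9, 10, 50, 99}) {
            row.put(day(d), "log entry from day " + d);
        }
        long now = day(100);
        // Shrinking the window (90 -> 30 days) selects more columns for deletion.
        System.out.println("90-day window expires: " + expiredColumns(row, now, 90).size());
        System.out.println("30-day window expires: " + expiredColumns(row, now, 30).size());
    }
}
```

A column exactly at the cutoff is kept here; whether the boundary should be inclusive is a policy choice.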