Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Joe Stump
Did the Cassandra cluster go down or did you start getting failures from the client when it routed queries to the downed node? The key in the client is to keep working around the ring if the initial node is down. --Joe On Apr 9, 2011, at 12:52 PM, Vram Kouramajian wrote: > We have a 5 Cassandr
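
A minimal sketch, in Python, of the client-side behaviour described in the reply above: if the first node is down, keep working around the ring. The names `query_with_failover`, `run_query`, and `AllNodesDown` are hypothetical and do not belong to any Cassandra client library. A real client would also need timeouts and a retry policy.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllNodesDown(Exception):
    """Raised when every node in the ring refused the query (hypothetical name)."""


def query_with_failover(
    nodes: Sequence[str],
    start: int,
    run_query: Callable[[str], T],
) -> T:
    """Try the node at index `start`, then walk the ring until one node answers.

    `run_query` is an assumed callback that sends the query to one node and
    raises ConnectionError when that node is unreachable.
    """
    last_error: Optional[Exception] = None
    for offset in range(len(nodes)):
        # Wrap around so that every node is tried exactly once.
        node = nodes[(start + offset) % len(nodes)]
        try:
            return run_query(node)
        except ConnectionError as exc:
            last_error = exc  # Node is down; move on to the next one.
    raise AllNodesDown(f"tried {len(nodes)} node(s)") from last_error
```

With this shape, a single crashed node costs one failed attempt per request instead of an error returned to the site.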

Re: Pyramid Organization of Data

2011-04-08 Thread Joe Stump
A few lines of Java in a partitioning or rack aware strategy might be able to achieve this. --Joe -- Typed with big fingers on a small keyboard. On Apr 8, 2011, at 13:17, Patrick Julien wrote: > We have a pilot project running where all our historical data > worldwide would be stored using

Re: Secondary Indexes

2011-04-03 Thread Joe Stump
On Apr 3, 2011, at 2:22 PM, Drew Kutcharian wrote: > Thanks Tyler. Can you update the wiki with these answers so they are stored > there for others to see too? Dude, it's a wiki.

Re: cassandra as session store

2011-02-01 Thread Joe Stump
FWIW we used Memcached for session data at Digg without any major issues. The one thing we did end up doing to reduce the LRU on sessions was to modify the slab size and put sessions in their own Memcached cluster. Probably not an issue for you though. +1 on Memcached. On Feb 1, 2011, at 9:57

Re: Cassandra on AWS across Regions

2010-09-01 Thread Joe Stump
On Sep 1, 2010, at 1:42 PM, Peter Fales wrote: > I probably should have made it clear that I wasn't proposing this as > an official patch (as you point out, it's not general enough for > production use). I'm just looking for feedback on the concept (thanks!) > and thought it might possibly be

Re: Cassandra & HAProxy

2010-08-28 Thread Joe Stump
On Aug 28, 2010, at 12:29 PM, Mark wrote: > Also, what would be a good way of monitoring the health of the cluster? We use Ganglia. I believe failover is usually built into clients. Not sure why using HAProxy or LVS wouldn't be a good option though. I used to use it with MySQL slaves with much

Re: RackAwareStrategy vs RackUnAwareStrategy on AWS EC2 cloud

2010-07-09 Thread Joe Stump
On Jul 9, 2010, at 1:16 PM, maneela a wrote: > Is there any way to mark cassandra node to keep it as just for replication > purpose and not to be as Primary for any data range in the ring? I believe there is. This is what we're doing, but we do all of our writes via a queue. Derek or Mike fro

Re: RackAwareStrategy vs RackUnAwareStrategy on AWS EC2 cloud

2010-07-09 Thread Joe Stump
We had similar issues when we started running Cassandra on EC2 between multiple AZs (not regions; we're working up to that shortly). We ended up building a rack aware strategy specific to AWS, which is posted somewhere in JIRA. Basically it uses the AWS API to ensure that replicas are stored

Re: Digg 4 Preview on TWiT

2010-07-06 Thread Joe Stump
On Jul 6, 2010, at 6:18 PM, David Strauss wrote: > Then I'll tell my friend at Facebook to stick to topics he's qualified > to speak about. :-) You might want to clarify that this advice applies to all topics of discussion and not just Facebook related ones. ;) --Joe

Re: Cassandra on AWS across Regions

2010-06-29 Thread Joe Stump
On Jun 29, 2010, at 2:56 PM, Lenin Gali wrote: > Thanks Joe, I was hoping to hear from you. Can you pass me the SA contact at > AWS we would love to look in to it. Just contact your account representative. They'll get you hooked up. They have multiple SAs that help out account representatives.

Re: Cassandra on AWS across Regions

2010-06-29 Thread Joe Stump
On Jun 29, 2010, at 12:44 PM, Anthony Molinaro wrote: > Maybe you need to modify the security groups to allow the ports to be > accessible from one to the other? A likely better solution would be to look into the VPNCubed product which was built specifically for this purpose. We're in the middl

Re: django or pylons

2010-06-20 Thread Joe Stump
A lot of the magic that Django brings to the table is derived from the ORM. If you're skipping that then Pylons likely makes more sense. --Joe On Jun 20, 2010, at 5:08 PM, Charles Woerner wrote: > I recently looked into this and came to the same conclusion, but I'm not an > expert in either

Re: ec2 tests

2010-06-18 Thread Joe Stump
On Jun 18, 2010, at 6:39 PM, Olivier Mallassi wrote: > and I did not see any improvements (Cassandra stays around 7000 W/sec). It's a brave new world where N+1 scaling with 7,000 writes per second per node is considered suboptimal performance. --Joe

Re: Cassandra data loss

2010-05-24 Thread Joe Stump
On May 24, 2010, at 10:01 AM, Steve Lihn wrote: > So if I set it up to be strongly consistent, I should have the same level of > consistency as traditional relational DB ? If you do, say, QUORUM on the consistency level it will ensure at least 2 out of the 3 replicas have responded back that
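
The "2 out of the 3" figure in the reply above is the majority of the replication factor. A minimal sketch of that arithmetic in Python follows. The function names are hypothetical. The overlap check is the general quorum rule (a read sees the latest write when the read and write replica sets must intersect), not something quoted from the thread.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, writes: int, reads: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return reads + writes > replication_factor


rf = 3
q = quorum(rf)  # 2 replicas must acknowledge when rf is 3
print(q, reads_see_latest_write(rf, writes=q, reads=q))
```

Writing and reading at QUORUM with three replicas means each operation waits for two acknowledgements, so any read overlaps the most recent successful write on at least one replica.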

Re: Cassandra data loss

2010-05-24 Thread Joe Stump
This is largely FUD. Cassandra lets you choose how consistent you want writes to be. The more consistency you choose, the slower the writes, but it's very unlikely with high consistency that you'll lose data. That being said, if you write with a consistency level of 0 then, yes, you could lose

Re: is cassandra really a 'handsoff' solution once setup?

2010-05-14 Thread Joe Stump
On May 14, 2010, at 1:13 PM, Ryan King wrote: > I wouldn't use it (at least at our scale) without engineers willing to > run a java debugger. :) +1 We spend a lot of time hacking on the internals (Mike Malone and Derek Smith on the list can chime in further here). But, we have had little issue

Re: is cassandra really a 'handsoff' solution once setup?

2010-05-14 Thread Joe Stump
On May 14, 2010, at 12:46 PM, S Ahmed wrote: > For those with live apps, how has it been? (fb/digg/twitter people, would > love your experiences) I didn't say it didn't require *any* administration. Just that it required *minimal* administration. I'd say we spend about a quarter of an engineer

Re: Trove maps

2010-05-04 Thread Joe Stump
On May 4, 2010, at 6:24 PM, Tatu Saloranta wrote: > But of course Apache can impose their own, however misguided silly > rules on projects under their umbrella. :-) I smell an -ac'esque patch to Cassandra brewing. ;) --Joe

Re: The Difference Between Cassandra and HBase

2010-04-25 Thread Joe Stump
On Apr 25, 2010, at 5:18 PM, Eric Hauser wrote: > Out of curiosity, are you planning on copying the data you store in > HBase/Hive into separate Hadoop cluster in a different data center or backing > up HDFS in some other manner? Redundancy isn't an issue within the cluster; > it's more a con

Re: The Difference Between Cassandra and HBase

2010-04-25 Thread Joe Stump
On Apr 25, 2010, at 11:40 AM, Mark Robson wrote: > For me an important difference is that Cassandra is operationally much more > straightforward - there is only one type of node, and it is fully redundant > (depending what consistency level you're using). > > This seems to be an advantage in C

Re: Just to be clear, cassandra is web framework agnostic b/c of Thrift?

2010-04-18 Thread Joe Stump
On Apr 18, 2010, at 5:33 PM, S Ahmed wrote: > Obviously if you run asp.net on windows, it is probably a VERY good idea to > be running cassandra on a linux box. Actually, I'm not sure this is true. A few people have found Windows performs fairly well with Cassandra, if I recall correctly. Obvi

Re: Memcached protocol?

2010-04-04 Thread Joe Stump
Seems like this would be pretty easy to build on top of the proxy stuff that was recently mentioned. I don't see a reason why you couldn't just store key/blob-in-column to get running quickly. Might make for a pretty interesting clustered queue system as well, which has been mentioned before on

Re: Deployment on AWS

2010-04-03 Thread Joe Stump
On Apr 3, 2010, at 2:54 PM, Benjamin Black wrote: > I'm pretty familiar with EC2, hence the question. I don't believe any > patches are required to do these things. Regardless, as I noted in > that ticket, you definitely do NOT need AWS credentials to determine > your availability zone. It is

Re: LazyBoy question

2010-04-03 Thread Joe Stump
On Apr 3, 2010, at 2:00 PM, Jonathan Ellis wrote: > I don't think Lazyboy exposes range queries [that is, iterating rows > whose keys you do not know ahead of time]. Pycassa does, though. I think ieure's fork has itertools support that will let you do crazy iteration stuff with it. I haven't d

Re: Deployment on AWS

2010-04-03 Thread Joe Stump
On Apr 3, 2010, at 1:53 PM, Benjamin Black wrote: > What specific features are you looking for to operate on EC2? It seemed people weren't looking for features, but tools to help with the management. The two things we've created that people might be interested in are: 1. An EC2-specific rack-a

Re: Deployment on AWS

2010-04-03 Thread Joe Stump
On Apr 2, 2010, at 4:49 PM, Masood Mortazavi wrote: > Is there a ready recipe for deploying a Cassandra cluster in AWS? ... (Seeds > need some "fixed" IP addresses.) We have a lot of code around this that we're trying to get released. We have a rack aware strategy for cross-AZ clusters. We als

Re: How reliable is cassandra?

2010-03-29 Thread Joe Stump
On Mar 29, 2010, at 12:40 PM, Eric Hauser wrote: > BTW, does anyone from Digg patrol the list? I'm really interested in some > additional the implementation of atomic counters with ZooKeeper. I know at least three Diggers patrol the list and one of them is a committer to Cassandra. Last I hea

Re: How reliable is cassandra?

2010-03-29 Thread Joe Stump
On Mar 29, 2010, at 11:55 AM, Eric Hauser wrote: > Does the information is the below link about Cassandra and replication over > WAN have any merit or is it just FUD? I can attest Cassandra works fine over inter-DC connections. We have ~20 nodes spread across three Amazon "Availability Zones".

Re: How reliable is cassandra?

2010-03-29 Thread Joe Stump
On Mar 29, 2010, at 11:31 AM, Matthew Stump wrote: > Am I crazy to want to switch our server's primary data store from postgres to > cassandra? This is a system used by banks and governments to store crypto > keys which absolutely can not be lost. You might be crazy. PostgreSQL has all sorts

Re: Startup issue when big data in.

2010-03-20 Thread Joe Stump
On Mar 20, 2010, at 3:33 AM, Lenin Gali wrote: > 1.what kind of performance are you getting, how many writes vs reads do you > do per min? Our performance is quite good. Here are some HDD benchmarks I've run: http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html > 2. have

Re: Digg's data model

2010-03-20 Thread Joe Stump
On Mar 20, 2010, at 2:53 AM, Lenin Gali wrote: > 1. Eventual consistency: Given a volume of 5K writes / sec and roughly 1500 > writes are Updates per sec while the rest are inserts, what kind of latency > can be expected in eventual consistency? Depending on the size of the cluster you're not

Re: Digg's data model

2010-03-19 Thread Joe Stump
On Mar 19, 2010, at 1:16 PM, Gary wrote: > I am a newbie to bigtable like model and have a question as follows. Take > Digg as an example, I want to find a list users who dug a URL and also want > to find a list of URLs a user dug. How should the data model look like for > the queries to be ef