Re: Cassandra behaviour

2010-07-27 Thread Peter Schuller
> So userspace throttling is probably the answer?

I believe so.

> Is the normal way of
> doing this to go through the JMX interface from a userspace program,
> and hold off on inserts until the values fall below a given threshold?
> If so, that's going to be a pain, since most of my system is
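The hold-off loop described in the quoted question can be sketched roughly as follows. This is only an illustration of the idea, not something from the thread: `read_pending` is a hypothetical callable standing in for whatever metric is read over JMX, and `insert`, `threshold` and `poll_seconds` are likewise made-up names.

```python
import time


def throttled_insert(rows, insert, read_pending, threshold,
                     poll_seconds=1.0, sleep=time.sleep):
    """Insert rows one by one, pausing while the backlog metric is too high.

    read_pending is a hypothetical zero-argument callable that returns the
    current value of some server-side backlog metric (for example one
    exposed over JMX). It is an assumption made for this sketch.
    """
    for row in rows:
        # Hold off until the metric is at or below the threshold.
        while read_pending() > threshold:
            sleep(poll_seconds)
        insert(row)
```

Polling before every row is the simplest form. Checking once per batch of inserts would cut the polling overhead, at the cost of reacting more slowly.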

Re: Cassandra behaviour

2010-07-26 Thread tsuraan
> It's reading through keys in the index and adding offset information
> about roughly every 128th entry in RAM, in order to speed up reads.
> Performing a binary search in an sstable from scratch would be
> expensive. Because of the high cost of disk seeks, most storage
> systems use btrees with a
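A simplified model of the sampling described above: keep every 128th key of a sorted index in memory, binary-search that sample, then scan at most one block of the full index. This is a sketch under assumptions, not Cassandra's actual code. The list `index_entries` stands in for the on-disk index, and the function names are invented for illustration.

```python
import bisect


def build_sample(index_entries, interval=128):
    """Return the keys of every `interval`-th entry of a sorted index.

    index_entries is a sorted list of (key, offset) pairs that stands in
    for the on-disk index (an assumption made for this sketch).
    """
    return [index_entries[i][0] for i in range(0, len(index_entries), interval)]


def lookup(index_entries, sampled_keys, key, interval=128):
    """Find the offset stored for key, or None if the key is absent."""
    # Binary search over the small in-memory sample only.
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first indexed entry
    start = slot * interval
    # Sequential scan of at most `interval` entries of the full index.
    for k, offset in index_entries[start:start + interval]:
        if k == key:
            return offset
        if k > key:
            break
    return None
```

With this layout the in-memory cost is one key per 128 index entries, and a read touches a single contiguous block of the index rather than seeking around the whole file.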

Re: Cassandra behaviour

2010-07-26 Thread Peter Schuller
> be the most portable thing to do. I had been thinking that the bloom
> filters were created on startup, but further reading of the docs
> indicates that they are in the SSTable Index. What is cassandra
> doing, then, when it's printing out that it's sampling indices while
> it starts?

It's rea

Re: Cassandra behaviour

2010-07-26 Thread tsuraan
> My guess:
> Your test is beating up your system. The system may need more memory
> or disk throughput or CPU in order to keep up with that particular
> test.

Yeah, I am testing on a pretty wimpy machine; I just wanted to get some practice getting cassandra up and running, and I ran into this pro

Re: Cassandra behaviour

2010-07-26 Thread tsuraan
> Bloom filters are indeed linear in size with respect to the number of
> items (assuming a constant target false positive rate). While I have
> not looked at how Cassandra calculates the bloom filter sizes, I feel
> pretty confident in saying that it won't dynamically replace bloom
> filters with
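The linear relationship mentioned in the quote follows from the textbook sizing formula for an optimally configured bloom filter, m = -n ln(p) / (ln 2)^2 bits for n items at false positive rate p. The sketch below just evaluates that formula. It says nothing about how Cassandra sizes its filters, and the 1% rate used in the note afterwards is an assumed figure.

```python
import math


def bloom_bits(n_items, false_positive_rate):
    """Bits needed by an optimally sized bloom filter (textbook formula)."""
    return math.ceil(
        -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )


def bloom_hashes(n_items, n_bits):
    """Number of hash functions that minimises the false positive rate."""
    return max(1, round(n_bits / n_items * math.log(2)))
```

At an assumed 1% false positive rate this comes to a little under 10 bits per key, so a billion keys would need on the order of a gigabyte of filter, regardless of how small the values are.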

Re: Cassandra behaviour

2010-07-26 Thread Peter Schuller
> to play with. Can anybody give me advice on how to make cassandra
> keep running under a high insert load?

I forgot to mention that if your insertion speed is simply legitimately faster than compaction, but you have left-over idle CPU on the system, then currently as far as I know you're out of

Re: Cassandra behaviour

2010-07-26 Thread Peter Schuller
[ 1 billion inserts, failed after 120m with out-of-mem ]

> - is Cassandra's RAM use proportional to the number of values that
> it's storing? I know that it uses bloom filters for preventing
> lookups of non-existent keys, but since bloom filters are designed to
> give an accuracy/space tradeoff,

Re: Cassandra behaviour

2010-07-26 Thread Jonathan Shook
My guess: Your test is beating up your system. The system may need more memory or disk throughput or CPU in order to keep up with that particular test. Check some of the posts on the list with "deferred processing" in the body to see why. Also, can you post the error log? On Mon, Jul 26, 2010 at

Cassandra behaviour

2010-07-26 Thread tsuraan
I have a system where we're currently using Postgres for all our data storage needs, but on a large table the index checks for primary keys are really slowing us down on insert. Cassandra sounds like a good alternative (not saying postgres and cassandra are equivalent; just that I think they are b