> So userspace throttling is probably the answer?
I believe so.
> Is the normal way of
> doing this to go through the JMX interface from a userspace program,
> and hold off on inserts until the values fall below a given threshold?
> If so, that's going to be a pain, since most of my system is
>
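A userspace throttle along those lines could look roughly like the sketch below. It is only a sketch: the JMX port (8080, the 0.6 default) and the MBean and attribute names used here (org.apache.cassandra.db:type=CompactionManager, PendingTasks) are assumptions, so verify them with jconsole against your version.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class InsertThrottle {
    // Assumed names; check them against your Cassandra version with jconsole.
    private static final String JMX_URL =
        "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi";
    private static final String MBEAN =
        "org.apache.cassandra.db:type=CompactionManager";
    private static final String ATTRIBUTE = "PendingTasks";
    private static final int THRESHOLD = 100;   // made-up back-pressure limit

    public static void main(String[] args) throws Exception {
        JMXConnector connector =
            JMXConnectorFactory.connect(new JMXServiceURL(JMX_URL));
        MBeanServerConnection mbs = connector.getMBeanServerConnection();
        ObjectName compaction = new ObjectName(MBEAN);

        while (true) {
            // Wait until the server's backlog drops below the threshold
            // before issuing the next batch of writes.
            while (((Number) mbs.getAttribute(compaction, ATTRIBUTE)).longValue()
                       > THRESHOLD) {
                Thread.sleep(1000);
            }
            doSomeInserts();   // placeholder for your actual insert batch
        }
    }

    private static void doSomeInserts() {
        // the application's own write path goes here
    }
}

Any statistic that tracks the write backlog (memtable flush queue, compaction backlog, and so on) would do; the point is just that the client backs off based on what the server reports over JMX.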
> It's reading through keys in the index and adding offset information
> about roughly every 128th entry in RAM, in order to speed up reads.
> Performing a binary search in an sstable from scratch would be
> expensive. Because of the high cost of disk seeks, most storage
> systems use btrees with a
> be the most portable thing to do. I had been thinking that the bloom
> filters were created on startup, but further reading of the docs
> indicates that they are in the SSTable Index. What is cassandra
> doing, then, when it's printing out that it's sampling indices while
> it starts?
It's reading through keys in the index and adding offset information
about roughly every 128th entry in RAM, in order to speed up reads.
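Conceptually it is something like the sketch below (not Cassandra's actual classes, just the sampled-index idea with the 1-in-128 interval): keep every 128th key and its index-file offset in memory, binary-search that sample on a read, and then scan the on-disk index forward from the nearest sampled offset.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of a sampled SSTable index: keep every 128th (key, offset) pair
// in memory so a read only has to scan a small slice of the index on disk.
public class SampledIndex {
    private static final int INTERVAL = 128;
    private final List<String> sampledKeys = new ArrayList<String>();
    private final List<Long> sampledOffsets = new ArrayList<Long>();
    private int seen = 0;

    // Called once per index entry, in sorted key order, while the
    // sstable's index file is read at startup.
    public void maybeSample(String key, long offsetInIndexFile) {
        if (seen % INTERVAL == 0) {
            sampledKeys.add(key);
            sampledOffsets.add(offsetInIndexFile);
        }
        seen++;
    }

    // Returns the index-file offset from which to start scanning for `key`;
    // at most ~128 entries are read from disk instead of the whole index.
    public long floorOffset(String key) {
        int pos = Collections.binarySearch(sampledKeys, key);
        if (pos < 0) {
            pos = -pos - 2;   // greatest sampled key <= key
        }
        return pos < 0 ? 0 : sampledOffsets.get(pos);
    }
}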
> My guess:
> Your test is beating up your system. The system may need more memory
> or disk throughput or CPU in order to keep up with that particular
> test.
Yeah, I am testing on a pretty wimpy machine; I just wanted to get
some practice getting cassandra up and running, and I ran into this
problem.
> Bloom filters are indeed linear in size with respect to the number of
> items (assuming a constant target false positive rate). While I have
> not looked at how Cassandra calculates the bloom filter sizes, I feel
> pretty confident in saying that it won't dynamically replace bloom
> filters with smaller, less accurate ones just to save memory.
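For a rough sense of the memory this implies: the textbook sizing is about -ln(p)/(ln 2)^2 bits per key for a target false-positive rate p, which is roughly 9.6 bits per key at p = 0.01, so a billion keys costs on the order of a gigabyte of bloom filter alone. A back-of-the-envelope calculation (the standard formula, not Cassandra's actual sizing code, and the 1% rate is an assumption):

public class BloomSize {
    public static void main(String[] args) {
        long n = 1000000000L;      // e.g. the billion keys in this test
        double p = 0.01;           // assumed target false-positive rate

        // m = -n*ln(p)/(ln 2)^2 total bits, k = (m/n)*ln 2 hash functions
        double bitsPerKey = -Math.log(p) / (Math.log(2) * Math.log(2));
        double hashes = bitsPerKey * Math.log(2);
        double gigabytes = bitsPerKey * n / 8 / (1L << 30);

        System.out.printf("%.1f bits/key, %.1f hashes, ~%.2f GB of filter%n",
                bitsPerKey, hashes, gigabytes);
    }
}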
> to play with. Can anybody give me advice on how to make cassandra
> keep running under a high insert load?
I forgot to mention that if your insertion speed is simply
legitimately faster than compaction, but you have left-over idle CPU
on the system, then currently as far as I know you're out of luck.
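So in practice the remedy is on the client side, capping how fast you push writes. The crudest version of that, a fixed writes-per-second budget (the 5000 figure below is made up), would look something like this; call beforeInsert() before each write:

// Crude client-side rate cap: never issue more than RATE inserts per second.
public class FixedRateThrottle {
    private static final int RATE = 5000;   // made-up budget, tune to taste
    private long windowStart = System.currentTimeMillis();
    private int sentThisWindow = 0;

    public void beforeInsert() throws InterruptedException {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {     // start a new one-second window
            windowStart = now;
            sentThisWindow = 0;
        }
        if (sentThisWindow >= RATE) {        // budget exhausted: wait it out
            Thread.sleep(1000 - (now - windowStart));
            windowStart = System.currentTimeMillis();
            sentThisWindow = 0;
        }
        sentThisWindow++;
    }
}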
[ 1 billion inserts, failed after 120m with out-of-mem ]
> - is Cassandra's RAM use proportional to the number of values that
> it's storing? I know that it uses bloom filters for preventing
> lookups of non-existent keys, but since bloom filters are designed to
> give an accuracy/space tradeoff,
My guess:
Your test is beating up your system. The system may need more memory
or disk throughput or CPU in order to keep up with that particular
test.
Check some of the posts on the list with "deferred processing" in the
body to see why.
Also, can you post the error log?
On Mon, Jul 26, 2010 at
I have a system where we're currently using Postgres for all our data
storage needs, but on a large table the index checks for primary keys
are really slowing us down on insert. Cassandra sounds like a good
alternative (not saying postgres and cassandra are equivalent; just
that I think they are b