Re: High BloomFilterFalseRation

2010-11-02 Thread Ryan King
On Tue, Nov 2, 2010 at 1:28 AM, Daniel Doubleday wrote:
> Hi all
>
> had some time yesterday to dig a lil deeper. And maybe this saves someone who
> made the same mistake the time so ...
>
> After trying to reproduce the problem in unit tests with the same data which
> led nowhere because every

Re: High BloomFilterFalseRation

2010-11-02 Thread Daniel Doubleday
Hi all had some time yesterday to dig a lil deeper. And maybe this saves someone who made the same mistake the time so ... After trying to reproduce the problem in unit tests with the same data which led nowhere because every single result was almost exactly what the math promised and incident

Re: High BloomFilterFalseRation

2010-10-28 Thread Daniel Doubleday
Hi Ryan I took a sample of one sstable (just flushed, not compacted). I compared 2 samples of sstables. One that is showing fine false positive ratios and the problem one. And yes both look the same to me. Both have the expected 15 buckets per row and the cardinality of the bitsets are the sa
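As a rough cross-check of what "expected" means for 15 buckets per row, the textbook Bloom filter approximation can be evaluated directly. This is only a sketch: it assumes that "buckets" means bits per key and that the number of hash functions is chosen near the optimum, neither of which is stated in the message. The function names are made up for illustration and are not Cassandra APIs.

```python
import math


def expected_false_positive_rate(bits_per_key: float, num_hashes: int) -> float:
    """Textbook approximation p = (1 - e^(-k/b))^k for b bits per key and k hashes."""
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def optimal_num_hashes(bits_per_key: float) -> int:
    """k = b * ln 2 minimises the approximation above; rounded to an integer >= 1."""
    return max(1, round(bits_per_key * math.log(2)))


if __name__ == "__main__":
    b = 15  # "15 buckets per row", read here as 15 bits per key (assumption)
    k = optimal_num_hashes(b)  # assumption: a near-optimal number of hashes is used
    rate = expected_false_positive_rate(b, k)
    print(f"bits/key={b} hashes={k} expected false positive rate={rate:.6f}")
```

A measured ratio far above this figure would point at how the filter is queried or at how the ratio is counted, rather than at the filter size itself.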

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Ah of course - question makes total sense. But no: this is not the case: I am not constantly asking the same question since the tree is deep enough. Most data nodes are level 5 from the root. So the parents getting queried will be different most of the time. Since the parent nodes are created

Re: High BloomFilterFalseRation

2010-10-27 Thread Jonathan Ellis
Do you have a key "a/b" then? What columns does it have?
On Wed, Oct 27, 2010 at 9:14 AM, Daniel Doubleday wrote:
> Hm -
>
> not sure if I understand the random question. We are using RP. But I wouldn't
> know why that should matter.
> I thought that the bloom filter hash function should evenly

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hm - not sure if I understand the random question. We are using RP. But I wouldn't know why that should matter. I thought that the bloom filter hash function should evenly distribute no matter what keys come in. Keys are '/' separated strings (aka paths :-)) I do bulk inserts like: (1000 rows
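The assumption that path-like keys hash evenly can be checked outside Cassandra with a toy filter. The sketch below is not Cassandra's implementation: the `BloomFilter` class, the SHA-256 based double hashing, the key layout, and the parameters (15 bits per key, 10 hashes, 1000 rows) are all chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; positions come from double hashing one SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def measure_false_positive_ratio(num_keys=1000, bits_per_key=15,
                                 num_hashes=10, num_probes=20000) -> float:
    """Insert '/'-separated keys, then probe with keys that were never inserted."""
    bf = BloomFilter(num_keys * bits_per_key, num_hashes)
    for i in range(num_keys):
        bf.add(f"root/a{i % 10}/b{i % 100}/c{i}")  # hypothetical path layout
    hits = sum(bf.might_contain(f"root/missing/x{i}") for i in range(num_probes))
    return hits / num_probes


if __name__ == "__main__":
    print(f"measured false positive ratio: {measure_false_positive_ratio():.5f}")
```

If a toy filter like this stays close to the theoretical rate for such keys, the key format alone is unlikely to explain a high ratio.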

Re: High BloomFilterFalseRation

2010-10-27 Thread Jonathan Ellis
This is not expected, no. How random are your queries? If you have a couple outlier rows causing the false positives that are being queried over and over then that could just be the luck of the draw.
On Wed, Oct 27, 2010 at 5:24 AM, Daniel Doubleday wrote:
> Hi people
>
> We are currently movin

High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hi people We are currently moving our second use case from MySQL to Cassandra. While importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems to be pretty high compared to another CF which is in use in production right now. It's a hierarchical data model and I cannot avoid t