subject:"High BloomFilterFalseRation"

Re: High BloomFilterFalseRation

2010-11-02 Thread Ryan King

On Tue, Nov 2, 2010 at 1:28 AM, Daniel Doubleday wrote:
> Hi all
>
> had some time yesterday to dig a lil deeper. And maybe this saves someone who
> made the same mistake the time so ...
>
> After trying to reproduce the problem in unit tests with the same data which
> led nowhere because every

Re: High BloomFilterFalseRation

2010-11-02 Thread Daniel Doubleday

Hi all had some time yesterday to dig a lil deeper. And maybe this saves someone who made the same mistake the time so ... After trying to reproduce the problem in unit tests with the same data which led nowhere because every single result was almost exactly what the math promised and incident

Re: High BloomFilterFalseRation

2010-10-28 Thread Daniel Doubleday

Hi Ryan I took a sample of one sstable (just flushed, not compacted). I compared 2 samples of sstables. One that is showing fine false positive ratios and the problem one. And yes both look the same to me. Both have the expected 15 buckets per row and the cardinality of the bitsets are the sa
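For reference, the false positive ratio that "the math" predicts for a Bloom filter can be estimated with the standard approximation p = (1 - e^(-k/b))^k, where b is the number of bits per key and k the number of hash functions. The sketch below is only an illustration, not Cassandra's code: it assumes the "15 buckets per row" mentioned above means 15 bits per key, and the function names are made up for this example.

```python
import math


def optimal_hash_count(bits_per_key: float) -> int:
    """Hash count k that minimises the false positive rate: k = b * ln 2, rounded."""
    return max(1, round(bits_per_key * math.log(2)))


def expected_false_positive_rate(bits_per_key: float, hash_count: int) -> float:
    """Standard approximation p = (1 - e^(-k / b))^k for a filter at its design load."""
    return (1.0 - math.exp(-hash_count / bits_per_key)) ** hash_count


if __name__ == "__main__":
    bits_per_key = 15  # assumption: "15 buckets per row" read as 15 bits per key
    k = optimal_hash_count(bits_per_key)
    p = expected_false_positive_rate(bits_per_key, k)
    print(f"hash functions: {k}, expected false positive rate: {p:.6f}")
```

Under these assumptions the expected ratio is well below 0.1%, so a filter reporting a much higher ratio is probably not behaving as designed, or is being queried in a way the approximation does not cover.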

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday

Ah of course - question makes total sense. But no: this is not the case: I am not constantly asking the same question since the tree is deep enough. Most data nodes are level 5 from the root. So the parents getting queried will be different most of the time. Since the parent nodes are created

Re: High BloomFilterFalseRation

2010-10-27 Thread Jonathan Ellis

Do you have a key "a/b" then? What columns does it have?
On Wed, Oct 27, 2010 at 9:14 AM, Daniel Doubleday wrote:
> Hm -
>
> not sure if I understand the random question. We are using RP. But I wouldn't
> know why that should matter.
> I thought that the bloom filter hash function should evenly

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday

Hm - not sure if I understand the random question. We are using RP. But I wouldn't know why that should matter. I thought that the bloom filter hash function should evenly distribute no matter what keys come in. Keys are '/' separated strings (aka paths :-)) I do bulk inserts like: (1000 rows

Re: High BloomFilterFalseRation

2010-10-27 Thread Jonathan Ellis

This is not expected, no. How random are your queries? If you have a couple outlier rows causing the false positives that are being queried over and over then that could just be the luck of the draw.
On Wed, Oct 27, 2010 at 5:24 AM, Daniel Doubleday wrote:
> Hi people
>
> We are currently movin

High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday

Hi people, we are currently moving our second use case from MySQL to Cassandra. While importing the data (ongoing) I noticed that the BloomFilterFalseRation seems to be pretty high compared to another CF which is in use in production right now. It's a hierarchical data model and I cannot avoid t

8 matches
