Re: Bloom filter calculation

Benedict Tue, 11 Jul 2023 01:12:39 -0700

I’m not sure I follow your reasoning. The bloom filter table gives the false 
positive rate per sstable for a given number of bits *per key*. So for 10 keys 
you would have 200 bits, which yields the same false positive rate as 20 bits 
and 1 key.
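
To make the bits-per-key point concrete, here is a minimal sketch using the standard textbook approximation p = (1 - e^(-k*n/m))^k. The function name and parameters are illustrative only and are not taken from BloomFilterCalculations.

```python
import math

def bloom_fp_rate(total_bits: int, keys: int, hashes: int) -> float:
    """Approximate false positive rate of one bloom filter.

    Uses the common approximation (1 - e^(-k*n/m))^k; this is an
    assumption for illustration, not Cassandra's own derivation.
    """
    return (1.0 - math.exp(-hashes * keys / total_bits)) ** hashes

# 20 bits per key and 14 hashes, first with 1 key, then with 10 keys.
one_key = bloom_fp_rate(total_bits=20, keys=1, hashes=14)
ten_keys = bloom_fp_rate(total_bits=200, keys=10, hashes=14)
print(f"{one_key:.3g} {ten_keys:.3g}")
```

Under this approximation the rate depends only on the ratio of bits to keys, so both calls give roughly 6.7e-05.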

It does taper slightly at much larger N, but it’s pretty nominal for practical 
purposes.

I don’t understand what you mean by merging multiple filters together. We do 
look up multiple bloom filters per query, but only one per sstable, and the 
false positive rate you’re calculating for 10 such lookups would not be 
accurate. This would be 1-(1-0.0000671)^10, which is still only around 0.07%, 
not 100%. You seem to be looking at the false positive rate of a bloom filter 
of 20 bits with 10 entries, which means only 2 bits per entry?
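
The two readings can be compared with a short sketch. It assumes the per-sstable lookups are independent and uses the textbook approximation (1 - e^(-k*n/m))^k for a single filter; the function names are hypothetical.

```python
import math

def combined_fp_rate(per_filter_fp: float, lookups: int) -> float:
    """Chance that at least one of `lookups` independent filters
    reports a false positive."""
    return 1.0 - (1.0 - per_filter_fp) ** lookups

def single_filter_fp(total_bits: int, keys: int, hashes: int) -> float:
    """Approximate false positive rate of one filter holding `keys` entries."""
    return (1.0 - math.exp(-hashes * keys / total_bits)) ** hashes

# Ten sstables, each with its own filter at 6.71e-05.
across_sstables = combined_fp_rate(6.71e-05, lookups=10)

# One 20-bit filter holding 10 entries (2 bits per entry) with 14 hashes.
overloaded = single_filter_fp(total_bits=20, keys=10, hashes=14)

print(f"{across_sstables:.3%} {overloaded:.1%}")
```

The first figure stays well under 0.1%, while the second is close to 100%, which appears to be the number the calculator in [1] reports.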

> On 11 Jul 2023, at 07:14, Claude Warren, Jr via dev 
> <dev@cassandra.apache.org> wrote:
> 
> 
> Can someone explain to me how the Bloom filter table in 
> BloomFilterCalculations was derived and how it is supposed to work?  As I 
> read the table it seems to indicate that with 14 hashes and 20 bits you get a 
> fp of 6.71e-05.  But if you plug those numbers into the Bloom filter 
> calculator [1],  that is calculated only for 1 item being in the filter.  If 
> you merge multiple filters together the false positive rate goes up.  And as 
> [1] shows by 5 merges you are over 50% fp rate and by 10 you are at close to 
> 100% fp.  So I have to assume this analysis is wrong.  Can someone point me 
> to the correct calculations?
> 
> Claude
> 
> [1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14
