Re: why bloom filter is only for row key?

2014-09-15 Thread Philo Yang
Thanks DuyHai,

I think the trouble of bloom filter on all row keys & column names is
memory usage. However, if a CF has only hundreds of columns per row,  the
number of total columns will be much fewer, so the bloom filter is possible
for this condition, right? Is there a good way to adjust bloom filter's
property between row keys and row keys+column names automatically or by
user's config?

Thanks,
Philo Yang


2014-09-15 2:45 GMT+08:00 DuyHai Doan :

> Hello Philo
>
>  Building bloom filter for column names (what you call column key) is
> technically possible but very expensive in term of memory usage.
>
>   The approximate formula to calculate space required by bloom filter can
> be found on slide 27 here:
> http://fr.slideshare.net/quipo/modern-algorithms-and-data-structures-1-bloom-filters-merkle-trees
>
> false positive chance = 0.6185 * m/n  where m = number of bits for the
> filter and n = number of distinct keys
>
> For example, if you want to index 1 million of rows, each having 100 000
> columns on average, it will end up indexing 100 billions of keys (row keys
> & column names) with bloom filter.
>
>  By applying the above formula, m ≈ 4.8 * 10^11 bits ≈ 60Gb to allocate in
> RAM just for bloom filter on all row keys & column names ...
>
>  Regards
>
>  Duy Hai DOAN
>
> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang  wrote:
>
>> Hi all,
>>
>> After reading some docs, I find that bloom filter is built on row keys,
>> not on column key. Can anyone tell me what is considered for not building
>> bloom filter on column key? Is it a good idea to offer a table property
>> option between row key and primary key for what boolm filter is built on?
>>
>> Thanks,
>> Philo Yang
>>
>>
>


Re: why bloom filter is only for row key?

2014-09-15 Thread Robert Coli
On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang  wrote:

> After reading some docs, I find that bloom filter is built on row keys,
> not on column key. Can anyone tell me what is considered for not building
> bloom filter on column key? Is it a good idea to offer a table property
> option between row key and primary key for what boolm filter is built on?
>

Here's the nitty gritty of the process which removed the row-level bloom
filter on column keys.

https://issues.apache.org/jira/browse/CASSANDRA-4885 - "Remove or rework
per-row bloom filters"

=Rob
http://twitter.com/rcolidba