Re: Bloom filter

Shalin Shekhar Mangar Wed, 30 Jul 2014 01:06:30 -0700

Hi Per,

There's LUCENE-5675, which added a new postings format for IDs. Trying
it out in Solr is on my todo list, but maybe you can get to it before me.


https://issues.apache.org/jira/browse/LUCENE-5675
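
For reference, here is a minimal sketch of the kind of "definitely NOT
present" check described in the quoted message below. It is not the code
from LUCENE-5675 and not Lucene's or ES's bloom filter; the class name
IdBloomFilter and its methods are hypothetical, and it assumes string ids,
a filter sized from an expected id count and a target false positive rate,
and k probe positions derived from one 64-bit hash via double hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of a classic Bloom filter over document ids. */
final class IdBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** Sizes the filter for an expected id count and a target false positive rate. */
    IdBloomFilter(long expectedIds, double falsePositiveRate) {
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        double ln2 = Math.log(2);
        long m = (long) Math.ceil(-expectedIds * Math.log(falsePositiveRate) / (ln2 * ln2));
        this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(64, m));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedIds * ln2));
        this.bits = new BitSet(numBits);
    }

    /** Records an id by setting its k probe bits. */
    void add(String id) {
        long h = hash64(id);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** Returns false only if the id was definitely never added; true means "maybe". */
    boolean mightContain(String id) {
        long h = hash64(id);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    /** Double hashing: the i-th probe position is (h1 + i * h2) mod m. */
    private int index(int h1, int h2, int i) {
        int combined = h1 + i * h2;
        return (combined & Integer.MAX_VALUE) % numBits;
    }

    /** FNV-1a over the UTF-8 bytes, followed by a 64-bit finalizer mix. */
    private static long hash64(String id) {
        long h = 0xcbf29ce484222325L;
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```

The intended use would be to call mightContain(id) first and only fall
through to the transaction log and index lookups when it returns true. A
false answer is always safe to trust, since a filter of this kind has no
false negatives for ids that were added.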


On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen <st...@designware.dk>
wrote:

> On 30/07/14 08:55, jim ferenczi wrote:
>
>> Hi Per,
>> First of all the BloomFilter implementation in Lucene is not exactly a
>> bloom filter. It uses only one hash function and you cannot set the false
>> positive ratio beforehand. ElasticSearch has its own bloom filter
>> implementation (using "guava like" BloomFilter), you should take a look at
>> their implementation if you really need this feature.
>>
> Yes, I am looking into what Lucene can do and how to use it through Solr.
> If it does not fit our needs I will enhance it, potentially with
> inspiration from the ES implementation. Thanks
>
>> What is your use-case? If your index fits in RAM the bloom filter won't
>> help (and it may have a negative impact if you have a lot of segments). In
>> fact the only use case where the bloom filter can help is when your term
>> dictionary does not fit in RAM which is rarely the case.
>>
> We have so many documents that it will never fit in memory. We use
> optimistic locking (our own implementation) to do correct concurrent
> assembly of documents and to do duplicate control. This requires a lot of
> finding docs by their id, and most of the time the document is not there,
> but to be sure we need to check both the transaction log and the actual index
> (UpdateLog). We would like to use Bloom Filter to quickly tell that a
> document with a particular id is NOT present.
>
>>
>> Regards,
>> Jim
>>
> Regards, Per Steffensen
>



-- 
Regards,
Shalin Shekhar Mangar.
