Hi Per,

There's LUCENE-5675, which added a new postings format for IDs. Trying it out in Solr is on my to-do list, but maybe you can get to it before me.
https://issues.apache.org/jira/browse/LUCENE-5675

On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen <st...@designware.dk> wrote:

> On 30/07/14 08:55, jim ferenczi wrote:
>
>> Hi Per,
>> First of all the BloomFilter implementation in Lucene is not exactly a
>> bloom filter. It uses only one hash function and you cannot set the false
>> positive ratio beforehand. ElasticSearch has its own bloom filter
>> implementation (using "guava like" BloomFilter), you should take a look at
>> their implementation if you really need this feature.
>>
> Yes, I am looking into what Lucene can do and how to use it through Solr.
> If it does not fit our needs I will enhance it - potentially with
> inspiration from ES implementation. Thanks
>
>> What is your use case? If your index fits in RAM the bloom filter won't
>> help (and it may have a negative impact if you have a lot of segments). In
>> fact the only use case where the bloom filter can help is when your term
>> dictionary does not fit in RAM which is rarely the case.
>>
> We have so many documents that it will never fit in memory. We use
> optimistic locking (our own implementation) to do correct concurrent
> assembly of documents and to do duplicate control. This requires a lot of
> finding docs from their id, and most of the time the document is not there,
> but to be sure we need to check both transactionlog and the actual index
> (UpdateLog). We would like to use Bloom Filter to quickly tell that a
> document with a particular id is NOT present.
>
>>
>> Regards,
>> Jim
>>
> Regards, Per Steffensen
>

--
Regards,
Shalin Shekhar Mangar.
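
P.S. To make the distinction Jim draws concrete, here is a rough Python sketch of a Bloom filter that is sized from a target false positive ratio and uses several hash positions per key, as opposed to a single hash function. This is not Lucene's or Elasticsearch's code; the class name `BloomFilter`, the method names, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; all names here are hypothetical."""

    def __init__(self, expected_items, false_positive_rate):
        if expected_items <= 0 or not 0.0 < false_positive_rate < 1.0:
            raise ValueError("need expected_items > 0 and 0 < rate < 1")
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        # Standard hash count: k = (m / n) * ln 2, at least one.
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))
```

For Per's use case, the interesting answer is `False`: a call such as `might_contain("some-id")` returning `False` means the id was never added, so the lookup in the index could be skipped. A `True` answer still requires the real lookup.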