Re: Bloom filter

2014-08-04 Thread Per Steffensen
I just finished adding support for persisted ("backed", as I call them) Bloom filters to the Guava Bloom Filter. I implemented one kind of persisted Bloom filter that works on memory-mapped files. I have changed our Solr code so that it uses such an enhanced Guava Bloom Filter. Making sure it is kept up
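
For readers unfamiliar with the idea of a bit array backed by a memory-mapped file, here is a minimal sketch using only the Java standard library. It is not the Guava extension described in this message, whose code is not shown in the thread; the class name MappedBitArray and its methods are hypothetical and chosen for illustration only. It assumes the bit array fits in a single mapping (at most Integer.MAX_VALUE bytes).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: a fixed-size bit array stored in a memory-mapped file.
final class MappedBitArray implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;
    private final long numBits;

    MappedBitArray(Path file, long numBits) throws IOException {
        if (numBits <= 0 || numBits > (long) Integer.MAX_VALUE * 8) {
            throw new IllegalArgumentException("numBits out of range: " + numBits);
        }
        this.numBits = numBits;
        int numBytes = (int) ((numBits + 7) / 8);
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Mapping in READ_WRITE mode grows the file to numBytes if it is shorter.
        this.buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, numBytes);
    }

    void set(long bit) {
        checkIndex(bit);
        int byteIndex = (int) (bit >>> 3);
        buffer.put(byteIndex, (byte) (buffer.get(byteIndex) | (1 << (bit & 7))));
    }

    boolean get(long bit) {
        checkIndex(bit);
        return (buffer.get((int) (bit >>> 3)) & (1 << (bit & 7))) != 0;
    }

    private void checkIndex(long bit) {
        if (bit < 0 || bit >= numBits) {
            throw new IndexOutOfBoundsException("bit index: " + bit);
        }
    }

    // Forces pending changes to the backing file, then closes the channel.
    @Override
    public void close() throws IOException {
        buffer.force();
        channel.close();
    }
}
```

A Bloom filter built on such an array would call set and get where an in-memory filter uses a heap bit set. Reopening the same file with the same numBits yields the previously stored bits.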

Re: Bloom filter

2014-08-02 Thread Umesh Prasad
+1 to Guava's BloomFilter implementation. You can actually hook into the UpdateProcessor chain and put the logic for updating and checking the Bloom filter there. We had a somewhat similar use case. We were using DIH and it was possible that the same Solr input document (meaning the same content) would be coming l

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
You're right. I misunderstood. I thought that you wanted to optimize the "finding by id" path which is typically done for comparing versions during inserts in Solr. Yes, it won't help with the case where the ID does not exist. On Wed, Jul 30, 2014 at 6:14 PM, Per Steffensen wrote: > Hi > > I a

Re: Bloom filter

2014-07-30 Thread Per Steffensen
Hi I am not sure exactly what LUCENE-5675 does, but reading the description it seems to me that it would help finding out that there is no document (having an id-field) where version-field is less than . As far as I can see this will not help finding out if a document with id= exists. We want

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-6301 On Wed, Jul 30, 2014 at 1:35 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Hi Per, > > There's LUCENE-5675 which has added a new postings format for IDs. Trying > it out in Solr is in my todo list but maybe you can get to it

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
Hi Per, There's LUCENE-5675 which has added a new postings format for IDs. Trying it out in Solr is in my todo list but maybe you can get to it before me. https://issues.apache.org/jira/browse/LUCENE-5675 On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen wrote: > On 30/07/14 08:55, jim ferencz

Re: Bloom filter

2014-07-30 Thread Per Steffensen
On 30/07/14 08:55, jim ferenczi wrote: Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using "guava like" Bloo

Re: Bloom filter

2014-07-29 Thread jim ferenczi
Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using "guava like" BloomFilter), you should take a look at their
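
To illustrate what "several hash functions and a false positive ratio set beforehand" means in practice, here is a minimal sketch of a textbook Bloom filter in plain Java. It is not the Lucene, Guava, or Elasticsearch implementation. The class name SimpleBloomFilter is hypothetical, and the FNV-1a hash combined with double hashing is an assumption made to keep the example dependency-free. The sizing uses the standard formulas m = ceil(-n ln p / (ln 2)^2) bits and k = round((m / n) ln 2) hash functions, for n expected insertions and target false positive probability p.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration of a Bloom filter sized from a target false positive rate.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int expectedInsertions, double falsePositiveRate) {
        if (expectedInsertions <= 0 || falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double ln2 = Math.log(2);
        // m = ceil(-n * ln(p) / (ln 2)^2), capped so it fits a BitSet.
        double m = Math.ceil(-expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2));
        this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(1.0, m));
        // k = round((m / n) * ln 2), at least one hash function.
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
        this.bits = new BitSet(numBits);
    }

    int numBits() { return numBits; }

    int numHashes() { return numHashes; }

    void put(String id) {
        long h = hash64(id);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // force odd so the k probes differ
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String id) {
        long h = hash64(id);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    // 64-bit FNV-1a over the UTF-8 bytes; chosen only for brevity.
    private static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

With 1000 expected insertions and a target rate of 0.01, these formulas give 9586 bits and 7 hash functions. A filter like this never reports a false negative, which is what makes it useful for quickly ruling out IDs that do not exist.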

Re: Bloom filter

2014-07-28 Thread Per Steffensen
Yes I found that one, along with SOLR-3950. Well at least it seems like the support is there in Lucene. I will figure out myself how to make it work via Solr, the way I need it to work. My use-case is not as specified in SOLR-1375, but the solution might be the same. Any input is of course stil

Re: Bloom filter

2014-07-28 Thread Lukas Drbal
Hi Per, link to jira - https://issues.apache.org/jira/browse/SOLR-1375 Unresolved ;-) L. On Mon, Jul 28, 2014 at 1:17 PM, Per Steffensen wrote: > Hi > > Where can I find documentation on how to use Bloom filters in Solr (4.4). > http://wiki.apache.org/solr/BloomIndexComponent seems to be outd

Re: Bloom filter

2014-07-28 Thread Shalin Shekhar Mangar
I don't think that issue was ever committed. On Mon, Jul 28, 2014 at 4:47 PM, Per Steffensen wrote: > Hi > > Where can I find documentation on how to use Bloom filters in Solr (4.4). > http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated - > there is no BloomIndexComponent inclu