Re: Bloom filter

2014-08-04 Thread Per Steffensen
I just finished adding support for persisted ("backed", as I call them) bloom filters in Guava Bloom Filter. I implemented one kind of persisted bloom filter that works on memory-mapped files. I have changed our Solr code so that it uses this enhanced Guava Bloom Filter. Making sure
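For readers wondering what a bit array "backed" by a memory-mapped file might look like, here is a minimal sketch using only the JDK. It is not Per's implementation and not part of Guava's API; the class name `MappedBitArray` and its methods are hypothetical, chosen for illustration. A bloom filter would sit on top of such an array in place of an in-memory bit set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: a fixed-size bit array stored in a memory-mapped file.
// A single mapping is limited to Integer.MAX_VALUE bytes, so this sketch
// only supports filters up to roughly 16 billion bits.
final class MappedBitArray implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    MappedBitArray(Path file, long numBits) {
        long numBytes = (numBits + 7) / 8;
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            // Mapping READ_WRITE beyond the current file size grows the file.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, numBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void set(long bit) {
        int byteIndex = (int) (bit >>> 3);
        int mask = 1 << (int) (bit & 7);
        buffer.put(byteIndex, (byte) (buffer.get(byteIndex) | mask));
    }

    boolean get(long bit) {
        int byteIndex = (int) (bit >>> 3);
        int mask = 1 << (int) (bit & 7);
        return (buffer.get(byteIndex) & mask) != 0;
    }

    @Override
    public void close() {
        buffer.force(); // ask the OS to write dirty pages to the file
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bloom", ".bits");
        try (MappedBitArray bits = new MappedBitArray(file, 1 << 20)) {
            bits.set(12345);
        }
        // Reopen the same file: the bit set earlier is still there.
        try (MappedBitArray bits = new MappedBitArray(file, 1 << 20)) {
            System.out.println("bit 12345 set after reopen: " + bits.get(12345));
        }
        Files.deleteIfExists(file);
    }
}
```

The point of the design is that the filter's state survives a restart without being rebuilt, and the operating system decides how much of it is resident in memory at any time.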

Re: Bloom filter

2014-08-02 Thread Umesh Prasad
+1 to Guava's BloomFilter implementation. You can actually hook into the UpdateProcessor chain and put the logic for updating / checking the bloom filter there. We had a somewhat similar use case. We were using DIH and it was possible that the same Solr input document (meaning same content) will be c

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
> today is bloom filters, which use up huge amounts of memory", I guess a bloom filter only takes as much space (disk or memory) as you want it to. The less space you allow it to use, the more often it gives you a false positive (saying "this doc might exist" in cases where

Re: Bloom filter

2014-07-30 Thread Per Steffensen
ory", I guess a bloom filter only takes as much space (disk or memory) as you want it to. The less space you allow it to use, the more often it gives you a false positive (saying "this doc might exist" in cases where the doc actually does not exist). So the space you need to use for t
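The trade-off between space and false positives can be made concrete with the standard bloom filter sizing formulas: for n expected entries and a target false-positive probability p, the optimal number of bits is m = -n ln(p) / (ln 2)^2 and the optimal number of hash functions is k = (m / n) ln 2. The sketch below just evaluates those formulas; the class and method names are made up for illustration and do not come from Solr, Lucene, or Guava.

```java
// Hypothetical helper that evaluates the textbook bloom filter sizing formulas.
final class BloomSizing {
    /** Bits needed for n expected insertions at false-positive probability p. */
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions that minimises false positives for m bits and n entries. */
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // assumed number of documents, for illustration only
        for (double p : new double[] {0.1, 0.01, 0.001}) {
            long m = optimalBits(n, p);
            System.out.printf("p=%.3f -> %d bits (%.2f MiB), k=%d%n",
                    p, m, m / 8.0 / 1024 / 1024, optimalHashes(n, m));
        }
    }
}
```

At p = 0.01 this works out to roughly 9.6 bits per entry, so the memory cost is something you choose by picking p, not a fixed property of bloom filters.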

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
ut maybe you can get to it before me.
>
> https://issues.apache.org/jira/browse/LUCENE-5675
>
> On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen wrote:
>
>> On 30/07/14 08:55, jim ferenczi wrote:
>>
>>> Hi Per,
>>> First of all the BloomFilter im

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
:55, jim ferenczi wrote:
>
>> Hi Per,
>> First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementat

Re: Bloom filter

2014-07-30 Thread Per Steffensen
On 30/07/14 08:55, jim ferenczi wrote: Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using "guava

Re: Bloom filter

2014-07-29 Thread jim ferenczi
Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using "guava like" BloomFilter), you should take
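To illustrate what "more than one hash function" means in practice, here is a minimal bloom filter sketch with k probes per key. It is not the Lucene, ElasticSearch, or Guava code; the class `SimpleBloomFilter` is hypothetical, and the k indexes are derived from two base hashes (the common double-hashing trick) rather than from k independent hash functions.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical, minimal bloom filter: k probes per key over a fixed-size bit set.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used here as the second base hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // i-th probe index: (h1 + i * h2) mod numBits, always non-negative.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(String key) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 7);
        filter.put("doc-42");
        System.out.println("doc-42 might exist: " + filter.mightContain("doc-42"));
        System.out.println("doc-43 might exist: " + filter.mightContain("doc-43"));
    }
}
```

With `numHashes` set to 1 this degenerates into the single-hash scheme described above. Being able to choose both the bit count and the probe count is what lets you target a false positive ratio up front.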

Re: Bloom filter

2014-07-28 Thread Per Steffensen
Yes I found that one, along with SOLR-3950. Well at least it seems like the support is there in Lucene. I will figure out myself how to make it work via Solr, the way I need it to work. My use-case is not as specified in SOLR-1375, but the solution might be the same. Any input is of course stil

Re: Bloom filter

2014-07-28 Thread Lukas Drbal
Hi Per,

link to jira - https://issues.apache.org/jira/browse/SOLR-1375 Unresolved ;-)

L.

On Mon, Jul 28, 2014 at 1:17 PM, Per Steffensen wrote:

> Hi
>
> Where can I find documentation on how to use Bloom filters in Solr (4.4).
> http://wiki.apache.org/solr/BloomIndexComponent seems to be outd

Re: Bloom filter

2014-07-28 Thread Shalin Shekhar Mangar
I don't think that issue was ever committed.

On Mon, Jul 28, 2014 at 4:47 PM, Per Steffensen wrote:

> Hi
>
> Where can I find documentation on how to use Bloom filters in Solr (4.4).
> http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated -
> there is no BloomIndexComponent inclu

Bloom filter

2014-07-28 Thread Per Steffensen
Hi,

Where can I find documentation on how to use Bloom filters in Solr (4.4)? http://wiki.apache.org/solr/BloomIndexComponent seems to be outdated - there is no BloomIndexComponent included in the 4.4 code.

Regards, Per Steffensen