(commons-collections) branch master updated: initial changes. (#507)

ggregory Sun, 23 Jun 2024 04:37:11 -0700

This is an automated email from the ASF dual-hosted git repository.

ggregory pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-collections.git



The following commit(s) were added to refs/heads/master by this push:
     new 7c9e046d4 initial changes. (#507)
7c9e046d4 is described below

commit 7c9e046d417296a2d46bc2f960ba3d355351bf60
Author: Claude Warren <cla...@apache.org>
AuthorDate: Sun Jun 23 13:36:29 2024 +0200

    initial changes. (#507)
    
    Added MathJax to display math symbols.
    
    Added markdown to site processing.
    Created markdown booomFilters directory with multiple documentation files.
---
 pom.xml                                            |   16 +-
 .../bloomfilter/EnhancedDoubleHasher.java          |    2 +-
 src/site/markdown/bloomFilters.md                  |   25 +
 src/site/markdown/bloomFilters/advanced.md         |  324 ++++++
 src/site/markdown/bloomFilters/implementation.md   |  235 ++++
 src/site/markdown/bloomFilters/intro.md            |  197 ++++
 src/site/markdown/bloomFilters/multidimensional.md |  110 ++
 src/site/site.xml                                  |   20 +
 src/site/xdoc/bloomFilters.xml                     | 1137 --------------------
 9 files changed, 927 insertions(+), 1139 deletions(-)

diff --git a/pom.xml b/pom.xml
index 0a01bdb01..a1d6f602c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -170,6 +170,8 @@
     <commons.jacoco.branchRatio>0.78</commons.jacoco.branchRatio>
     <commons.jacoco.lineRatio>0.85</commons.jacoco.lineRatio>
     <commons.jacoco.complexityRatio>0.78</commons.jacoco.complexityRatio>
+    <doxia.module.markdown.version>1.12.0</doxia.module.markdown.version>
+    <math.mathjax.version>2.7.2</math.mathjax.version>
   </properties>
 
   <build>
@@ -267,8 +269,21 @@
         <artifactId>maven-javadoc-plugin</artifactId>
         <configuration>
             <source>8</source>
+            <!-- Enable MathJax -->
+            <additionalOptions>-Xdoclint:all --allow-script-in-comments 
-header '&lt;script type="text/javascript" 
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/${math.mathjax.version}/MathJax.js?config=TeX-AMS-MML_HTMLorMML"&gt;&lt;/script&gt;'</additionalOptions>
         </configuration>
       </plugin>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-site-plugin</artifactId>
+        <dependencies>
+          <dependency>
+            <groupId>org.apache.maven.doxia</groupId>
+            <artifactId>doxia-module-markdown</artifactId>
+            <version>${doxia.module.markdown.version}</version>
+          </dependency>
+        </dependencies>
+      </plugin>
     </plugins>
   </build>
 
@@ -812,4 +827,3 @@
   </contributors>
 
 </project>
-
diff --git 
a/src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java
 
b/src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java
index d82e10f57..230187fda 100644
--- 
a/src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java
+++ 
b/src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java
@@ -85,7 +85,7 @@ public class EnhancedDoubleHasher implements Hasher {
      *</p>
      * <ol>
      * <li>If there is an odd number of bytes the excess byte is assigned to 
the increment value</li>
-     * <li>The bytes alloted are read in big-endian order any byte not 
populated is set to zero.</li>
+     * <li>The bytes allotted are read in big-endian order any byte not 
populated is set to zero.</li>
      * </ol>
      * <p>
      * This ensures that small arrays generate the largest possible increment 
and initial values.
diff --git a/src/site/markdown/bloomFilters.md 
b/src/site/markdown/bloomFilters.md
new file mode 100644
index 000000000..48227d323
--- /dev/null
+++ b/src/site/markdown/bloomFilters.md
@@ -0,0 +1,25 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+ # Bloom filters
+
+Bloom filters were added to commons collections in version 4.5.
+
+The documentation comprises four parts:
+ * [An introduction](bloomFilters/intro.html) to Bloom filters.
+ * The [Commons Collections implementations](bloomFilters/implementation.html).
+ * [Unusual usage and advanced implementations](bloomFilters/advanced.html).
+ * Using [Bloom filters for indexing](bloomFilters/multidimensional.html).
diff --git a/src/site/markdown/bloomFilters/advanced.md 
b/src/site/markdown/bloomFilters/advanced.md
new file mode 100644
index 000000000..6090eeb26
--- /dev/null
+++ b/src/site/markdown/bloomFilters/advanced.md
@@ -0,0 +1,324 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+# Bloom Filters Part 3: Unusual Usage and Advanced Implementations
+
+In the previous post we discussed the Apache Commons CollectionⓇ 
implementation of Bloom filters and showed how to use them to answer the most 
basic questions.  In this post we will look at some unusual usage patterns and 
some advanced implementations.
+
+## Unusual Usage Patterns
+
+These usage patterns are unusual in the sense that they are not commonly seen 
in code bases.  However, it is important for developers to realize the types of 
questions that can be answered with Bloom filters.
+
+### Finding filter in intersection of sets
+
+If you have two Bloom filters and you want to know if there are elements that 
are in both, it is possible to create a single Bloom filter to answer the 
question.  The solution is to create a Bloom filter that is the intersection of 
the two original Bloom filters.  To do this we are going to use the 
`BitMapExtractor.processBitmapPairs()` function.  This function takes a 
`BitMapExtractor` and a `LongBiPredicate` as arguments.  The `LongBiPredicate` 
takes two long values, performs some func [...]
+
+In this example the `LongBiPredicate` creates a new bitmap by performing a 
bitwise “and” on the `BitMap` pairs presented by the 
`BitMapExtractor.processBitmapPairs()` function.  The class has a method to 
present the result as a `BitMapExtractor`.
+
+```java
+import org.apache.commons.collections4.bloomfilter.BitMap;
+import org.apache.commons.collections4.bloomfilter.BitMapExtractor;
+import org.apache.commons.collections4.bloomfilter.LongBiPredicate;
+import org.apache.commons.collections4.bloomfilter.Shape;
+
+/**
+ * Calculates the intersection of two Bloom filters
+ */
+class Intersection implements LongBiPredicate {
+    private long[] newMaps;
+    private int idx;
+
+    /**
+     * Creates an intersection for a specified shape.
+     * @param shape the shape of the Bloom filters being compared.
+     */
+       Intersection(Shape shape) {
+               newMaps = new 
long[BitMap.numberOfBitMaps(shape.getNumberOfBits())];
+        idx = 0;
+       }
+
+    /**
+     * Implements the LongBiPredicate test. 
+     * @param a one BitMap
+     * @param b the other BitMap
+     * @return true always.
+     */
+       public boolean test(long a, long b) {
+        newMaps[idx++] = a & b;
+        return true;
+    }
+
+    /**
+     * Returns the intersection as a BitMapExtractor.
+     * @return the the intersection as a BitMapExtractor.
+     */
+    BitMapExtractor asExtractor() {
+        return BitMapExtractor.fromBitMapArray(newMaps);
+    }
+}
+```
+Now we can use the `Intersecton` class to merge the two Bloom filters into a 
new filter as follows:
+
+```java
+import Intersection;
+import org.apache.commons.collections4.bloomfilter.BloomFilter;
+import org.apache.commons.collections4.bloomfilter.LongBiPredicate;
+import org.apache.commons.collections4.bloomfilter.Shape;
+
+class IntersectionExample {
+
+    public static main(String[] args) {
+        /* start setup */
+        
+        // get an array of Bloom filters to check
+        BloomFilter[] candidateFilters = getBloomFilterCandidates();
+
+        // use two populated Bloom filters.
+        BloomFilter collection1 = populateCollection1();
+        BloomFilter collection2 = populateCollection2();
+
+        /* end setup */
+
+        // create the Intersection instance
+        Intersection intersection = new Intersection(collection1.getShape());
+
+        // populate the intersection instance with data from the 2 filters.
+        collection1.processBitMapPairs(collection2, intersection);
+
+        // create a new Bloom filter from the intersection instance.
+        BloomFilter collection1And2 = new BloomFilter(collection1.getShape());
+        collection1And2.merge(intersection.asExtractor());
+
+        // now do the search for filters that are in both collections.
+        for (BloomFilter target : candidateFilters) {
+            if (collection1And2.contains(target)) {
+                // do something interesting
+            }
+        }
+    }
+}
+```
+
+### Sharding by Bloom filter
+In processing large volumes of data it is often desirable to fragment or shard 
the data into different repositories.  Bloom filters provide a quick method for 
sharding the data.
+
+Let’s assume there are multiple locations to store files, and we want to 
distribute the data across the cluster in a way that minimizes collisions.  The 
following code is an example of how to implement this.
+
+```java
+import org.apache.commons.collections4.bloomfilter.BloomFilter;
+import org.apache.commons.collections4.bloomfilter.SetOperations;
+import org.apache.commons.collections4.bloomfilter.Shape;
+import org.apache.commons.collections4.bloomfilter.SimpleBloomFilter;
+import Storage; // a class that implements the actual storage I/O
+
+class ShardingExample {
+
+    private BloomFilter[] gatekeeper;
+    private Storage[] storage;
+
+     /**
+     * Constructor.  Assumes 10K items in filter with false positive rate of 
0.01
+     * @param storage The storage locations to store the objects in.
+     */
+    public ShardingExample(Storage[] storage) {
+        this(storage, Shape.fromNP(10000, 0.01));
+    }
+
+    /**
+     * Constructor. 
+     * @param storage The storage locations to store the objects in.
+     * @param shape The shape for the gatekeeper Bloom filters.
+     */
+    public ShardingExample(Storage[] storage, Shape shape) {
+        gatekeeper = new BloomFilter[storage.length];
+        this.storage = storage;
+        for (int i = 0; i < gatekeeper.length; i++) {
+            gatekeeper[i] = new SimpleBloomFilter(shape);
+        }
+    }
+
+    /**
+     * Creates the BloomFilter for the key.
+     * @param objectKey The key to hash.
+     * @return the BloomFilter for the key.
+     */
+    private BloomFilter createBloomFilter(Object objectKey) {
+        byte[] bytes = SerializationUtils.serialize(o);
+        long[] hash = MurmurHash3.hash128(bytes);
+        BloomFilter bloomFilter = new 
SimpleBloomFilter(gatekeeper[0].getShape());
+        bloomFilter.merge(new EnhancedDoubleHasher(hash[0], hash[1]));
+        return bloomFilter;
+    }
+
+    
+    /**
+     * Write an object into the filter.
+     * @param itemKey The item key for the storage layer.
+     * @param itemToStore The item to store.
+     */
+    public void write(Object itemKey, Object itemToStore) {
+        // create the Bloom filter to for key
+        BloomFilter itemBloomFilter = createBloomFilter(itemKey);
+
+        // find the storage to insert into.
+       int selected = 0;
+       int hammingValue = Integer.MAX_INT;
+       for (int i = 0; i < gatekeeper.length; i++) {
+           int hamming = SetOperations.hammingDistance(gatekeeper[i], 
itemBloomFilter);
+           if (hamming < hammingValue) {
+               selected = i;
+               hammingValue = hamming;
+           }
+       }
+
+       // insert the data.
+       storage[selected].write(itemKey, itemToStore);
+       gatekeeper[selected].merge(itemBloomFilter);
+    }
+
+    /**
+     * Reads the item from the storage.  
+     * @param itemKey The key to look for.
+     * @return The stored object or null if not found. 
+     */
+    public Object read(Object itemKey) {
+        // create the Bloom filter to look for
+        BloomFilter itemBloomFilter = createBloomFilter(itemKey);
+        
+        // assumes storage returns null if key not found.
+        for (int i = 0; i < gatekeeper.length; i++) {
+            if (gatekeeper[i].contains(itemBloomFilter)) {
+                Object itemThatWasStored = storage[i].read(itemKey);
+                if (itemThatWasStored != null) {
+                    return itemThatWasStored;
+                }
+            }
+        }
+        return null;
+    }
+}
+```
+
+However, the issue with this solution is that once the filters are saturated, 
the search begins to degrade.  To solve this problem a new, empty filter can be 
added when one of the existing filters approaches saturation.  When a filter 
reaches saturation, remove it from consideration for insert but let it respond 
to read requests.  This solution achieves a balanced Bloom filter distribution 
and does not exceed the false positive threshold.  An example of a test for 
saturation is:
+```java
+    if (bloomfilter.getShape().estimateMaxN() <= bloomfilter.estimateN()) {
+        // handle saturated case.
+    }
+```
+The above calculation is dependent upon `BloomFilter.cardinality()` function, 
so it is advisable to use BloomFilters that track cardinality or can calculate 
cardinality quickly.
+
+## Counting Bloom filters
+
+Standard Bloom filters do not have a mechanism to remove items.  One of the 
solutions for this is to convert each bit to a counter<span><a 
class="footnote-ref" href="#fn1">1</a></span>. The counter and index together 
are commonly called a cell.  As items are added to the filter, the values of 
the cells associated with the enabled bits are incremented.  When an item is 
removed the values of the cells associated with the enabled bits are 
decremented.  This solution supports removal of item [...]
+
+The counting Bloom filter also has a couple of operations not found in other 
Bloom filters:
+ * Counting Bloom filters can be added together so that the sum of their 
counts is achieved.
+ * Counting Bloom filters can be subtracted so that the difference of their 
counts is achieved.
+ * Counting Bloom filters can report the maximum number of times another Bloom 
filter might have been merged into it.  This is an upper bound estimate and may 
include false positives.
+
+There are several error conditions with counting Bloom filters that are not 
found in other cases.
+ * Incrementing the counter past the maximum value that can be stored in a 
cell.
+ * Decrementing the counter past 0.
+ * Removing a Bloom filter that was not added.  This condition can decrement a 
cell that had an initial value of zero leading to decrementing error.  But in 
other cases, when all the enabled bits in the Bloom filter have cells in the 
counting filter it is undetectable, but will lead to unexpected results for 
subsequent operations.
+
+### Apache Commons Collections Implementation
+
+The Apache Commons Collections Bloom filter package contains a counting Bloom 
filter implementation.  To support the `CountingBloomFilter` there are 
`CellExtractor` and `CellPredicate` interfaces.
+
+The `CellExtractor` extends the `IndexExtractor` by producing the index for 
every cell with a count greater than 0.  The interface defines several new 
methods:
+ * `processCells(CellPredicate consumer)` that will call the `CellPredicate` 
for each populated cell in the producer.
+ * `from(IndexExtractor indexExtractor)` A method to produce a `CellExtractor` 
from an `IndexExtractor` where each index that appears in the `IndexExtractor` 
is a cell with a value of one (1).  Since the `CellExtractor` makes a guarantee 
of uniqueness for the index, duplicate indices are summed together. 
+
+The `CellPredicate` is a functional consumer interface that accepts two 
integer values, the index and the cell value, for each populated cell.
+
+The `CountingBloomFilter` interface defines several new methods:
+ * `add(CellExtractor other)` - adds the cells of the extractor to the cells 
of the counting Bloom filter.  
+ * `subtract(CellExtractor other)` - subtracts the cells of the  extractor 
from the cells of the counting Bloom filter.
+ * `remove(BitMapExtractor other)`, `remove(BloomFilter)`, `remove(Hasher)`, 
and `remove(IndexExtractor)` - these methods decrement the associated cells by 
1.  This is the inverse of the `merge()` methods.
+ * `isValid()` verifies that all cells are valid.  If `false` the 
`CountingBloomFilter` is no longer accurate and functions may yield 
unpredictable results.
+ * `getMaxCell()` defines the maximum value of a cell before it becomes 
invalid.
+
+The only provided implementation, `ArrayCountingBloomFilter`, uses a simple 
array of integers to track the counts.
+
+## Element Decay Bloom filters - for streaming data
+
+Element decay Bloom filters are effectively counting Bloom filters that 
automatically decrement some of the counts.  Technically speaking counting 
Bloom filters can be used for streaming data, but to do so one would have to 
figure out how to remove filters when they were too old to be considered any 
longer.  There are several approaches to this problem; here we discuss two: 
Creating layers of filters based on some quantized time unit (a temporal 
layered Bloom filter), and the stable Bloo [...]
+
+### Stable Bloom filter
+
+The stable Bloom filter<span><a class="footnote-ref" href="#fn2">2</a></span> 
is a form of counting Bloom filter that automatically degrades the cells so 
that items are “forgotten” after a period of time.  Each cell has a maximum 
value defined by the stable Bloom filter shape.  When a bit is turned on, the 
cell is set to the maximum value.  The process for an insert is:
+ 1. Randomly select a number of cells and if the value is greater than zero 
decrement it.
+ 2. For each bit to be enabled, set the cell to the maximum value.
+
+After a period of time the number of enabled cells becomes stable, hence the 
name of the filter.  This filter will detect duplicates for recently seen 
items.  However, it also introduces a false negative rate, so unlike other 
Bloom filters this one does not guarantee that if the target is not in the 
filter it has not been seen.
+
+The stable filter works well in environments where inserts occur at a fairly 
fixed rate; it does not handle bursty environments very well.
+
+There is no implementation of a stable Bloom filter in commons collections.
+
+### Layered Bloom filter
+
+The layered Bloom filter<span><a class="footnote-ref" href="#fn3">3</a></span> 
creates a list of filters.  Items are added to a target filter and, after a 
specified condition is met, a new filter is added to the list and it becomes 
the target.  Filters are removed from the list based on other specified 
conditions.
+
+For example, a layered Bloom filter could comprise a list of Bloom filters, 
where every 10 merges a new target is created and old target filters are 
removed one minute after its last merge.  This provides a one minute windowing 
and guarantees that no Bloom filter in the list will contain more than 10 other 
filters.  This type of filter handles bursty rates better than the stable 
filter.
+
+The layered filter also has the capability to locate the layer in which a 
filter was added.  This gives the user the ability to look back in time, if 
necessary.
+
+The layered filter can also be used in situations where the actual number of 
items is unknown when the Bloom filter is defined.  By using a layered filter 
that adds a target whenever the saturation of the current target reaches 1, the 
false positive rate for the entire filter does not rise above the value 
calculated for the shape.
+
+### Apache Commons Collections Implementation
+
+The Apache Commons Collections Bloom filter package contains a layered Bloom 
filter implementation.  To support the `LayeredBloomFilter` there are a 
`LayerManager` class and a `BloomFilterExtractor` interface.
+
+The `LayerManager` handles all of the manipulation of the collection of Bloom 
filters that comprise the `LayeredBloomFilter`. It is constructed by a 
`Builder` that requires:
+ * a `Supplier` of Bloom filters that will create new empty Bloom filters as 
required.  
+ * a `Predicate` that takes a `LayerManager` and determines if a new layer 
should be added.
+ * a `Consumer` that takes a `Deque` of the types of Bloom filters provided by 
the `Supplier` noted above.
+
+These three properties ensure that the `LayerManager` will produce filters 
when required, and remove them when necessary.
+
+The `BloomFilterExtractor` is a functional interface that functions much like 
the `CellExtractor` in the `CountingBloomFilter`.  The `BloomFilterExtractor` 
has methods:
+* `processBloomFilters(Predicate<BloomFilter> consumer)` - that will call the 
`Predicate` for each Bloom filter layer.
+* `processBloomFilterPairs(BloomFilterExtractor other, 
BiPredicate<BloomFilter, BloomFilter> funcindexExtractor)` - that wil lapply 
the BiPredicate to every pair of Bloom filters.
+* Methods to produce a `BloomFilterExtractor` from a collection of Bloom 
filters.
+* `flatten()` - a method to merge all the filters in the list into a single 
filter.
+
+
+The `LayeredBloomFilter` class defines several new methods:
+* `contains(BloomFilterExtractor others)` - returns true if the layered filter 
contains all of the other filters.
+* `find(BitmapExtractor)`, `find(BloomFilter)`, `find(Hasher)`, and 
`find(IndexExtractor)` - returns an array of ints identifying which layers 
match the pattern.
+* `get(int layer)` - returns the Bloom filter for the layer.
+* `getDepth()` - returns the number of layers in the filter..
+* `next()` - forces the the creation of a new layer.
+
+## Review
+
+In this post we covered some unusual uses for Bloom filters as well as a 
couple of interesting unusual Bloom filters.  In the next post, we will 
introduce the reference Bloom filter, and delve into multidimensional Bloom 
filters.  We will also show how multidimensional Bloom filters can be used to 
search encrypted data without decrypting.
+
+## Footnotes
+<span>
+<ol class="footnotes>">
+<li><a id='fn1'></a>
+Fan, Li; Cao, Pei; Almeida, Jussara; Broder, Andrei (2000), "Summary Cache: A 
Scalable Wide-Area Web Cache Sharing Protocol" (PDF), IEEE/ACM Transactions on 
Networking, 8 (3): 281–293, CiteSeerX 10.1.1.41.1487, doi:10.1109/90.851975, 
S2CID 4779754, archived from the original (PDF) on 2017-09-22, retrieved 
2018-07-30. A preliminary version appeared at SIGCOMM '98.
+</li>
+<li><a id='fn2'></a>
+Deng, Fan; Rafiei, Davood (2006), "Approximately Detecting Duplicates for 
Streaming Data using Stable Bloom Filters", Proceedings of the ACM SIGMOD 
Conference (PDF), pp. 25–36
+</li>
+<li><a id='fn2'></a>
+Zhiwang, Cen; Jungang, Xu; Jian, Sun (2010), "A multi-layer Bloom filter for 
duplicated URL detection", Proc. 3rd International Conference on Advanced 
Computer Theory and Engineering (ICACTE 2010), vol. 1, pp. V1-586-V1-591, 
doi:10.1109/ICACTE.2010.5578947, ISBN 978-1-4244-6539-2, S2CID 3108985
+</li>
+</ol>
+</span>
diff --git a/src/site/markdown/bloomFilters/implementation.md 
b/src/site/markdown/bloomFilters/implementation.md
new file mode 100644
index 000000000..3020c8a2b
--- /dev/null
+++ b/src/site/markdown/bloomFilters/implementation.md
@@ -0,0 +1,235 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+# Bloom Filters Part 2: Apache Commons CollectionsⓇ Bloom Filter Implementation
+
+Previously I wrote about Bloom filters, what they are, why we use them, and 
what statistics they can provide.  In this post I am going to cover the Apache 
Commons Collections Bloom filter implementation in Java and explain the logic 
behind some of the design decisions.
+
+We need to realize that there are several pain points in Bloom filter usage.  
First, the amount of time it takes to create the number of hashes needed for 
each filter.  Second, the trade-off between storing the bits as a bit vector 
(which for sparsely populated filters can be wasteful) or storing the index of 
the enabled bits (which for densely populated filters can be very large).  
Adjacent to the internal storage question is how to serialize the Bloom filter. 
 Consideration of these pa [...]
+
+## Shape
+
+Last time I spoke of the shape of the filter.  In Commons Collections, `Shape` 
is a class that defines the number of bits in the filter (`m`) and the number 
of hashes (`k`) used for each object insertion.
+
+Bloom filters with the same shape can be merged and compared.  The code does 
not verify that the shape is the same, as the comparison would increase 
processing time.  However, every Bloom filter has a reference to its `Shape` 
and the test can be performed if desired.
+
+The `Shape` is constructed from one of five combinations of parameters 
specified in `fromXX` methods:
+ * `fromKM` - when the number of hash functions and the number of bits is 
known.
+ * `fromNM` - when the number of items and the number of bits is known.
+ * `fromNMK` - when the number of items, the number of bits and the number of 
hash functions is known.  In this case the number of items is used to verify 
that the probability is within range.
+ * `fromNP` - when the number of items and the probability of false positives 
is known.
+ * `fromPMK` - when the probability of false positives, number of bits, and 
number of hash functions is known.  In this case the probability is used to 
verify that the maximum number of elements will result in a probability that is 
valid.
+
+## The Extractors
+
+The question of how to efficiently externally represent the internal 
representation of a Bloom filter led to the development of two “extractors”.   
One that logically represents an ordered collection of BitMap longs, and the 
other that represents a collection of enabled bits indices.  In both cases the 
extractor feeds each value to a function - called a predicate - that does some 
operation with the value and returns true to continue processing, or false to 
stop processing.
+
+The extractors allow implementation of different internal storage as long as 
implementation can produce one or both of the standard producers.  Each 
producer has a static method to convert from the other type so only one 
implementation is required of the internal storage.
+
+### BitMapExtractor
+
+The BitMap extractor is an interface that defines objects that can produce 
BitMap vectors.  BitMap vectors are vectors of long values where the bits are 
enabled as per the `BitMap` class.  All Bloom filters implement this interface. 
 The `BitMapExtractor` has static methods to create a producer from an array of 
long values, as well as from an `IndexExtractor`.  The interface has default 
implementations that convert the extractor into an array of long values, and 
one that executes a LongP [...]
+
+The method `processBitMaps(LongPredicate)` is the standard access method for 
the extractor.  Each long value in the BitMap is passed to the predicate in 
order.  Processing continues until the last BitMap is processed or the 
`LongPredicate` returns false.
+
+### IndexExtractor
+
+The index extractor produces the index values for the enabled bits in a Bloom 
filter.  All Bloom filters implement this interface.  The `IndexExtractor` has 
static methods to create the extractor from an array of integers or a 
`BitMapExtractor`.  The interface also has default implementations to convert 
the extractor to an array of integers and one that executes an `IntPredicate` 
on each index.
+
+The method `processIndices(IntPredicate)` is the standard access method for 
the extractor.  Each int value in the extractor is passed to the predicate.  
Processing continues until the last int is processed or the `IntPredicate` 
returns false.  The order and uniqueness of the values is not guaranteed.
+
+## The Hashers
+
+### Hasher
+
+The `Hasher` interface defines a method that accepts a `Shape` and returns an 
`IndexrExtractor`. Calling `hasher.indices(shape)` will yield an 
`IndexExtractor` that will produce the `shape.numberOfHashFunctions()` integers.
+
+The `Hasher` provides a clean separation between new data being hashed and the 
Bloom filter.  It also provides a mechanism to add the same item to multiple 
Bloom filters even if they have different shapes.  This can be an important 
consideration when using Bloom filters in multiple systems with different 
requirements.
+
+Any current hashing process used by an existing Bloom filter implementation 
can be duplicated by a hasher.  The hasher only needs to create an 
`IndexExtractor` for an arbitrary shape.  In fact, the testing code contains 
`Hasher` implementations that produce sequential values.
+
+### EnhancedDoubleHasher
+
+In the previous post I described a hashing technique where the hash value is 
split into two values.  One is used for the initial value, the other is used to 
increment the values as additional hashes are required.  The Commons 
Collections implementation adds code to ensure that the increment is changed by 
a tetrahedral value on every iteration.  This change addresses issues with the 
incrementer initially being zero or very small.
+
+The constructor accepts either two longs or a byte array from a hashing 
function.  Users are expected to select a hashing algorithm to create an 
initial hash for the item.  The `EnhancedDoubleHasher` is then constructed from 
that hash value.  In many cases the Murmur3 hash, available in the Apache 
Commons CodecⓇ library, is sufficient and very fast.  The result is a `Hasher` 
that represents the hashed item.  This hasher can now be passed to any Bloom 
filter and the `Shape` of the filter  [...]
+
+## The Bloom Filters
+
+### Bloom Filter
+
+Now we come to the Bloom filter.  As noted above the `BloomFilter` interface 
implements both the `IndexExtractor` and the `BitMapExtractor`.  The two 
extractors are the external representations of the internal representation of 
the Bloom filter.  Bloom filter implementations are free to store bit values in 
any way deemed fit.  The requirements for bit value storage are:
+ * Must be able to produce an IndexExtractor.
+ * Must be able to produce a BitMapExtractor.
+ * Must be able to clear the values.  Reset the cardinality to zero.
+ * Must specify if the filter prefers (is faster creating) the IndexExtractor 
(Sparse characteristic) or the BitMapExtractor (Not sparse characteristic).
+ * Must be able to merge hashers, Bloom filters, index extractors, and BitMap 
extractors.  When handling extractors it is often the case that an 
implementation will convert one type of extractor to the other for merging.  
The BloomFilter interface has default implementations for Bloom filter and 
hasher merging.
+ * Must be able to determine if the filter contains hashers, Bloom filters, 
index extractors, and BitMap extractors. The BloomFilter interface has default 
implementations for BitMap extractor, Bloom filter and hasher checking.
+ * Must be able to make a deep copy of itself.
+ * Must be able to produce its `Shape`.
+
+Several implementations of Bloom filter are provided.  We will start by 
focusing on the `SimpleBloomFilter` and the `SparseBloomFilter`.
+
+### SimpleBloomFilter
+
+The `SimpleBloomFilter` implements its storage as an on-heap array of longs.  
This implementation is suitable for many applications.  It can also serve as a 
good model for how to implement `BitMap` storage in specific off-heap 
situations.
+
+### SparseBloomFilter
+The `SparseBloomFilter` implements its storage as a TreeSet of Integers.  
Since each bit is randomly selected across the entire [0,shape.numberOfBits) 
range the Sparse Bloom filter only makes sense when \\( \frac{2 \times m}{64} 
\lt k \times n \\).  The reasoning being that every BitMap in the array is 
equivalent to 2 integers in the sparse filter.  The number of integers is the 
number of hash functions times the number of items – this is an overestimation 
due to the number of hash colli [...]
+
+### Other
+
+The helper classes included in the package make it easy to implement new Bloom 
filters, for example it should be fairly simple to implement a Bloom filter on 
a `ByteBuffer`, or `LongBuffer`, or one that uses a compressed bit vector.
+
+## Common Usage Patterns
+
+### Populating a Bloom filter
+
+In most cases a shape is determined; in this example it is determined by the 
number of items to go in and the acceptable false positive rate, and then a 
number of items are added to the filter.
+
+```java
+import org.apache.commons.codec.digest.MurmurHash3;
+import org.apache.commons.collections4.bloomfilter.BloomFilter;
+import org.apache.commons.collections4.bloomfilter.EnhancedDoubleHasher;
+import org.apache.commons.collections4.bloomfilter.Shape;
+import org.apache.commons.lang3.SerializationUtils;
+
+public class PopulatingExample {
+
+    /**
+     * Populates a Bloom filter with the items.  Uses a shape that 
+     * expects 10K items max and a false positive rate of 0.01.
+     * @param items The items to insert.
+     * @return the Bloom filter populated with the items.
+     */
+    public static BloomFilter populate(Object[] items) {
+        populate(Shape.fromNP(10000, 0.01), items);
+    }
+
+    /**
+     * Populates a Bloom filter with the items.  
+     * @param shape The shape of the Bloom filter.
+     * @param items The items to insert.
+     * @return the Bloom filter populated with the items.
+     */
+    public static BloomFilter populate(Shape shape, Object[] items) {
+        BloomFilter collection = new SimpleBloomFilter(shape);
+        for (Object o : items) {
+            // this example serializes the entire object, actual 
implementation 
+            // may want to serialize specific properties into the hash.
+            byte[] bytes = SerializationUtils.serialize(o);
+            long[] hash = MurmurHash3.hash128(bytes);
+            collection.merge(new EnhancedDoubleHasher(hash[0], hash[1]));
+        }
+        // collection now contains all the items from the list of items
+        return collection;
+    }
+}
+```
+
+### Searching a Bloom filter
+When searching a single Bloom filter, it makes sense to use the `Hasher` to 
search in order to reduce the overhead of creating a Bloom filter for the match.
+
+```java
+import org.apache.commons.codec.digest.MurmurHash3;
+import org.apache.commons.collections4.bloomfilter.BloomFilter;
+import org.apache.commons.collections4.bloomfilter.EnhancedDoubleHasher;
+import org.apache.commons.lang3.SerializationUtils;
+
+public class SearchingExample {
+
+    private Bloomfilter collection;
+
+    /**
+     * Creates the example from a populated collection.
+     * @param collection the collection to search.
+     */
+    public SearchingExample(BloomFilter collection) {
+        this.collection = collection;
+    }
+
+    /**
+     * Perform the search.
+     * @param item the item to look for.
+     * @return true if the item is found, false otherwise.
+     */
+    public boolean search(Object item) {
+        // create the hasher to look for.=
+        byte[] bytes = SerializationUtils.serialize(o);
+        long[] hash = MurmurHash3.hash128(bytes);
+        return collection.contains(new EnhancedDoubleHasher(hash[0], hash[1]));
+    }
+}
+```
+
+However, if multiple filters are being checked, and they are all the same 
shape, then creating a Bloom filter from the hasher and using that is more 
efficient.
+
+```java
+import org.apache.commons.codec.digest.MurmurHash3;
+import org.apache.commons.collections4.bloomfilter.BloomFilter;
+import org.apache.commons.collections4.bloomfilter.EnhancedDoubleHasher;
+import org.apache.commons.collections4.bloomfilter.Shape;
+import org.apache.commons.lang3.SerializationUtils;
+
+public abstract class SearchingExample2 {
+    
+    private BloomFilter[] collections;
+
+    /**
+     * Create example from an array of populated Bloom filters that all have 
the same Shape.
+     * @param collections The Bloom filters to search.
+     */
+    public SearchingExample2(BloomFilter[] collections) {
+        this.collections = collections;
+    }
+
+    /**
+     * Search the filters for matching items.
+     * @param item the Item t o search for.
+     */
+    public void search(Object item) {
+        // create the Bloom filter to search for.
+        byte[] bytes = SerializationUtils.serialize(o);
+        long[] hash = MurmurHash3.hash128(bytes);
+        BloomFilter target = new SimpleBloomFilter(collectons[0].getShape());
+        target.merge(new EnhancedDoubleHasher(hash[0], hash[1]));
+
+        for (BloomFilter candidate : candidates) {
+            if (candidate.contains(target)) {
+                doSomethingInteresting(candidate, item);
+            }
+        }
+    }
+
+    /**
+     * The interesting thing to do when a match occurs.
+     * @param collection the Bloom filter that matched.
+     * @param item The item that was being searched for.
+     */
+    abstract public void doSomethingInteresting(BloomFilter collection, Object 
item);
+
+}
+```
+
+If multiple filters of different shapes are being checked then use the 
`Hasher` to perform the check.  It will be more efficient than building a Bloom 
filter for each `Shape`.
+
+## Statistics
+
+The statistics that I mentioned in the previous bloom post are implemented in 
the `SetOperations` class.  The methods in this class accept `BitMapExtractor` 
arguments. They can be called with `BloomFilter` implementations or with 
`BitMapExtractor` implementations generated from arrays of longs or 
`IndexExtractors` or any other valid `BitMapExtractor` instance.
+
+## Review
+
+In this post we investigated the Apache Commons Collections implementation of 
Bloom filters and how to use them.  We touched on how they can be used to 
implement new designs of the standard components.  In the next post we will 
cover some of the more unusual usages and introduce some unusual 
implementations.
diff --git a/src/site/markdown/bloomFilters/intro.md 
b/src/site/markdown/bloomFilters/intro.md
new file mode 100644
index 000000000..ea14fd880
--- /dev/null
+++ b/src/site/markdown/bloomFilters/intro.md
@@ -0,0 +1,197 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+# Bloom Filters Part 1: An Introduction
+
+Bloom filters are the magical elixir often used to reduce search space and 
time.  They have other interesting properties that make them applicable in many 
situations where knowledge of the approximate size of a set, union, or 
intersection is important, or where searching vast datasets for small matching 
patterns is desired, and even in cases where it is desirable to search for data 
without disclosing the data being searched for, or the actual data found, to 
3rd parties.
+
+In this series of blog posts we’ll do a deep dive into Bloom filters.  In 
this, the first post, we will touch on the mathematics behind the filters and 
work through an example of their use, and explore some of their properties.  In 
later posts we will explore the Apache Commons Collections® implementation that 
is due out in version 4.5 of that library, discuss using Bloom filters for data 
sharding, and explore some of the unusual Bloom filters, like counting Bloom 
filters, stable Bloom f [...]
+
+Bloom filters are probably used on websites and applications you use every 
day.  They are used to track articles you’ve read, speed up bitcoin clients, 
detect malicious web sites, and improve the performance of caches.   We will 
get back to these later.
+
+## What?
+But let us start at the beginning.  Bloom filters are a probabilistic data 
structure frequently used to represent sets of objects.  They were invented by 
Burton Bloom<span><a class="footnote-ref" href="#fn1">1</a></span> in 1970.  A 
Bloom filter is an array of bits (bit vector) into which a set of values has 
been hashed, so some of the bit values are on (value one), or "enabled", and 
others are off (value zero), or "disabled".
+
+Multiple Bloom filters can be merged together by creating a union of the two 
bit vectors (V1 or V2).  The resulting Bloom filter (B) is said to contain both 
items.
+
+The equation for combining can be expressed as: \\( B = V1 \cup V2 \\)
+
+When searching Bloom filters we generate a Bloom filter for the item we are 
looking for (the target, T), and then calculate the intersection of T with the 
bit vector of the Bloom filter that may contain the value (the candidate, C).  
If the result is equal to T, then a match has been made.  This calculation can 
yield false positives but never false negatives.
+
+The equation for matching can be expressed as: \\( T \cap C = T \\)
+
+There are several properties that define a Bloom filter: the number of bits in 
the vector (`m`), the number of items that will be merged into the filter 
(`n`), the number of hash functions for each item (`k`), and the probability of 
false positives (`p`).  All of these values are mathematically related. 
Mitzenmacher and Upfal<span><a class="footnote-ref" href="#fn2">2</a></span> 
have shown that the relationship between these properties is
+
+\\[ p = \left( 1 - e^{-kn/m} \rigth) ^k \\]
+
+However, it has been reported  that the false positive rate in real 
deployments is higher than the value given by this equation.<span><a 
class="footnote-ref" href="#fn3">3</a></span><span><a class="footnote-ref" 
href="#fn4">4</a></span> Theoretically it has been proven that the equation 
offered a lower bound of the false positive rate, and a more accurate false 
positive rate has been discovered.<span><a class="footnote-ref" 
href="#fn5">5</a></span>
+
+The net result is that we can describe a Bloom filter with a “shape” and that 
shape can be derived from combinations of the properties for example from `(p, 
m, k)`, `(n, p)`, `(k, m)`, `(n, m)`, or `(n, m, k)`.
+
+To compare Bloom filters, they must have the same shape and use the same 
hashing functions.
+
+## Why?
+Bloom filters are often used to reduce the search space. For example, consider 
an application looking for a file which may occur on one of many systems. 
Without a Bloom filter, each system must be queried for the existence of the 
file. Generally this is a lengthy process. However, if a Bloom filter is 
created for each system, then the query could first check the filter.  If the 
filter indicates the file might be on the system, then the expensive lookup 
check is performed.  Because the Bl [...]
+
+Examples of large Bloom filter collections can be found in 
bioinformatics<span><a class="footnote-ref" href="#fn6">6</a></span><span><a 
class="footnote-ref" href="#fn7">7</a></span><span><a class="footnote-ref" 
href="#fn8">8</a></span> where Bloom filters are used to represent gene 
sequences, and Bloom filter based databases where records are encoded into 
Bloom filters.<span><a class="footnote-ref" href="#fn9">9</a></span><span><a 
class="footnote-ref" href="#fn10">10</a></span>
+
+Medium, the digital publishing company, uses Bloom filters to track what 
articles have been read.<span><a class="footnote-ref" 
href="#fn11">11</a></span>  Bitcoin uses them to speed up clients.<span><a 
class="footnote-ref" href="#fn12">12</a></span>  They have been used to improve 
caching performance<span><a class="footnote-ref" href="#fn13">13</a></span> and 
in detecting malicious websites.<span><a class="footnote-ref" 
href="#fn14">14</a></span>
+
+## How?
+So, let’s work through an example.  Let's assume we want to put 3 items in a 
filter `(n = 3)` with a 1/5 probability of collision `(p = 1/5 = 0.2)`.  
Solving \\( p = \left( 1 - e^{-kn/m} \right) ^k \\) yields  `m=11` and `k=3`.  
Thomas Hurst has provided an online calculator<span><a class="footnote-ref" 
href="#fn15">15</a></span> where you can explore the interactions between the 
values.
+
+Now that we know the shape of our Bloom filters, let’s populate one.  In this 
example we will be using a CRC hash; this is not recommended and is only used 
here for ease of example.  Also, we will be using a naive combinatorial hashing 
technique that should not be used in real life.
+
+We start by taking the CRC hash for "CAT" which is `FD2615C4`.  The naive 
combinatorial hashing technique splits the calculated hash into 2 unsigned 
values.  In this case the CRC value can be interpreted as 2 unsigned ints
+
+| Value        | Use     |
+|--------------|----------|
+| FD26 = 64806 | increment |
+| 15C4 = 5572  | initial  |
+
+We need to generate 3 hash values (`k`) using the naive combinatorial hashing 
algorithm.  The first hash value is the initial value.  The second hash value 
is the initial value plus the increment.  The third hash value is the 2nd hash 
value plus the increment.  This proceeds until the proper number of hash values 
have been generated.  After we generate the `k` values, take the modulus 
(remainder after division) of those numbers by the number of bits in the vector 
(`m`).
+
+| Name | Calculation | Value | Value mod(11) |
+|------|-------------|-------|---------------|
+| k1  | 5572 | 5572 | 6 |
+ | k2 | 5572 + 64608 | 70180 | 0 |
+| k3 | 5572 + 64608 + 64608 | 134788 | 5 |
+
+
+This yields a Bloom filter of `00001100001` or  \\(\\{0,5,6\\}\\).  In the 
binary form we enumerate the bits from left to right as they would be displayed 
in a big-endian unsigned integer where the bit n is represented by \\(2^n\\). 
Performing the same operations on "DOG" and "GUINEA PIG" yields:
+
+| Name | CRC | Bit Set              | Bloom filter |
+|------|-----|----------------------|--------------|
+| CAT | FD26 15C4 | \\(\\{0,5,6\\}\\)        | 00001100001 |
+| DOG | 3560 D2EF | \\(\\{2\\}\\)            | 00000000100 |
+| GUINEA PIG | E58C A739 | \\(\\{2,7,10\\}\\)       | 10010000100 |
+| Collection | | \\(\\{0,2,5,6,7,10\\}\\) | 10011100101 |
+
+The collection is the union of the other three values.  This represents the 
set of animals.
+
+The interesting one in this collection is DOG.  When we execute the naive 
combinatorial hashing algorithm on the DOG hash, every value is 2.  We’ll come 
back to this in a moment.
+
+If we perform the same calculations on “HORSE”, we get \\(\\{2,5,9\\}\\).  Now 
to see if HORSE is in our collection we solve \\(\\{2,5,9\\} \cap 
\\{0,2,5,6,7,10\\} = \\{2,5,9\\} \longrightarrow \\{2,5\\} \ne \\{2,5,9\\}\\) . 
 So HORSE is not in the collection.
+
+If we only put CAT and GUINEA PIG into the collection, we get the same result 
for the collection.  But when testing for DOG we get the true statement 
\\(\\{2\\} \cap \\{0,2,5,6,7,10\\} = \\{2\\}\\). The filter says that DOG is in 
the collection.  This is an example of a false positive result.
+
+Dog also shows the weakness of the naive combinatorial hashing technique.   A 
proper implementation can be found in the EnhancedDoubleHashing class in the 
Apache Commons Collections version 4.5 library.  In this case, a tetrahedral 
number is added to the increment to reduce the probability of a single bit 
being selected over the course of the hash.
+
+## Statistics?
+
+Bloom filters lend themselves to several statistics.  The most common is the 
Hamming value or cardinality.  This is simply the number of bits that are 
enabled in the vector.  From this value a number of statistics can be 
calculated.
+
+The hamming distance is the number of bits that have to be flipped to convert 
one Bloom filter to another.  For example to convert Cat \\(\\{0,5,6\\}\\) to 
horse \\(\\{2,5,9\\}\\), we have to turn off bits \\(\\{0,6\\}\\) and turn on 
bits \\(\\{2,9\\}\\) so the hamming distance is 4.  Bloom filters with lower 
hamming distances are in some sense similar.
+
+Another measure of similarity is the Cosine similarity also known as Orchini 
similarity, Tucker coefficient of congruence or Ochiai similarity.  To 
calculate it the cardinality of the intersection (bitwise ‘and’) of the two 
filters is then divided by the square root of the cardinality of the first 
filter times the cardinality of the second filter.  The result is a number in 
the range \\([0,1]\\).
+
+The cosine distance is calculated as `1 - cosine similarity`.
+
+
+The final measure of similarity that we will cover is the Jaccard similarity 
also known as the Jaccard Index, Intersection over Union, and Jaccard 
similarity coefficient.  To calculate the Jaccard index the cardinality of the 
intersection (bitwise ‘and’)  of the two Bloom filters is calculated.  This 
value is divided by the cardinality of the union (bitwise ‘or’) of the two 
Bloom filters.
+
+The Jaccard distance, like the cosine distance, is calculated as `1 - Jaccard 
similarity`.
+
+The similarity and distance statistics can be used to group similar Bloom 
filters together; for example when distributing files across a system that uses 
Bloom filters to determine where the file might be located.  In this case it 
might make sense to store Bloom filters in the collection that has minimal 
distance.
+
+In addition to basic similarity and difference, if the shape of the filter is 
known some information about the data behind the filters can be estimated.  For 
example the number of items merged into a filter (n) can be estimated provided 
we have the cardinality (`c`), number of bits in the vector (`m`) and the 
number of hash functions (`k`) used when adding each element.
+
+\\[ n = \frac{-m ln(1 - c/m)}{k} \\]
+
+Estimating the size of the union of two filters is simply calculating n for 
the union (bitwise ‘or’) of the two filters.
+
+Estimating the size of the intersection of two filters is the estimated n of 
the first + the estimated n of the second - the estimated union of the two.  
There are some tricky edge conditions, such as when one or both of the 
estimates of n is infinite.
+
+## Usage Errors
+There are several places that errors can creep into Bloom filter usage.
+
+### Saturation errors
+
+Saturation errors arise from under estimating the number of items that will be 
placed into the filter.  Let’s define  “saturation” as the number of items 
merged into a filter divided by the number of items specified in the Shape.  
Then once `n` items have been inserted the filter is at a saturation of 1.  As 
the saturation increases the false positive rate increases.   Using the 
calculation for the false positive rate noted above, we can calculate the 
expected false positive rate for the [...]
+
+For Bloom filters defined as k=17 and n=3 the calculation yields m=72, and 
p=0.00001.  As the saturation increases the rates of false positives increase 
as per the following table:
+
+| Saturation | Probability of false positive |
+|------------|-------------------------------|
+| 1 | 0.000010 |
+| 2 | 0.008898 |
+| 3 | 0.115070 |
+| 4 | 0.356832 |
+| 5 | 0.606726 |
+
+
+The table shows that the probability of false positives is two orders of 
magnitude larger when the saturation reaches two.  After five times the 
estimated number of items have been added the false positive rate is so high as 
to make the filter useless.
+
+### Hashing errors
+
+A second focus of errors is the generation of the hashes.
+
+If a combinatorial hashing algorithm is used and the number of bits is 
significantly higher than the midpoint of the hash values then the generated 
values will be weighted to the lower bits.  For example if byte values were 
used to set the increment and the initial values but the number of bits was in 
excess of 255, then the higher valued bits could not be selected on the first 
hash, and in all cases values far above 255 would be rarely selected.
+
+## Review
+So far we have covered what Bloom filters are, why we use them, how to 
construct and use them, explored a few statistics, and looked at potential 
problems arising from usage errors.  In the next post we will see how the 
Apache Commons Collections code implements Bloom filters and look at how to 
implement the common usage patterns using the library.
+
+## Footnotes
+
+<span>
+<ol class="footnotes>">
+<li><a id='fn1'></a>
+Burton H. Bloom. 1970.   “Space/Time Trade-offs in Hash Coding with Allowable 
Errors". Commun. ACM, 13, 7 (July 1970), 422–426
+</li>
+<li><a id='fn2'></a>
+Michael Mitzenmacher and Eli Upfal. 2005. Probability and computing: 
Randomized algorithms and probabilistic analysis. Cambridge University Press, 
Cambridge, Cambridgeshire, UK. 109–111, 308 pages.
+</li>
+<li><a id='fn3'></a>
+L. L. Gremillion, “Designing a Bloom filter for differential file access,” 
Communications of the ACM, vol. 25, no. 7, pp. 600-604, 1982. 
+</li>
+<li><a id='fn4'></a>
+J. K. Mullin, “A second look at Bloom filters,” Communications of the ACM, 
vol. 26, no. 8, 1983
+</li>
+<li><a id='fn5'></a>
+P. Bose, H. Guo, E. Kranakis, “On the false-positive rate of Bloom filters,” 
Information Processing Letters, vol. 108, no. 4, pp. 210-213, 2008.
+</li>
+<li><a id='fn6'></a>
+Henrik Stranneheim, Max Käller, Tobias Allander, Björn Andersson, Lars 
Arvestad, and Joakim Lundeberg. 2015. Classification of DNA sequences using 
Bloom filters. Bioinfomatics 26, 13 (July 2015), 1595–1600. <a 
href="https://doi.org/10.1093/bioinformatics/btq230";>https://doi.org/10.1093/bioinformatics/btq230</a>
+</li>
+<li><a id='fn7'></a>
+Justin Chu, Sara Sadeghi, Anthony Raymond, Shaun D. Jackman, Ka Ming Nip, 
Richard Mar, Hamid Mohamadi, Yaron S. Butterfield, A. Gordon Robertson, and 
Inanç Birol. 2014. BioBloom tools: fast, accurate and memory-efficient host 
species sequence screening using bloom filters.  Bioinfomatics 30, 23 (Dec. 
2014), 302–304. <a 
href="https://doi.org/10.1093/bioinformatics/btu558";>https://doi.org/10.1093/bioinformatics/btu558</a>
+</li>
+<li><a id='fn8'></a>
+Brad Solomon and Carl Kingsford. 2016. Fast Search of Thousands of Short-Read 
Sequencing Experiments. Nature Biotechnology 34, 3 (March 2016), 300–302. <a 
href="https://doi.org/10.1038/nbt.3442";>https://doi.org/10.1038/nbt.3442</a>
+</li>
+<li><a id='fn9'></a>
+Steven M Bellovin and William R Cheswick. 2004. Privacy-Enchanced Searches 
Using Encrypted Bloom Filters". <a 
href="https://mice.cs.columbia.edu/getTechreport.php?techreportID=483";>https://mice.cs.columbia.edu/getTechreport.php?techreportID=483</a>
+</li>
+<li><a id='fn10'></a>
+Arisa Tajima, Hiroki Sato, and Hayato Yamana. 2018. Privacy-Preserving Join 
Processing over outsourced private datasets with Fully Homomorphic Encryption 
and Bloom Filters. <a 
href="https://db-event.jpn.org/deim2018/data/papers/201.pdf";>https://db-event.jpn.org/deim2018/data/papers/201.pdf</a>
+</li>
+<li><a id='fn11'></a>
+Jamie Talbot, Jul 15, 2015, What are Bloom filters? : A tale of code, dinner, 
and a favour with unexpected consequences., The Medium Blog, <a 
href="https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff";>https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff</a>
+</li>
+<li><a id='fn12'></a>
+BitcoinDeveloper, Documentation, 
https://developer.bitcoin.org/search.html?q=bloom+filter
+</li>
+<li><a id='fn13'></a>
+<a 
href="https://en.wikipedia.org/wiki/Bloom_filter#Cache_filtering";>https://en.wikipedia.org/wiki/Bloom_filter#Cache_filtering</a>
+</li>
+<li><a id='fn4'></a>
+K. Nandhini and R. Balasubramaniam, "Malicious Website Detection Using 
Probabilistic Data Structure Bloom Filter," 2019 3rd International Conference 
on Computing Methodologies and Communication (ICCMC), Erode, India, 2019, pp. 
311-316, doi: 10.1109/ICCMC.2019.8819818.
+</li>
+<li><a id='fn15'></a>
+Thomas Hurst, Bloom Filter Calculator. <a 
href="https://hur.st/bloomfilter/";>https://hur.st/bloomfilter/</a>
+</li>
+</ol>
+</span> 
diff --git a/src/site/markdown/bloomFilters/multidimensional.md 
b/src/site/markdown/bloomFilters/multidimensional.md
new file mode 100644
index 000000000..10f1fdbcc
--- /dev/null
+++ b/src/site/markdown/bloomFilters/multidimensional.md
@@ -0,0 +1,110 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+# Bloom Filters Part 4: Bloom filters for indexing
+
+In many cases Bloom filters are used as gatekeepers; that is, they are queried 
before attempting a longer operation to see if the longer operation should be 
executed.  However, there is another type of Bloom filter: the reference type.  
The reference type contains the hashed values for the properties of a single 
object.  For example a Person object might have the fields  name, date of 
birth, address, and phone number.  Each of those could be hashed and combined 
into a single Bloom filter [...]
+
+We can use reference Bloom filters to index data by storing the Bloom filter 
along with a record identifier that can be used to retrieve the data.  The 
simplest solution is create a list of reference Bloom filters and their 
associated record identifiers and then perform the linear search for matches.  
For every Bloom filter that matches the search, return the associated record 
identifier.
+
+Searching can be performed by creating a target Bloom filter with partial 
data, for example name and date of birth from the person example, and then 
searching through the list as described above.  The associated records either 
have the name and birthdate or are false positives and need to be filtered out 
during retrieval.
+
+## Multidimensional Bloom filters
+
+The description above is a multidimensional Bloom filter.  A multidimensional 
Bloom filter is simply a collection of searchable filters, the simplest 
implementation being a list.  In fact, for fewer than 10K filters the list is 
the fastest possible solution.  There are two basic reasons for this:
+  * Bloom filter comparisons are extremely fast taking on approximately five 
(5) machine instructions for the simple comparison.
+ * Bloom filters do not have an obvious natural order that can be used to 
reduce the search space without incurring significant overhead.  The amount of 
overhead often overwhelms  the advantage of the index.
+
+There are, however, several multidimensional Bloom filter algorithms among 
them are: Bloofi, Flat Bloofi, BF-Trie, Hamming Skip List, Sharded List, and 
Natural Bloofi.
+
+### Bloofi
+Bloofi is a technique that uses a B+-tree structure where the inner nodes are 
merges of the Bloom filters below and the leaf nodes contain the actual Bloom 
filters.<span><a class="footnote-ref" href="#fn1">1</a></span>  This technique 
works well for Bloom filters that are not densely populated (i.e. low 
saturation) and are designed with a very small false positive rate.
+
+Bloofi is extremely fast when searching; however, inserting often requires 
updates to multiple inner nodes.  Bloofi supports deletion, but deletion can 
also generate updates to multiple inner nodes.
+
+### Flat Bloofi
+Flat-Bloofi, expands each bit into a bit vector representing which Bloom 
filters in the index contain the specific bit.  Conceptually this is a bit 
matrix, with the columns being the bits in the Bloom filter and the rows the 
bitMaps for the indexed Bloom filters.  During insert, the Bloom filter being 
inserted is given an index number (row).  In each bit vector associated with 
each enabled bit in the Bloom filter (column) the index number bit is enabled.  
During a search, all of the bit  [...]
+
+This solution is consistently among the fastest solutions of all solutions 
presented here.  It is often the fastest or second fastest.  It supports 
deletion through the addition of a list of deleted rows and reuse of space in 
the vectors.  
+
+### BF-Trie
+
+BF-Trie creates a trie based on the byte values in the filters.  It has the 
same insert, delete, and update characteristics you would expect from a trie 
structure.
+
+During searching there is an expansion factor that has to be taken into 
account.  Every zero in the target filter yields two possible solutions.  For 
example the byte 0xFA has 4 potential matches (see table below) that have to be 
included in the search.  This is performed by exploring multiple paths through 
the trie while finding the solution.
+
+| code | matching pattern |
+| ---- | -----------------|
+| 0xFA | 1111 1010 |
+| 0XFB | 1111 1011 |
+| 0xFE | 1111 1110 |
+| 0xFF | 1111 1111|
+
+
+### Hamming Skip List
+
+Conceptually the Hamming skip list is an implementation of a two segment key.  
This index arises from the observation that no target can match a filter with a 
lower hamming value.  In addition, if the Bloom filter bit vector is 
interpreted as a very large unsigned integer no target can match a filter of a 
lower value.  Since we have a binary representation of the very large unsigned 
integer we can calculate \\( log_2 \\) of the value.  Now we construct a skip 
list based on the hamming va [...]
+
+This solution suffers from the clustering of hamming values.  Most hamming 
values will cluster around the number of values that were merged times the 
number of hash functions \\( (n \times k) \\) adjusted for the expected 
collision rate.  So the hamming value only provides a strong selector when the 
hamming value of the target is close to the saturation of the indexed filter.  
The \\( log_2 \\) index is fairly evenly distributed in the upper range and 
really only provides a strong select [...]
+
+The Hamming Skip List is a good simple implementation for architectures that 
use relational databases or other environments where multi-segmented numerical 
indexes are present.
+
+### Sharded List
+
+A sharded list is a collection of lists of Bloom filters and builds upon the 
sharding solution presented in part 3 of this series.  In this instance, as a 
Bloom filter is added to the index the filter is hashed and a Bloom filter 
created from that hash.  The Bloom filter’s Bloom filter is then used to 
determine which list to add the filter to.  When a collection reaches capacity 
(as defined by the Shape of the Bloom filter’s Bloom filter), it is removed 
from consideration for further ins [...]
+
+This solution utilizes the rapidity of the standard list solution, while 
providing a mechanism to handle more than 10K filters at a time.
+
+### Natural Bloofi
+
+Natural Bloofi uses a Tree structure like Bloofi does except that each node in 
the tree is a filter that was inserted into the index.<span><a 
class="footnote-ref" href="#fn2">2</a></span>  Natural Bloofi operates like the 
sharded list except that if the Bloom filter for a node is contained by a node 
in the list then it is made a child of that node.  If the Bloom filter node 
contains a node in the list, then it becomes the parent of that node.  This 
yields a flat Bloofi tree where the mor [...]
+
+## Encrypted indexing
+
+The idea of using Bloom filters for indexing encrypted data is not a new 
idea.<span><a class="footnote-ref" href="#fn3">3</a></span><span><a 
class="footnote-ref" href="#fn4">4</a></span><span><a class="footnote-ref" 
href="#fn5">5</a></span><span><a class="footnote-ref" href="#fn6">6</a></span>  
  The salient points are that Bloom filters are a very effective one way hash 
with matching capabilities.  The simplest solution is to create a reference 
Bloom filter comprising the plain text of  [...]
+
+When searching such an index, the desired plain text values are hashed into a 
Bloom filter and sent to the storage engine.  The engine finds all the matching 
Bloom filters and returns the encrypted blobs associated with them.  The client 
then decrypts the blobs and removes any false positives.
+
+The technique ensures that the plain text data never leaves the client’s 
system and guarantees that the service has no access to the plain text of the 
stored data.
+
+## Review
+
+In this section we discussed multidimensional Bloom filters and presented 
several implementations.  We also explored the idea of an encrypted database 
where the data in transit and at rest is encrypted or at least strongly hashed.
+
+I hope that over the course of this series you have developed a deeper 
understanding of Bloom filters, their construction and how they can be applied 
to various technical problems.
+
+<span>
+<ol class="footnotes>">
+<li><a id='fn1'></a>
+Adina Crainiceanu. 2013. Bloofi: a hierarchical Bloom filter index with 
applications to distributed data provenance. In Proceedings of the 2nd 
International Workshop on Cloud Intelligence. ACM, New York, NY, USA. 
https://doi.org/10.1145/2501928.2501931
+</li>
+<li><a id='fn2'></a>
+Adina Crainiceanu and Daniel Lemire. 2015. Bloofi: Multidimensional Bloom 
filters. Information Systems 54, C (Dec. 2015), 311–324. 
https://doi.org/10.1016/j.is.2015.01.002
+</li>
+<li><a id='fn3'></a>
+Yan-Cheng Chang and Michael Mitzenmacher. Privacy Preserving Keyword Searches 
on Remote Encrypted Data. Accessed on 18-Dec-2019. 2004. url: 
https://eprint.iacr.org/2004/051.pdf.
+</li>
+<li><a id='fn4'></a>
+Steven M Bellovin and William R Cheswick”. Privacy-Enchanced Searched Using 
Encrypted Bloom Filters”. Accessed on 18-Dec-2019. 2004. url: 
https://mice.cs.columbia.edu/getTechreport.php?techreportID=483.
+</li>
+<li><a id='fn5'></a>
+Eu-Jin Goh. Secure Indexes. Accessed on 18-Dec-2019. 2004. url: 
https://crypto.stanford.edu/~eujin/papers/secureindex/ secureindex.pdf.
+</li>
+<li><a id='fn6'></a>
+Arisa Tajima, Hiroki Sato, and Hayato Yamana. Privacy-Preserving Join 
Processing over outsourced private datasets with Fully Homomorphic Encryption 
and Bloom Filters. Accessed on 18-Dec-2019. 2018. url: 
https://db-event.jpn.org/deim2018/data/papers/201.pdf
+</li>
+</ol>
+</span>
diff --git a/src/site/site.xml b/src/site/site.xml
index 48df0c5d3..09fabc2b7 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -23,6 +23,26 @@
     </bannerRight>
 
     <body>
+        <head>&lt;script type="text/javascript" id="MathJax-script" 
async="async" 
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS-MML_HTMLorMML"&gt;&lt;/script&gt;
+        &lt;style&gt;
+            .footnotes ol li p {
+            display: inline;
+            }
+            .footnote-ref {
+             font-size: .83em ;
+             vertical-align: super ;
+             font-weight:bold;
+            }
+            .footnote-ref:after {
+            content: "\00a0";
+            }
+            .bodyTable {
+            width : max-content;
+            border: 1px solid black;
+            }
+
+            &lt;/style&gt;
+        </head>
         <menu name="Commons Collections">
             <item name="Overview" href="/index.html" />
             <item name="Download" href="/download_collections.cgi" />
diff --git a/src/site/xdoc/bloomFilters.xml b/src/site/xdoc/bloomFilters.xml
deleted file mode 100644
index 3f551d773..000000000
--- a/src/site/xdoc/bloomFilters.xml
+++ /dev/null
@@ -1,1137 +0,0 @@
-<?xml version="1.0"?>
-<!-- Licensed to the Apache Software Foundation (ASF) under one or more 
contributor
-       license agreements. See the NOTICE file distributed with this work for 
additional
-       information regarding copyright ownership. The ASF licenses this file to
-       You under the Apache License, Version 2.0 (the "License"); you may not 
use
-       this file except in compliance with the License. You may obtain a copy 
of
-       the License at http://www.apache.org/licenses/LICENSE-2.0 Unless 
required
-       by applicable law or agreed to in writing, software distributed under 
the
-       License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR 
CONDITIONS
-       OF ANY KIND, either express or implied. See the License for the specific
-       language governing permissions and limitations under the License. -->
-
-<document>
-
-       <properties>
-               <title>Bloom filters</title>
-               <author email="d...@commons.apache.org">Commons Documentation 
Team</author>
-       </properties>
-
-       <head>
-               <link rel="stylesheet" href="./css/LaTeXML.css" type="text/css" 
/>
-               <link rel="stylesheet" href="./css/references.css"
-                       type="text/css" />
-               <script type="text/javascript"
-                       
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.6/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
 />
-       </head>
-
-       <body>
-               <section name='Contents'>
-                       <ol>
-                               <li>
-                                       <a href="#S1" title="1 Introducing 
Bloom filters">
-                                               Introducing Bloom filters
-                                       </a>
-                                       <ol>
-                                               <li>
-                                                       <a href="#S1.SS1" 
title="1.1 Understanding Bloom filters">
-                                                               Understanding 
Bloom filters
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S1.SS2" 
title="1.2 Defining a Bloom filter">
-                                                               Defining a 
Bloom filter
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S1.SS3" 
title="1.3 Constructing a Bloom filter">
-                                                               Constructing a 
Bloom filter
-                                                       </a>
-                                               </li>
-                                       </ol>
-                               </li>
-                               <li>
-                                       <a href="#S2" title="2. Exploring the 
Classes">
-                                               Exploring the Classes
-                                       </a>
-                                       <ol>
-                                               <li>
-                                                       <a href="#S2.SS1"
-                                                               title="2.1 
Exploring the HashFunctionIdentity and HashFunction">
-                                                               Exploring the 
HashFunctionIdentity and HashFunction
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S2.SS2" 
title="2.2 Exploring the Shape">
-                                                               Exploring the 
Shape
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S2.SS3" 
title="2.3 Exploring the Hasher">
-                                                               Exploring the 
Hasher
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S2.SS4" 
title="2.4 Exploring the BloomFilter">
-                                                               Exploring the 
BloomFilter
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S2.SS5"
-                                                               title="2.5 Why 
Hasher, Shape and BloomFilter?">
-                                                               Why Hasher, 
Shape and BloomFilter?
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S2.SS6" 
title="2.6 Build your own BloomFilter">
-                                                               Build your own 
BloomFilter
-                                                       </a>
-                                               </li>
-                                       </ol>
-                               </li>
-                               <li>
-                                       <a href="#S3" title="3 Examples">
-                                               Examples
-                                       </a>
-                                       <ol>
-                                               <li>
-                                                       <a href="#S3.SS1"
-                                                               title="3.1 How 
to use a Bloom filter as a gatekeeper">
-                                                               Example: How to 
use a Bloom filter as a gatekeeper
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S3.SS2" 
title="3.2 Data sharding">
-                                                               Example: Data 
sharding
-                                                       </a>
-                                               </li>
-                                               <li>
-                                                       <a href="#S3.SS3" 
title="3.3 Future directions">
-                                                               Future 
directions
-                                                       </a>
-                                               </li>
-                                       </ol>
-                               </li>
-                               <li>
-                                       <a href="#S4" title="4 References">
-                                               References
-                                       </a>
-                               </li>
-                       </ol>
-               </section>
-
-
-               <section id="S1" name="1. Introducing Bloom filters">
-
-                       <subsection id="S1.SS1" name="1.1 Understanding Bloom 
filters">
-                               <p>
-                                       Bloom filters were introduced to Apache
-                                       commons-collections in
-                                       version 4.5. A Bloom filter:
-                               </p>
-                               <ul>
-                                       <li>
-                                               Is a probabilistic data 
structure defined by Burton Bloom in
-                                               1970&#160;<cite>[<a 
href='#Bloom'>Bloom</a>]</cite>;
-                                       </li>
-                                       <li>
-                                               Can be considered as a bit 
vector representing a set of
-                                               objects;
-                                       </li>
-                                       <li>
-                                               Is constructed by hashing data 
multiple times;
-                                       </li>
-                                       <li>
-                                               Identifies where things are
-                                               <b>not</b>
-                                               ;
-                                       </li>
-                                       <li>
-                                               Will yield false positives but
-                                               <b>never</b>
-                                               false negatives.
-                                       </li>
-                                       <li>Is typically used where hash table 
solutions are
-                                               too
-                                               memory
-                                               intensive and false positives 
can be addressed;
-                                               for example
-                                               a gating
-                                               function to a longer operation 
(e.g.
-                                               determine if a
-                                               database lookup
-                                               should be made);
-                                       </li>
-                               </ul>
-                       </subsection>
-
-                       <subsection id="S1.SS2" name="1.2 Defining a Bloom 
filter">
-
-                               <p>Bloom filters are constrained by: the number 
of
-                                       elements in the
-                                       set, the number of hash functions, the 
number
-                                       of bits in the
-                                       vector,
-                                       and the probability of false positives.
-                               </p>
-                               <ul>
-                                       <li>
-                                               \(p\)
-                                               - is the probability of false 
positives,
-                                       </li>
-                                       <li>
-                                               \(n\)
-                                               - is the number of elements in 
the set represented by the
-                                               Bloom
-                                               filter,
-                                       </li>
-                                       <li>
-                                               \(m\)
-                                               - is the number of bits, and
-                                       </li>
-                                       <li>
-                                               \(k\)
-                                               - is the number of hash 
functions.
-                                       </li>
-                               </ul>
-                               <p>
-                                       Mitzenmacher and Upfal&#160;<cite>[<a 
href='#Mitzenmacher'>Mitzenmacher</a>]</cite>
-                                       have shown that the relationship 
between these properties is:
-                                       \(p =
-                                       (1- e^{-kn/m})^k\)
-
-                               </p>
-
-                               <p>
-                                       Thomas Hurst provides a good 
calculator&#160;<cite>[<a href='#Hurst'>Hurst</a>]</cite>
-                                       where the interplay between these 
values can be explored.
-                               </p>
-
-                       </subsection>
-                       <subsection id="S1.SS3" name="1.3 Constructing a Bloom 
filter">
-
-                               <p>
-                                       A simple Bloom filter is constructed by 
hashing an object
-                                       \(k\)
-                                       times, and using each hash value to 
turn on a single bit in a
-                                       bit
-                                       vector comprising
-                                       \(m\)
-                                       bits.
-                               </p>
-
-                               <div id="algorithm1" class="ltx_float">
-                                       <div class="ltx_listing 
ltx_lst_numbers_left ltx_listing">
-                                               <div class="ltx_listingline">
-                                                       <span class="ltx_text 
ltx_font_bold">Result:</span>
-                                                       A populated bit vector
-                                               </div>
-                                               <div class="ltx_listingline"> 
byte[][] buffers // the list of buffers to hash
-                                               </div>
-                                               <div class="ltx_listingline"> 
bit[m] bitBuffer;
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                       <span class="ltx_text 
ltx_font_bold">for</span>
-                                                        
-                                                       <em class="ltx_emph 
ltx_font_italic">buffer in buffers</em>
-                                                        
-                                                       <span class="ltx_text 
ltx_font_bold">do</span>
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                          
-                                                       <span class="ltx_text 
ltx_font_bold">for</span>
-                                                        
-                                                       <em class="ltx_emph 
ltx_font_italic">i=1 to k</em>
-                                                        
-                                                       <span class="ltx_text 
ltx_font_bold">do</span>
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                            
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                           long h = hash( 
buffer, i );
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                            
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                           int bitIdx = h mod 
m;
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                            
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                           bitBuffer[bitIdx] = 
1;
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                            
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                          
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                           end for
-                                               </div>
-                                               <div class="ltx_listingline">
-                                                         
-                                                       <span class="ltx_rule"
-                                                               
style="width:1px;height:100%;background:black;display:inline-block;"> </span>
-                                                          
-                                               </div>
-                                               <div class="ltx_listingline"> 
end for
-                                               </div>
-                                               <div class="ltx_listingline" />
-
-                                       </div>
-                                       <div>
-                                               <span class="ltx_tag 
ltx_tag_float">
-                                                       <span class="ltx_text 
ltx_font_bold">Algorithm 1</span>
-                                               </span>
-                                               How to construct a Bloom filter
-                                       </div>
-                               </div>
-
-
-                               <p>Building a bloom filter with the Apache
-                                       Commons-Collections
-                                       implementation looks like:
-                               </p>
-                               <source><![CDATA[
-// select a hash function
-HashFunction hFunc = new Murmur128x86Cyclic();
-// define the shape - 7 items. 1 in 1000 probability of collisions
-Shape shape = new Shape ( hFunc , 7, 1.0/1000);
-// create a hasher.
-DynamicHasher hasher = new DynamicHasher.Builder( hFunc )
-    .with( "banana" ).with( "apple" ).with( "pear" ).build();
-// create the bloom filter
-BloomFilter filter = new BitSetBloomFilter ( hasher , shape ); ]]></source>
-
-
-                               <p>This looks more complex than the simple 
Hadoop Bloom
-                                       filters.
-                                       Creating the same filter using the 
Hadoop library look
-                                       like:
-                               </p>
-                               <source><![CDATA[
-//define the filter
-BloomFilter hadoopBloomFilter = new BloomFilter (1 _000 , 7, Hash.MURMUR_HASH 
);
-// add the fruits
-hadoopBloomFilter.add( new Key( "banana".getBytes() ) );
-hadoopBloomFilter.add( new Key( "apple".getBytes() ) );
-hadoopBloomFilter.add( new Key( "pear".getBytes () ) ); ]]></source>
-
-                               <p>
-                                       However, the Commons Collection Bloom 
filter library is
-                                       designed to
-                                       meet
-                                       <b>most</b>
-                                       Bloom filter use cases. Covering the 
wide variety of use cases
-                                       means that the library provides more 
options and therefore
-                                       appears
-                                       more complex. Any application that 
wants a simple,
-                                       consistent,
-                                       cross
-                                       platform Bloom filter with a tested
-                                       implementation can build
-                                       it very
-                                       quickly from the Commons
-                                       Collections Bloom filter library.
-                               </p>
-
-                       </subsection>
-               </section>
-               <section id="S2" name="2. Exploring the Classes">
-
-                       <p>The Commons Collections Bloom filter library has 
five (5) major
-                               components.
-                       </p>
-                       <ol>
-                               <li>
-                                       <code>HashFunctionIdentity</code>
-                                       - An interface that defines a hash 
function.
-                               </li>
-                               <li>
-                                       <code>HashFunction</code>
-                                       -
-                                       An interface that extends
-                                       <code>HashFunctionIdentity</code>
-                                       to include a the hash function 
implementation. The hash
-                                       function
-                                       converts a
-                                       <code>byte[]</code>
-                                       into a
-                                       <code>long</code>
-                                       value using a seed.
-                               </li>
-                               <li>
-                                       <code>Hasher</code>
-                                       - A class that, when provided with a
-                                       <code>Shape</code>
-                                       , converts one
-                                       or more
-                                       <code>byte[]</code>
-                                       into an
-                                       <code>iterator&lt;int&gt;</code>
-                                       representing
-                                       the bits that are enabled in the Bloom 
filter.
-                               </li>
-                               <li>
-                                       <code>Shape</code>
-                                       - A class that describes the
-                                       shape of the Bloom filter. The shape is
-                                       defined by:
-                                       <ul>
-                                               <li>
-                                                       
<code>HashFunctionIdentity</code>
-                                               </li>
-                                               <li>
-                                                       The probability of 
false positives (
-                                                       \(p\)
-                                                       )
-                                               </li>
-                                               <li>
-                                                       The number of elements 
in the set represented by the Bloom filter
-                                                       (
-                                                       \(n\)
-                                                       )
-                                               </li>
-                                               <li>
-                                                       The number of bits (
-                                                       \(m\)
-                                                       )
-                                               </li>
-                                               <li>
-                                                       The number of hash 
functions (
-                                                       \(k\)
-                                                       )
-                                               </li>
-                                       </ul>
-                               </li>
-                               <li>
-                                       <code>BloomFilter</code>
-                                       - An interface that defines the 
standard functions that a Bloom
-                                       filter must
-                                       provide. Implementations generally take 
a
-                                       <code>Hasher</code>
-                                       and a
-                                       <code>Shape</code>
-                                       and converts the
-                                       <code>iterator&lt;int&gt;</code>
-                                       from the
-                                       <code>Hasher</code>
-                                       and preserves them in an internal 
representation.
-                               </li>
-                       </ol>
-                       <subsection id="S2.SS1"
-                               name="2.1 Exploring the HashFunctionIdentity
-                               and HashFunction">
-
-                               <p>
-                                       The
-                                       <code>HashFunctionIdentity</code>
-                                       is an interface that defines three (3) 
basic attributes of a
-                                       <code>HashFunction</code>
-                                       :
-                               </p>
-
-                               <ul>
-                                       <li>
-                                               The common hash function
-                                               name. This is a value like
-                                               <b>MD5</b>
-                                               or
-                                               <b>Murmur3-128-x86</b>
-                                       </li>
-                                       <li>
-                                               The signedness of the 
calculation. When a
-                                               <code>byte[]</code>
-                                               is interpreted as a numerical 
value is it assumed to be signed or
-                                               unsigned?
-                                       </li>
-                                       <li>
-                                               The process type. In a 
traditional Bloom filter
-                                               calculation the
-                                               hash function is called 
multiple times with different seeds.
-                                               This
-                                               technique is called iterative 
hashing. However Kirsch and
-                                               Mitzenmacher&#160;<cite>[<a 
href='#kirsch'>kirsch</a>]</cite>
-                                               have shown that it is possible 
to more
-                                               quickly generate the
-                                               necessary values by utilizing 
two (2) values from
-                                               the
-                                               hashing. This
-                                               is called cyclic hashing.
-                                       </li>
-                               </ul>
-
-                               <p>
-                                       The
-                                       <code>HashFunctionIdentity</code>
-                                       is used in situations
-                                       where the attributes of the
-                                       <code>HashFunction</code>
-                                       are required but
-                                       the implementation is not. For example, 
a Bloom
-                                       filter need only know
-                                       what
-                                       the attributes of the
-                                       <code>HashFunction</code>
-                                       that generated the bits
-                                       was, it does not require the 
implementation.
-                               </p>
-
-                               <p>
-                                       The
-                                       <code>HashFunctionIdentity</code>
-                                       also has a signature created
-                                       by hashing the string produced by:
-                                       <code>String.format( "%s-%s-%s", 
getName(),
-                                               getSignedness(),getProcess() 
).getBytes( "UTF-8" )
-                                       </code>
-                                       with a seed of
-                                       0 (zero). The signature is used in 
quick comparison
-                                       checks to ensure
-                                       function
-                                       compatibility.
-                               </p>
-
-                               <p>
-                                       The
-                                       <code>HashFunction</code>
-                                       is an interface that extends
-                                       <code>HashFunctionIdentity</code>
-                                       with an method to access the hash 
function implementation.
-                                       <code>HashFunction</code>
-                                       implementations are often used where the
-                                       <code>HashFunctionIdentity</code>
-                                       is required as they are the only 
provided implementations of
-                                       <code>HashFunctionIdentity</code>
-                                       .
-                               </p>
-
-                               <p>
-                                       <code>HashFunction</code>
-                                       implementations include:
-                               </p>
-                               <ul>
-                                       <li>MD5Cyclic
-                                               - MD5 hash used in a cyclic 
manner.
-                                       </li>
-                                       <li>Murmur128x86Cyclic
-                                               - Murmur3 128-bit x86 hash 
implementation
-                                               used in a cyclic manner.
-                                       </li>
-                                       <li>Murmur32x86Iterative - Murmur3 
32-bit x86 hash implementation
-                                               used in an iterative manner.
-                                       </li>
-                                       <li>ObjectsHashIterative - Java
-                                               Objects hash used in an 
iterative
-                                               manner.
-                                       </li>
-                               </ul>
-
-                               <p>
-                                       Additional
-                                       <code>HashFunction</code>
-                                       implementations
-                                       can be created to support the hashing 
strategies
-                                       often found in
-                                       bioinformatics
-                                       where k-mer sequences are indexed into
-                                       Bloom filters. In some cases the
-                                       data
-                                       about the k-mer sequence is
-                                       used as the seed value for the hash.
-                               </p>
-
-
-                       </subsection>
-
-                       <subsection id="S2.SS2" name="2.2 Exploring the Shape">
-
-                               <p>
-                                       As noted earlier the
-                                       <code>Shape</code>
-                                       describes the shape of the resulting 
Bloom filter. The constructor
-                                       for
-                                       <code>Shape</code>
-                                       always takes a
-                                       <code>HashFunctionIdentity</code>
-                                       as well
-                                       as many of the other Shape properties 
necessary to solve
-                                       \(p = (1- e^{-kn/m})^k\). These 
combinations are:
-                               </p>
-
-                               <ul>
-                                       <li>
-                                               The probability of collisions (
-                                               \(p\)
-                                               ), the
-                                               number of bits (
-                                               \(m\)
-                                               ), and the number of hash 
functions (
-                                               \(k\)
-                                               ).
-                                       </li>
-                                       <li>
-                                               The number of elements (
-                                               \(n\)
-                                               ), and
-                                               the probability of collisions (
-                                               \(p\)
-                                               ).
-                                       </li>
-                                       <li>
-                                               The number of elements (
-                                               \(n\)
-                                               ), number of bits (
-                                               \(m\)
-                                               ), and
-                                               number of hash functions (
-                                               \(k\)
-                                               ).
-                                       </li>
-                               </ul>
-                       </subsection>
-
-                       <subsection id="S2.SS3" name="2.3 Exploring the Hasher">
-
-                               <p>
-                                       The
-                                       <code>Hasher</code>
-                                       converts converts one or more
-                                       <code>byte[]</code>
-                                       into an
-                                       <code>iterator&lt;int&gt;</code>
-                                       representing the bits that
-                                       are enabled in the Bloom filter. The 
bits
-                                       indexes are in the range [0,
-                                       \(m\)
-                                       -1]. The Hasher is, in a sense, a 
primitive Bloom filter. There are
-                                       two implementations of Hasher provided:
-                               </p>
-                               <ul>
-                                       <li>DynamicHasher - calls the 
HashFunction as needed to generate
-                                               the list of bits.
-                                       </li>
-                                       <li>StaticHasher - contains a list of 
bits that were enabled
-                                               for a
-                                               specific Shape and simply 
replays them. This method is faster
-                                               than
-                                               the Dynamic when the Hasher is 
reused but can only be used for
-                                               filters with
-                                               the same Shape as the Hasher.
-                                       </li>
-                               </ul>
-
-                               <p>
-                                       Other implementations of
-                                       <code>Hasher</code>
-                                       are possible. Consider
-                                       the
-                                       <code>Hasher</code>
-                                       as the class that converts one or more
-                                       <code>byte[]</code>
-                                       into Bloom filters of any shape.
-                               </p>
-
-                       </subsection>
-
-                       <subsection id="S2.SS4"
-                               name="2.4 Exploring the BloomFilter">
-
-                               <p>
-                                       The
-                                       <code>BloomFilter</code>
-                                       is an interface that describes the 
functionality
-                                       of the Bloom
-                                       filter. Along with the
-                                       <code>AbstractBloomFilter</code>
-                                       there
-                                       are three (3) concrete implementations 
of
-                                       <code>BloomFilter</code>
-                                       provided:
-                               </p>
-
-                               <ul>
-                                       <li>
-                                               <code>BitSetBloomFilter</code>
-                                               - Stores the enabled bits in a
-                                               <code>BitSet</code>
-                                               object.
-                                       </li>
-                                       <li>
-                                               <code>HasherBloomFilter</code>
-                                               - Uses the
-                                               <code>Hasher</code>
-                                               implementation as the storage 
for the bits. This is not efficient
-                                               but often convenient if few 
Bloom filter operations are required.
-                                       </li>
-                                       <li>
-                                               <code>CountingBloomFilter</code>
-                                               - Uses a sparse list (
-                                               <code>HashMap</code>
-                                               ) of counts for the enabled 
bits. CountingBloomFilters
-                                               are an
-                                               extension to the normal Bloom 
filter and enable removal of
-                                               Bloom
-                                               filters from the set.
-                                       </li>
-                               </ul>
-
-                       </subsection>
-
-                       <subsection id="S2.SS5"
-                               name="2.5 Why Hasher, Shape and BloomFilter?">
-
-                               <p>
-                                       Other libraries tend to merge the 
functionality
-                                       of the
-                                       <code>Hasher</code>
-                                       ,
-                                       <code>Shape</code>
-                                       , and
-                                       <code>BloomFilter</code>
-                                       into one or two classes. By separating 
the responsibility we have
-                                       developed classes that focus on a 
single task. This permits a wider
-                                       range
-                                       of uses for the classes, for example:
-                               </p>
-
-                               <ul>
-                                       <li>
-                                               It
-                                               becomes possible to create 
client/server style applications
-                                               where the
-                                               client
-                                               creates a
-                                               <code>Hasher</code>
-                                               , sends it across the wire, and 
the server
-                                               then builds Bloom
-                                               filters of various
-                                               <code>Shape</code>
-                                               depending on the
-                                               requested functionality.
-                                       </li>
-                                       <li>
-                                               The library can support
-                                               Bloom filters that represent 
the enabled
-                                               bits in different ways in
-                                               order
-                                               to support specific 
functionality.
-                                               For example some applications
-                                               utilize
-                                               Bloom filters that are very
-                                               large (large
-                                               \(m\)
-                                               value) but have few bits
-                                               enabled (low
-                                               \(t\)
-                                               and
-                                               \(n\)
-                                               values). When storing a large 
number of these
-                                               it becomes important
-                                               to have bit vector compression. 
However, for
-                                               Bloom filters
-                                               with a
-                                               higher number of enabled bits 
the compression is just excess
-                                               overhead.
-                                               Applications can swap out the
-                                               <code>BloomFilter</code>
-                                               implementation without
-                                               impacting the
-                                               <code>Hasher</code>
-                                               or
-                                               <code>Shape</code>
-                                               selections.
-                                       </li>
-
-                                       <li>The library can support 
implementations of attenuated,
-                                               spectral
-                                               and other exotic Bloom filter 
varieties.
-                                       </li>
-                               </ul>
-
-                       </subsection>
-
-                       <subsection id="S2.SS6"
-                               name="2.6 Build your own BloomFilter">
-
-                               <p>
-                                       The
-                                       <code>AbstractBloomFilter</code>
-                                       requires
-                                       implementation of four methods:
-                               </p>
-
-                               <ul>
-                                       <li>getBits()
-                                               - Get the enabled bits as an 
array of long values.
-                                       </li>
-
-                                       <li>getHasher() - Creates a 
StaticHasher that returns the indexes
-                                               of the enabled
-                                               bits and contains the shame 
Shape as the Bloom
-                                               filter.
-                                       </li>
-
-                                       <li>merge(BloomFilter) - Merges enabled 
bits from the other Bloom
-                                               filter into
-                                               this Bloom filter.
-                                       </li>
-
-                                       <li>merge(Hasher) - Enables
-                                               the bits specified by the 
Hasher in this
-                                               Bloom filter.
-                                       </li>
-                               </ul>
-
-                               <p>
-                                       Other methods in
-                                       <code>AbstractBloomFilter</code>
-                                       should be overridden
-                                       to take advantage of the bit encoding 
the new
-                                       implementation
-                                       provides.
-                               </p>
-
-                       </subsection>
-               </section>
-
-               <section id="S3" name="3 Examples">
-
-                       <subsection id="S3.SS1"
-                               name="3.1 How to use a Bloom filter as a 
gatekeeper">
-                               <p>
-                                       A ”gatekeeper” is a Bloom filter that 
determines if a longer
-                                       running
-                                       method should be executed. In this 
example we create a Bloom
-                                       filter from
-                                       a list of bad IDs. We then check a 
candidate ID against
-                                       the Bloom
-                                       filter.
-                                       If the ID is in the Bloom filter we 
make the
-                                       expensive call to a
-                                       remote server
-                                       to verify the ID. For purposes of
-                                       illustration
-                                       <code>server.getBlackList()</code>
-                                       returns a list of IDs as
-                                       <code>Strings</code>
-                                       , and
-                                       <code>server.checkBlackList(
-                                               String id )
-                                       </code>
-                                       verifies that the ID is on the black 
list.
-                               </p>
-
-
-                               <source><![CDATA[
-/* setup environment */
-// select the hash function
-HashFunction hFunc = new Murmur128x86Cyclic();
-// 10000 elements, 1/million probability of collision
-Shape shape = new Shape( hFunc , 10000, 1.0/1000000);
-
-/* build the gatekeeper */
-BloomFilter gateKeeper = new BitSetBloomFilter( shape );
-for ( String id : server.getBlackList() ) {
-    DynamicHasher hasher = new DynamicHasher.Builder( hFunc ).with( id 
).build();
-    gateKeeper.merge( hasher );
-}
-
-/* use the gatekeeper */
-String id = // perhaps entered by user during login
-DynamicHasher hasher = new DynamicHasher.Builder( hFunc ).with( id ).build();
-boolean inBlackList = gatekeeper.contains( hasher );
-if ( inBlackList )
-{
-    /* it might be in the blacklist, so execute the server call to find out. */
-    inBlackList = server.checkBlackList( id );
-}
-// inBlackList is now true if the ID is in the black list, false otherwise.
-                               ]]></source>
-
-
-
-
-                       </subsection>
-
-                       <subsection id="S3.SS2" name="3.2 Data sharding">
-
-                               <p>In this example we create a Bloom filter for
-                                       each data item we are
-                                       going to save. When the data is 
inserted in the
-                                       storage
-                                       each
-                                       gatekeeper Bloom filter is checked to 
see which is ”closest” to
-                                       the
-                                       filter being inserted and the object is 
inserted in the associated
-                                       container.
-                                       When searching for data each gatekeeper 
Bloom filter is
-                                       checked for the
-                                       presence
-                                       of the data Bloom filter if it is in the
-                                       gatekeeper then the
-                                       associated container
-                                       is searched for the object.
-                               </p>
-
-
-                               <source><![CDATA[
-/* setup environment */
-
-// Set some limits
-int MAX_SHARD_SIZE = 100000;
-
-// select the hash function
-HashFunction hFunc = new Murmur128x86Cyclic();
-
-// 100000 elements, 1/million probability of collision
-Shape shape = new Shape( hFunc , MAX_SHARD_SIZE , 1.0/1000000);
-
-// create the sharding container
-Map < CountingBloomFilter , Map < String , T >> shards = new HashMap <>();
-
-// populate the minimum shards (5)
-for ( int i =0; i <5; i ++) {
-    shards.put( new CountingBloomFilter( shape ), new HashMap < String , T 
>>() );
-}
-
-
-/* insert a data object */
-
-T data = // get the data object.
-DynamicHasher hasher = new DynamicHasher.Builder( hFunc ).with( data.getId() 
).build();
-BloomFilter dataBloom = new BitSetBloomFilter( hasher , shape );
-int distance = Integer.MAX_VALUE ;
-BloomFilter collectionFilter = null
-for ( BloomFilter candidate : shards.keySet() ) {
-    if ( SetOperations.hammingDistance( candidate , dataBloom ) < distance ) {
-        distance = SetOperations.hammingDistance( candidate , dataBloom );
-        collectionFilter = candidate ;
-    }
-}
-if ( collectionFilter == null ) {
-    // no available collection so create one.
-    collectionFilter = new CountingBloomFilter( shape );
-    shards.put( collectionFilter , new HashMap < String , T >() );
-}
-Map < String , T > collection = shards.get( collectionFilter );
-if ( collection.size() >= MAX_SHARD_SIZE ) {
-    // no space in collection so create a new one.
-    collectionFilter = new CountingBloomFilter( shape );
-    collection = new HashMap < String , T >();
-    shards.put( collectionFilter , collection );
-}
-collectionFilter.merge( dataBloom );
-collections.put( data.getId(), data );
-
-
-/* search for a data object */
-
-T result = null ;
-String id = // get the data ID.
-DynamicHasher hasher = new DynamicHasher.Builder( hFunc ).with( id ).build();
-BloomFilter dataBloom = new BitSetBloomFilter( hasher , shape );
-for ( Map.Entry < BloomFilter , Collection < T >> entry : shards ) {
-    if ( shard.key().contains( dataBloom )) {
-        T candidate = shard.value().get( id );
-        if ( candidate != null ) {
-            result = candidate ;
-        }
-    }
-}
-// result is either null or contains the desired value.
-               ]]></source>
-                       </subsection>
-                       <subsection id="S3.SS3" name="3.3 Future directions">
-
-                               <p>
-                                       The separation and definition of the
-                                       <code>HashFunctionIdentity</code>
-                                       ,
-                                       <code>HashFunction</code>
-                                       , and
-                                       <code>Hasher</code>
-                                       classes as well as the development of 
the Cyclic hash function,
-                                       means that it is possible to construct 
hashers that can be sent
-                                       across the
-                                       wire. This opens up the possibility of 
sending query
-                                       requests without
-                                       exposing
-                                       the values or properties being queried 
and
-                                       may provide the ability to
-                                       query
-                                       encrypted data.&#160;<cite>[<a 
href='#Bellovin'>Bellovin</a>,
-                                               <a href='#Goh'>Goh</a>,
-                                               <a href='#Tajima'>Tajima</a>,
-                                               <a 
href='#Warren1'>Warren1</a>]</cite>
-                               </p>
-
-                               <p>
-                                       Implementations of multidimensional
-                                       Bloom filters&#160;<cite>[<a 
href='#Crainiceanu'>Crainiceanu</a>]</cite>
-                                       (AKA Bloom filter
-                                       indexes) may also be enhanced as the 
Hasher is not
-                                       bound to a shape. So
-                                       multidimensional
-                                       filters can us a Bloom filter
-                                       of a different Shape from the ones being
-                                       stored
-                                       to reduce the number
-                                       of filters it actually checks 
for.&#160;<cite>[<a href='#Warren2'>Warren2</a>]</cite>
-                               </p>
-
-                               <p>Implementations of attenuated,
-                                       spectral and other exotic Bloom
-                                       filter varieties is possible using the
-                                       commons-collection
-                                       library.
-                               </p>
-                       </subsection>
-               </section>
-               <section id="S4" name="4 References">
-                       <div class="references">
-
-                               <div id="Bellovin">
-                                       <span class="refName">[Bellovin]</span>
-                                       <span class="refBody">
-                                               Steven M Bellovin and William R 
Cheswick”. Privacy-Enchanced
-                                               Searched
-                                               Using Encrypted Bloom Filters”. 
2004. url:
-                                               <a
-                                                       
href="https://mice.cs.columbia.edu/getTechreport.php?techreportID=483";>https://mice.cs.columbia.edu/getTechreport.php?techreportID=483
-                                               </a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-
-                               <div id="Bloom">
-                                       <span class="refName">[Bloom]</span>
-                                       <span class="refBody">
-                                               Burton H. Bloom. “Space/Time 
Trade-offs in Hash
-                                               Coding with
-                                               Allowable
-                                               Errors”. In: Communications of 
the ACM 13.7
-                                               (July 1970), pp. 422–426.
-                                       </span>
-                               </div>
-
-                               <div id="Crainiceanu">
-                                       <span 
class="refName">[Crainiceanu]</span>
-                                       <span class="refBody">
-                                               Adina Crainiceanu and Daniel 
Lemire. Bloofi: Multidimensional
-                                               Bloom
-                                               Filters. Accessed on 
11-Jan-2020. 2016. url:
-                                               <a 
href="https://arxiv.org/pdf/1501.01941.pdf";>https://arxiv.org/pdf/1501.01941.pdf</a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-                               <div id="Goh">
-                                       <span class="refName">[Goh]</span>
-                                       <span class="refBody">
-                                               Eu-Jin Goh. Secure Indexes. 
2004. url:
-                                               <a
-                                                       
href="https://crypto.stanford.edu/~eujin/papers/secureindex/secureindex.pdf";>https://crypto.stanford.edu/~eujin/papers/secureindex/secureindex.pdf
-                                               </a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-
-                               <div id="Hurst">
-                                       <span class="refName">[Hurst]</span>
-                                       <span class="refBody">
-                                               Thomas Hurst. Bloom filter 
calculator. 2019.
-                                               url:
-                                               <a 
href="https://hur.st/bloomfilter/";>https://hur.st/bloomfilter/</a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-
-                               <div id="Kirsch">
-                                       <span class="refName">[Kirsch]</span>
-                                       <span class="refBody">
-                                               Adam Kirsch and Michael 
Mitzenmacher. Building a Better Bloom
-                                               Filter.
-                                               2008. url:
-                                               <a
-                                                       
href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf
-                                               </a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-
-                               <div id="Mitzenmacher">
-                                       <span 
class="refName">[Mitzenmacher]</span>
-                                       <span class="refBody">
-                                               Michael Mitzenmacher and Eli 
Upfal. Probability
-                                               and computing:
-                                               Randomized algorithms and 
probabilistic analysis.
-                                               Cambridge, Cambridgeshire,
-                                               UK: Cambridge University Press, 
2005,
-                                               pp. 109–111, 308. isbn:
-                                               9780521835404.
-                                       </span>
-                               </div>
-
-                               <div id="Tajima">
-                                       <span class="refName">[Tajima]</span>
-                                       <span class="refBody">
-                                               Arisa Tajima, Hiroki Sato, and 
Hayato Yamana. Privacy-Preserving
-                                               Join
-                                               Processing over outsourced 
private datasets with Fully
-                                               Homomorphic En-
-                                               cryption and Bloom Filters. 
2018. url:
-                                               <a 
href="https://db-event.jpn.org/deim2018/data/papers/201.pdf";>https://db-event.jpn.org/deim2018/data/papers/201.pdf
-                                               </a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-
-                               <div id="Warren1">
-                                       <span class="refName">[Warren1]</span>
-                                       <span class="refBody">
-                                               Claude N. Warren Jr.
-                                               Bloom Encrypted Index. 2019. 
url:
-                                               <a 
href="https://github.com/Claudenw/BloomEncryptedIndex";>
-                                                       
https://github.com/Claudenw/BloomEncryptedIndex
-                                               </a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-
-                               <div id="Warren2">
-                                       <span class="refName">[Warren2]</span>
-                                       <span class="refBody">
-                                               Claude N. Warren Jr.
-                                               Multidimentional Bloom Filter 
Implementations.
-                                               2019. url:
-                                               <a 
href="https://github.com/Claudenw/MultidimentionalBloom";>https://github.com/Claudenw/MultidimentionalBloom</a>
-                                               (Accessed 26 Jan 2020).
-                                       </span>
-                               </div>
-                       </div>
-               </section>
-
-       </body>
-</document>

(commons-collections) branch master updated: initial changes. (#507)

Reply via email to