Say there is an index of business names (fairly short text snippets) containing Walmart, Walmart Bakery and Mini Mart, and we need a query for 'wal mart' to match all three, with an appropriate ranking order. We also need 'walmart', 'walmart bakery' and 'bakery' to find the right things in the right order.
Here is the solution we came up with:

1. Use the edismax query parser (we don't need it for this, but do for a number of other requirements).

2. On the index, apply ShingleFilter, then remove the word separators in the shingles, so that "walmart bakery" is indexed as "walmart", "bakery", "walmartbakery". The schema for this index looks like this:

    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="\W+" replacement=""/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

3. Before sending the original query to Solr, modify it by appending a whitespace-stripped version of it. Thus, 'wal mart' becomes 'wal mart walmart' and 'walmart bakery' becomes 'walmart bakery walmartbakery'. Don't modify the query if it has only one word in it, or if it contains any edismax syntax (double quotes; pluses and minuses at the beginning of the query or after whitespace).

4. ... profit.

The reason we have to shingle the query before it reaches Solr is that the edismax parser splits 'wal mart' on whitespace and treats it as two queries, 'wal' OR 'mart'. Each word is then analyzed on its own, so a ShingleFilter in the query analyzer never sees two adjacent words and doesn't do anything.

This works, but feels a little dirty. Is there a more elegant way to solve this problem?

-- Alex Verkhovsky
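
The client-side rewrite described in step 3 could look roughly like the following Python sketch. The function name `expand_query` and the regular expression are illustrative assumptions, not something taken from the setup above; the only rules encoded are the ones stated in step 3 (skip one-word queries, skip queries with double quotes, skip queries with a plus or minus at the start or after whitespace).

```python
import re

# Hypothetical pattern (an assumption for this sketch): matches the edismax
# syntax that step 3 says should leave the query untouched, i.e. a double
# quote anywhere, or a '+' / '-' at the start of the query or right after
# whitespace.
_EDISMAX_SYNTAX = re.compile(r'"|(?:^|\s)[+-]')


def expand_query(query: str) -> str:
    """Append a whitespace-stripped copy of a multi-word query.

    Hypothetical helper: one-word queries and queries containing edismax
    syntax are returned unchanged.
    """
    words = query.split()
    if len(words) < 2:
        return query
    if _EDISMAX_SYNTAX.search(query):
        return query
    # Note: runs of whitespace between words are collapsed to single spaces.
    return " ".join(words + ["".join(words)])


if __name__ == "__main__":
    for q in ["wal mart", "walmart bakery", "walmart", '"wal mart"', "wal +mart"]:
        print(repr(q), "->", repr(expand_query(q)))
```

With this sketch, 'wal mart' is sent to Solr as 'wal mart walmart', so the extra term can match the "walmart" token produced by the index analyzer, while 'walmart', '"wal mart"' and 'wal +mart' pass through unchanged.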