If you are trying to serve results as users are typing, you can use
EdgeNGramFilterFactory (see
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory).

Let's say you configure your field like this, as shown in the Solr wiki:

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
   </analyzer>
</fieldType>

Then this is what happens at index time for your tokens:

David ---> | LowerCaseTokenizerFactory | ---> david ---> | EdgeNGramFilterFactory | ---> da dav davi david
Dave  ---> | LowerCaseTokenizerFactory | ---> dave  ---> | EdgeNGramFilterFactory | ---> da dav dave
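
To make the index-time flow concrete, here is a rough Python sketch of
what the two analysis steps produce. It is only a stand-in for the Solr
analysis chain, not Solr code: the function names edge_ngrams and
index_tokens are made up for illustration, and the defaults mirror the
minGramSize/maxGramSize values in the config above.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the front-edge n-grams of a token, shortest first."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_tokens(text, min_gram=2, max_gram=15):
    """Approximate the index-time analyzer (hypothetical helper).

    Step 1 imitates LowerCaseTokenizerFactory: split on non-letters
    and lowercase. Step 2 imitates the front edge n-gram filter.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    grams = []
    for word in words:
        grams.extend(edge_ngrams(word, min_gram, max_gram))
    return grams


print(index_tokens("David"))
print(index_tokens("Dave"))
```

Both names end up sharing the grams 'da' and 'dav', which is what lets a
short prefix find both of them.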

At query time, when your user enters 'Dav', it is lowercased to 'dav' and
matches both names, since each was indexed with the gram 'dav'. Note that
the moment your user types more, say 'davi', it no longer matches 'Dave',
because edge n-gramming happens only at index time and not at query time.
If you want 'Dave' to match 'David', you can also apply the edge n-gram
filter at query time, probably with a larger minGramSize (3 in this case)
to avoid noise (for example 'Dave' matching 'Dana' on the shared gram
'da', though with a lower score). Keep in mind that n-gramming at query
time is more expensive, since every query term expands into several terms.
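
The trade-off between index-only and query-time n-gramming can be
sketched in a few lines of Python. This is a simplified model that only
checks whether the indexed grams and the query terms overlap; it ignores
scoring entirely, and the matches helper and its parameters are
hypothetical names, not part of Solr.

```python
def edge_ngrams(token, min_gram, max_gram=15):
    """Return the set of front-edge n-grams of a token."""
    longest = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, longest + 1)}


def matches(query, indexed_name, min_gram, ngram_query):
    """Return True if the query shares at least one term with the index.

    ngram_query=False models the config with n-gramming at index time
    only; ngram_query=True models n-gramming the query as well.
    """
    index_terms = edge_ngrams(indexed_name.lower(), min_gram)
    q = query.lower()
    query_terms = edge_ngrams(q, min_gram) if ngram_query else {q}
    return bool(index_terms & query_terms)


# Index-time n-grams only: prefixes match, longer input stops matching.
print(matches("Dav", "David", 2, False), matches("davi", "Dave", 2, False))
# Query-time n-grams too: 'Dave' finds 'David', but also 'Dana' via 'da'.
print(matches("Dave", "David", 2, True), matches("Dave", "Dana", 2, True))
# Raising minGramSize to 3 removes the 'da' overlap with 'Dana'.
print(matches("Dave", "David", 3, True), matches("Dave", "Dana", 3, True))
```

In this model the minimum gram size of 3 keeps the 'Dave'/'David' match
while dropping the 'Dave'/'Dana' one, which is the reason for suggesting
a larger minGramSize when n-gramming on the query side.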




On Fri, Feb 28, 2014 at 3:22 PM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> We have name searches on Solr for millions of documents. User may search
> like "Morrison Dave" or other may search like "Morrison David".  What's the
> best way to handle that both brings similar results. Adding Synonym is the
> option we are using right.
>
> But we may need to add around such 50,000+ synonyms for different names
> for each specific name there can be couple of synonyms like for Richard, it
> can be Rich, Rick, Richie etc.
>
> Any experience adding so many synonyms or any other thoughts? Stemming may
> help in few situations but not like Dave and David.
>
> Thanks,
> Susheel
>
