Re: Including phonetic search in text field

Paul Libbrecht Mon, 23 May 2011 12:57:29 -0700

Jamie,

the problem with that is that you can no longer do exact matching on that field.
For this reason, it is good style to have two fields (one exact, one phonetic),
to query both with a query parser such as dismax (weighting exact matches above
phonetic ones), and to use that setup only when you sort by score.
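
A minimal sketch of the two-field setup, assuming a source field called
"name" and a phonetic copy called "name_phonetic". Both field names, the
"text_phonetic" type name, the "/phonetic" handler name and the boost value
are made up for illustration and are not taken from your schema. In
schema.xml, something along these lines:

```xml
<!-- Hypothetical phonetic-only type: inject="false" keeps only the
     phonetic codes, so the original "text" field stays exact. -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<!-- "name" and "name_phonetic" are assumed field names. -->
<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="name" dest="name_phonetic"/>
```

and in solrconfig.xml a dismax handler that weights the exact field higher
(the boost of 10 is an arbitrary starting point to tune):

```xml
<requestHandler name="/phonetic" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- exact matches on "name" score above phonetic-only matches -->
    <str name="qf">name^10 name_phonetic</str>
  </lst>
</requestHandler>
```

The cost is one extra field per source field, which is the trade-off you
mention, but the exact field stays untouched.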


hope it helps

paul


On 23 May 2011 at 21:43, Jamie Johnson wrote:

> I am new to Solr and am trying to determine the best way to take the text
> field type (the one in the example) and add phonetic searches to it.
> Currently I have done the following:
> 
>    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.DoubleMetaphoneFilterFactory"/>
>        <!-- in this example, we will only use synonyms at query time
>        <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>        -->
>        <!-- Case insensitive stop word removal.
>          add enablePositionIncrements=true in both the index and query
>          analyzers to leave a 'gap' for more accurate phrase queries.
>        -->
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
> 
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.DoubleMetaphoneFilterFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> which seems to work.  Is this appropriate or is there a better way of doing
> this?  I had previously defined a custom phonetic field but that would mean
> for each field which I wanted to support a phonetic style search I would
> need to add an additional field.  Adding it to the text seemed much more
> elegant since it would work for all text fields.  Is there a reason not to
> do this (i.e. performance, index size, etc)?  Any insight/guidance would be
> greatly appreciated.
> 
> Also if anyone could point me to what exactly filters do (docs) I would
> appreciate it.  My assumption is that they inject additional tokens based on
> the specific filter class.  Am I correct?
