Ah, yes, very helpful, thanks Paul. I knew there would be something I broke :). I will need to go back through the use cases and see which ones will and will not require exact matches. Thanks again!
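For reference, Paul's suggestion in the quoted reply below (one field for exact matching, a second phonetic field, and dismax weighting exact matches above phonetic ones) might look roughly like this in a Solr 3.x-era schema.xml and solrconfig.xml. This is an untested sketch. The field names `title` and `title_phonetic`, the type name `text_phonetic`, the handler name `/search` and the `qf` boosts are hypothetical placeholders. `copyField` fills the phonetic field at index time, so each field you want phonetic matching for still needs its own phonetic companion field.

```xml
<!-- schema.xml (sketch): "text" is the stock example type, left untouched for exact matching -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <!-- no type attribute: the same analyzer is used at index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false" replaces each token with its phonetic code instead of adding the code next to it -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<!-- hypothetical field names -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="title" dest="title_phonetic"/>

<!-- solrconfig.xml (sketch): dismax searches both fields, boosting exact matches -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^4 title_phonetic^1</str>
  </lst>
</requestHandler>
```

The phonetic field only changes relevance, so, as Paul notes, the preference for exact matches only shows when results are sorted by score.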
I had never heard of DisMax, so this is new to me as well, but I have found some posts about it. I am sure this will generate other questions :) Again, thanks.

On Mon, May 23, 2011 at 3:56 PM, Paul Libbrecht <p...@hoplahup.net> wrote:
> Jamie,
>
> the problem with that is that you cannot do exact matching anymore.
> For this reason, it is good style to have two fields, to use a query
> expander such as dismax (prefer exact matches, and less phonetic matches),
> and to only use that when you sort by score.
>
> hope it helps
>
> paul
>
>
> On 23 May 2011 at 21:43, Jamie Johnson wrote:
>
> > I am new to solr and am trying to determine the best way to take the text
> > field type (the one in the example) and add phonetic searches to it.
> > Currently I have done the following:
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> >     autoGeneratePhraseQueries="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.DoubleMetaphoneFilterFactory"/>
> >     <!-- in this example, we will only use synonyms at query time
> >     <filter class="solr.SynonymFilterFactory"
> >         synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> >     -->
> >     <!-- Case insensitive stop word removal.
> >       add enablePositionIncrements=true in both the index and query
> >       analyzers to leave a 'gap' for more accurate phrase queries.
> >     -->
> >     <filter class="solr.StopFilterFactory"
> >         ignoreCase="true"
> >         words="stopwords.txt"
> >         enablePositionIncrements="true"
> >         />
> >     <filter class="solr.WordDelimiterFilterFactory"
> >         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory"
> >         protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.DoubleMetaphoneFilterFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >         ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory"
> >         ignoreCase="true"
> >         words="stopwords.txt"
> >         enablePositionIncrements="true"
> >         />
> >     <filter class="solr.WordDelimiterFilterFactory"
> >         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >         catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory"
> >         protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > which seems to work. Is this appropriate or is there a better way of doing
> > this? I had previously defined a custom phonetic field but that would mean
> > for each field which I wanted to support a phonetic style search I would
> > need to add an additional field. Adding it to the text seemed much more
> > elegant since it would work for all text fields. Is there a reason not to
> > do this (i.e. performance, index size, etc)? Any insight/guidance would be
> > greatly appreciated.
> >
> > Also if anyone could point me to what exactly filters do (docs) I would
> > appreciate it.
> > My assumption is that they inject additional tokens based on
> > the specific filter class. Am I correct?
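On the question of what filters do: in Lucene/Solr an analyzer is a tokenizer followed by a chain of filters, and each filter consumes the token stream produced by the one before it. Injecting tokens is one behavior, not the only one. A filter may rewrite tokens (LowerCaseFilterFactory), drop them (StopFilterFactory), or add tokens at the same position (SynonymFilterFactory with `expand="true"`, or DoubleMetaphoneFilterFactory, whose `inject` attribute defaults to true). The toy Java model below is a hypothetical illustration, not the Lucene `TokenStream` API: the class `FilterChainSketch`, its `Filter` interface and the `fakeCode` encoder are invented for this sketch, and `fakeCode` is not Double Metaphone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an analyzer chain (hypothetical; NOT the Lucene TokenStream API).
public class FilterChainSketch {

    // A token's text plus its position in the original text.
    record Token(String text, int position) {}

    // Each filter receives the previous stage's tokens and returns new ones.
    interface Filter {
        List<Token> apply(List<Token> in);
    }

    // Rewrites tokens: same count, same positions, lower-cased text.
    static Filter lowerCase() {
        return in -> {
            List<Token> out = new ArrayList<>();
            for (Token t : in) out.add(new Token(t.text().toLowerCase(Locale.ROOT), t.position()));
            return out;
        };
    }

    // Drops tokens; survivors keep their positions, leaving a gap
    // (similar in spirit to enablePositionIncrements="true").
    static Filter stop(Set<String> stopWords) {
        return in -> {
            List<Token> out = new ArrayList<>();
            for (Token t : in) if (!stopWords.contains(t.text())) out.add(t);
            return out;
        };
    }

    // Injects: keeps each token and adds an encoded token at the SAME position.
    static Filter injectCode() {
        return in -> {
            List<Token> out = new ArrayList<>();
            for (Token t : in) {
                out.add(t);
                out.add(new Token(fakeCode(t.text()), t.position()));
            }
            return out;
        };
    }

    // Hypothetical stand-in for a phonetic encoder: upper-case, drop vowels after the first char.
    static String fakeCode(String s) {
        String up = s.toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < up.length(); i++) {
            char c = up.charAt(i);
            if (i == 0 || "AEIOU".indexOf(c) < 0) sb.append(c);
        }
        return sb.toString();
    }

    // Whitespace "tokenizer": one token per word, positions 0, 1, 2, ...
    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        for (String w : text.trim().split("\\s+")) out.add(new Token(w, pos++));
        return out;
    }

    // Runs the tokenizer, then each filter in order on the previous filter's output.
    static List<Token> analyze(String text, List<Filter> chain) {
        List<Token> tokens = tokenize(text);
        for (Filter f : chain) tokens = f.apply(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        List<Filter> chain = List.of(lowerCase(), stop(Set.of("the")), injectCode());
        for (Token t : analyze("The Smith house", chain)) {
            System.out.println(t.position() + " " + t.text());
        }
        // prints:
        // 1 smith
        // 1 SMTH
        // 2 house
        // 2 HS
    }
}
```

Because each stage sees the previous stage's output, order matters. In the quoted config, the filters placed after DoubleMetaphoneFilterFactory (stop words, word delimiter, lower-casing, stemming) also process the injected phonetic codes, not just the original words. Solr's admin Analysis page is a convenient way to see the real token output of a field type.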