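Hi Michael,

I think the problem you're seeing is that no document contains "reebox", and you've used the "explicit" syntax (source=>dest) instead of the "equivalent" syntax (term,term,term). With an explicit mapping applied only at index time, the synonym filter rewrites "reebox" to "Reebok" only in documents that actually contain "reebox". Since none do, the index never changes. Your query analyzer has no synonym filter, so a query for "reebox" stays "reebox" and matches nothing.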
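To make the difference concrete, here is a rough Python sketch of what index-time synonyms do with each syntax. It is a toy model, not Solr's SynonymFilter. It assumes single-token rules, whitespace tokenization, case-insensitive matching (like ignoreCase="true"), expand=true for equivalent lists, and no query-time synonyms. The helper names (parse_rules, analyze_index, matches) are hypothetical.

```python
# Toy model of index-time synonym handling (NOT Solr's implementation).
# Assumptions: single-token rules, whitespace tokenization, case-insensitive
# matching (like ignoreCase="true"), expand=true for equivalent lists, and
# no stop/stem filters. Query analysis applies no synonyms.


def parse_rules(lines):
    """Map each lowercased source term to the list of terms it emits."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by right-hand terms.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for src in left.split(","):
                if src.strip():
                    rules[src.strip().lower()] = targets
        else:
            # Equivalent list with expand=true: every term emits all terms.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            for src in terms:
                rules[src.lower()] = terms
    return rules


def analyze_index(text, rules):
    """Index-time analysis: tokenize, apply synonyms, lowercase the output."""
    indexed = set()
    for token in text.split():
        for term in rules.get(token.lower(), [token]):
            indexed.add(term.lower())
    return indexed


def matches(query, doc_text, rules):
    """The query is only lowercased; no synonym filter at query time."""
    return query.lower() in analyze_index(doc_text, rules)


doc = "Reebok running shoes"
explicit = parse_rules(["reebox => Reebok"])
equivalent = parse_rules(["reebox, Reebok"])

print(matches("reebox", doc, explicit))    # False: "Reebok" is not a source term
print(matches("reebox", doc, equivalent))  # True: "Reebok" also indexes "reebox"
```

In this model the explicit rule only fires on documents that contain "reebox", while the equivalent list makes every "Reebok" document carry "reebox" as well, which is what an unexpanded "reebox" query needs.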
I'm guessing that if you convert your synonym file from:

reebox => Reebok

to:

reebox, Reebok

and leave expand=true, and then reindex, everything will work: your indexed documents containing "Reebok" will be made to include "reebox", so queries for "reebox" will produce hits on those documents.

Steve

> -----Original Message-----
> From: mtdowling [mailto:mtdowl...@gmail.com]
> Sent: Tuesday, August 17, 2010 2:24 PM
> To: solr-user@lucene.apache.org
> Subject: Solr synonyms format query time vs index time
>
> My company recently started using Solr for site search and autocomplete. It's working great, but we're running into a problem with synonyms. We are generating a synonyms.txt file from a database table and using that synonyms.txt file at index time on a text type field. Here's an excerpt from the synonyms file:
>
> reebox => Reebok
> shinguards => Shin Guards
> shirt => T-Shirt,Shirt
> shmak => Shmack
> shocks => shox
> skateboard => Skate
> skateboarding => Skate
> skater => Skate
> skates => Skate
> skating => Skate
> skirt => Dresses
>
> When we do a search for reebox, we want the term to be mapped to "Reebok" through explicit mapping, but for some reason this isn't happening.
> We do have multi-word synonyms, and from what I've read on the mailing list, those only work at index time, so we are only using the synonym filter factory at index time:
>
> <fieldType name="search" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Here's more relevant schema.xml configs:
>
> <field name="mashup" type="search" indexed="true" stored="false" multiValued="true"/>
> <copyField source="keywords" dest="mashup"/>
> <copyField source="category" dest="mashup"/>
> <copyField source="name" dest="mashup"/>
> <copyField source="brand" dest="mashup"/>
> <copyField source="description_overview" dest="mashup"/>
> <copyField source="sku" dest="mashup"/>
> <!-- other copy fields... -->
>
> The output of the query analyzer shows the following:
>
> Query Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0, catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.LowerCaseFilterFactory {}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
>
> So "reebox" is never being converted to "Reebok". I thought that if I had index time synonyms with expansion configured that I wouldn't need query time synonyms. Maybe my dynamic synonyms generation isn't formatted correctly for my desired result?
>
> If I use the same synonyms.txt file and use the index analyzer, reebox is mapped to Reebok and then indexed correctly:
>
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term position     1
> term text         reebox
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
> term position     1
> term text         Reebok
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
> term position     1
> term text         Reebok
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0, catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
> term position     1
> term text         Reebok
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.LowerCaseFilterFactory {}
> term position     1
> term text         reebok
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
> term position     1
> term text         reebok
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
> term position     1
> term text         reebok
> term type         word
> source start,end  0,6
> payload
>
> Should I use equivalent mapping instead of explicit mapping if I'm only using index-time synonyms? Or should I turn query time synonyms on for my search field?
>
> Thanks,
> Michael