solr/admin/analysis.jsp lets you see how this works. Use the index boxes.

Lance
On Tue, Aug 17, 2010 at 11:56 AM, Steven A Rowe <sar...@syr.edu> wrote:
> Hi Michael,
>
> I think the problem you're seeing is that no document contains "reebox", and
> you've used the "explicit" syntax (source=>dest) instead of the "equivalent"
> syntax (term,term,term).
>
> I'm guessing that if you convert your synonym file from:
>
> reebox => Reebok
>
> to:
>
> reebox, Reebok
>
> and leave expand=true, and then reindex, everything will work: your indexed
> documents containing "Reebok" will be made to include "reebox", so queries
> for "reebox" will produce hits on those documents.
>
> Steve
>
>> -----Original Message-----
>> From: mtdowling [mailto:mtdowl...@gmail.com]
>> Sent: Tuesday, August 17, 2010 2:24 PM
>> To: solr-user@lucene.apache.org
>> Subject: Solr synonyms format query time vs index time
>>
>> My company recently started using Solr for site search and autocomplete.
>> It's working great, but we're running into a problem with synonyms. We are
>> generating a synonyms.txt file from a database table and using that
>> synonyms.txt file at index time on a text type field. Here's an excerpt
>> from the synonyms file:
>>
>> reebox => Reebok
>> shinguards => Shin Guards
>> shirt => T-Shirt,Shirt
>> shmak => Shmack
>> shocks => shox
>> skateboard => Skate
>> skateboarding => Skate
>> skater => Skate
>> skates => Skate
>> skating => Skate
>> skirt => Dresses
>>
>> When we do a search for reebox, we want the term to be mapped to "Reebok"
>> through explicit mapping, but for some reason this isn't happening.
>> We do have multi-word synonyms, and from what I've read on the mailing list,
>> those only work at index time, so we are only using the synonym filter
>> factory at index time:
>>
>> <fieldType name="search" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>>             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
>>             catenateAll="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>             protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt"/>
>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>>             generateNumberParts="0" catenateWords="1" catenateNumbers="1"
>>             catenateAll="0"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>>             protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Here are the other relevant schema.xml settings:
>>
>> <field name="mashup" type="search" indexed="true" stored="false"
>>        multiValued="true"/>
>> <copyField source="keywords" dest="mashup"/>
>> <copyField source="category" dest="mashup"/>
>> <copyField source="name" dest="mashup"/>
>> <copyField source="brand" dest="mashup"/>
>> <copyField source="description_overview" dest="mashup"/>
>> <copyField source="sku" dest="mashup"/>
>> <!-- other copy fields... -->
>>
>> The output of the query analyzer shows the following:
>>
>> Query Analyzer
>> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
>>     ignoreCase=true}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0,
>>     catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.LowerCaseFilterFactory {}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt,
>>     language=English}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>>
>> So "reebox" is never being converted to "Reebok". I thought that with
>> index-time synonyms and expansion configured, I wouldn't need query-time
>> synonyms. Maybe my dynamically generated synonyms aren't formatted
>> correctly for the result I want?
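To see why the query analyzer output above is expected, here is a toy Python model of the two analyzer chains. It is a sketch under stated assumptions, not Solr code: it keeps only the whitespace tokenizer, the index-time rule "reebox => Reebok" (matched case-insensitively, as with ignoreCase="true") and lowercasing. It skips the stop-word, word-delimiter, stemming and de-duplication stages because the analyzer output shows they leave "reebox" and "Reebok" unchanged. The helper names index_chain, query_chain and matches are hypothetical.

```python
# Toy model of the "search" field type from the schema above (NOT Solr code).
# Only the stages that matter for "reebox"/"Reebok" are modeled; stop words,
# WordDelimiterFilter, Snowball stemming and RemoveDuplicates are skipped.

# Explicit rule from synonyms.txt: "reebox => Reebok". Keys are lowercased
# because the filter is configured with ignoreCase="true".
EXPLICIT_RULES = {"reebox": ["Reebok"]}


def index_chain(text):
    """Index-time analysis: whitespace split, synonym rewrite, lowercase."""
    terms = set()
    for token in text.split():  # WhitespaceTokenizer
        # SynonymFilter: replace the token if its lowercased form has a rule.
        for out in EXPLICIT_RULES.get(token.lower(), [token]):
            terms.add(out.lower())  # LowerCaseFilter
    return terms


def query_chain(text):
    """Query-time analysis: no SynonymFilter, as in the schema above."""
    return {token.lower() for token in text.split()}


def matches(query, doc):
    """True if any query term appears among the doc's indexed terms."""
    return bool(query_chain(query) & index_chain(doc))


print(sorted(query_chain("reebox")))  # ['reebox']
print(sorted(index_chain("Reebok")))  # ['reebok']
print(sorted(index_chain("reebox")))  # ['reebok']
print(matches("reebox", "Reebok"))    # False
print(matches("reebox", "reebox"))    # False
print(matches("Reebok", "reebox"))    # True
```

Because the explicit rule rewrites "reebox" only in documents and never in queries, no indexed document ever contains the term "reebox", which is the point Steve makes in his reply.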
>>
>> If I use the same synonyms.txt file and use the index analyzer, reebox is
>> mapped to Reebok and then indexed correctly:
>>
>> Index Analyzer
>> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>>   term position     1
>>   term text         reebox
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
>>     expand=true, ignoreCase=true}
>>   term position     1
>>   term text         Reebok
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
>>     ignoreCase=true}
>>   term position     1
>>   term text         Reebok
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0,
>>     catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
>>   term position     1
>>   term text         Reebok
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.LowerCaseFilterFactory {}
>>   term position     1
>>   term text         reebok
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt,
>>     language=English}
>>   term position     1
>>   term text         reebok
>>   term type         word
>>   source start,end  0,6
>>   payload
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>>   term position     1
>>   term text         reebok
>>   term type         word
>>   source start,end  0,6
>>   payload
>>
>> Should I use equivalent mapping instead of explicit mapping if I'm only
>> using index-time synonyms? Or should I turn on query-time synonyms for my
>> search field?
>>
>> Thanks,
>> Michael
>
--
Lance Norskog
goks...@gmail.com
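Steve's suggested fix can be checked the same way. The sketch below is again a simplified Python model, not Solr's SynonymFilter. It parses single-word synonyms.txt lines following the documented SynonymFilterFactory rules: "a => b" replaces a with b; "a, b" with expand=true expands each term to all listed terms; "a, b" with expand=false maps every term to the first one. Multi-word synonyms and the later analysis stages are deliberately left out, and parse_rules, index_terms and query_terms are hypothetical names.

```python
# Simplified model of how synonyms.txt lines are interpreted (NOT Solr code).
# Handles single-word entries only; multi-word synonyms are out of scope.


def parse_rules(lines, expand=True):
    """Map each lowercased source term to the terms it is replaced with."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: "a, b => c, d" replaces a or b with c and d.
            lhs, rhs = line.split("=>", 1)
            sources = [s.strip() for s in lhs.split(",")]
            targets = [t.strip() for t in rhs.split(",")]
        else:
            # Equivalent list: expand=true maps every term to all terms;
            # expand=false maps every term to the first term only.
            sources = [t.strip() for t in line.split(",")]
            targets = sources if expand else sources[:1]
        for src in sources:
            # Rules for the same source term are merged.
            rules.setdefault(src.lower(), []).extend(targets)
    return rules


def index_terms(text, rules):
    """Index-time terms: whitespace split, synonym rewrite, lowercase."""
    return {out.lower()
            for token in text.split()
            for out in rules.get(token.lower(), [token])}


def query_terms(text):
    """Query-time terms with no SynonymFilter (index-time synonyms only)."""
    return {token.lower() for token in text.split()}


DOC = "Reebok"
CASES = [
    ("explicit", ["reebox => Reebok"], True),
    ("equivalent, expand=true", ["reebox, Reebok"], True),
    ("equivalent, expand=false", ["reebox, Reebok"], False),
]
for label, lines, expand in CASES:
    indexed = index_terms(DOC, parse_rules(lines, expand))
    print(label, sorted(indexed),
          "reebox hits:", query_terms("reebox") <= indexed,
          "Reebok hits:", query_terms("Reebok") <= indexed)
```

For a document containing only "Reebok", the explicit rule indexes just "reebok". The equivalent rule with expand=true indexes both "reebok" and "reebox", so a query for either term matches without query-time synonyms. With expand=false both words collapse to "reebox", so a plain query for "Reebok" would stop matching unless the same synonyms were also applied at query time; that is why keeping expand=true matters in an index-time-only setup.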