People, thanks for all the replies.

The business requirement I have is to update the synonyms list every time someone from the sales department establishes a new dictionary (they do that a couple of times a week), so I must add the new synonyms to the index each time. I think I will stick with query-time synonyms only, for Grant's reason. At least bad is better than worse.

2008/12/31 Grant Ingersoll <gsing...@apache.org>

>
> On Dec 30, 2008, at 4:38 PM, Smiley, David W. wrote:
>
>> Grant, the Solr wiki recommends doing expansion at index time and gives
>> reasons:
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
>>
>
> I personally think "recommends" is too strong of a word, but the points are
> valid reasons to do index time synonyms. In Alexander's case, I think
> index-time is a bit more problematic, since he is frequently updating the
> synonym list, meaning he would have to reindex every time, otherwise his
> stats are going to be even more skewed.
>
> As for multi-word expansions, the query parser can be fixed or an alternate
> one used.
>
>> Query-time doesn't work for multi-word expansion. For everyone's
>> convenience, I'll quote the remainder of the problems:
>>
>> Even when you aren't worried about multi-word synonyms, idf differences
>> still make index time synonyms a good idea. Consider the following scenario:
>>
>>  * An index with a "text" field, which at query time uses the
>>    SynonymFilter with the synonym TV, Television and expand="true"
>>  * Many thousands of documents containing the term "text:TV"
>>  * A few hundred documents containing the term "text:Television"
>>
>> A query for text:TV will expand into (text:TV text:Television) and the
>> lower docFreq for text:Television will give the documents that match
>> "Television" a much higher score than docs that match "TV" comparably --
>> which may be somewhat counter intuitive to the client.
>> Index time expansion (or reduction) will result in the same idf for all
>> documents regardless of which term the original text contained.
>>
>> ~ David Smiley
>>
>> On 12/30/08 4:33 PM, "Grant Ingersoll" <gsing...@apache.org> wrote:
>>
>> On Dec 30, 2008, at 11:05 AM, Alexander Ramos Jardim wrote:
>>
>>> Hey Grant,
>>>
>>> Thanks for the info!
>>>
>>> 2008/12/30 Grant Ingersoll <gsing...@apache.org>
>>>
>>>> I'd probably write a new TokenFilter that was aware of the reload
>>>> policy (in a generic way) such that I didn't have to go through a
>>>> whole core reload every time. Are you just using them during query
>>>> time or also during indexing?
>>>>
>>> I am using it at indexing time.
>>>
>> I think that is a bit more problematic. How do you deal with new
>> documents having the new synonyms while old docs don't?
>>
>> Any particular reason you use syns at indexing and not search? Not
>> saying there aren't reasons to do it, just query side usually works
>> better for this very reason.
>>
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>

-- 
Alexander Ramos Jardim
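
For anyone following the idf argument quoted from the wiki, here is a rough Python sketch of the effect. It assumes the classic Lucene idf formula, 1 + ln(numDocs / (docFreq + 1)). The document counts are made-up numbers chosen only to match "many thousands" and "a few hundred", the helper name classic_idf is hypothetical, and the index-time case assumes the two sets of documents do not overlap.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Illustrative numbers only (assumptions, not taken from a real index).
NUM_DOCS = 100_000
DF_TV = 5_000          # "many thousands" of docs contain text:TV
DF_TELEVISION = 300    # "a few hundred" docs contain text:Television

# Query-time expansion: each term keeps its own docFreq, so the rarer
# term gets the larger idf and its matches score higher.
idf_tv = classic_idf(DF_TV, NUM_DOCS)
idf_television = classic_idf(DF_TELEVISION, NUM_DOCS)

# Index-time expansion: every doc that had either term now has both, so
# both terms share one docFreq (assuming the two doc sets are disjoint).
df_expanded = DF_TV + DF_TELEVISION
idf_expanded = classic_idf(df_expanded, NUM_DOCS)

print(f"query-time idf(TV)         = {idf_tv:.3f}")
print(f"query-time idf(Television) = {idf_television:.3f}")
print(f"index-time idf(either)     = {idf_expanded:.3f}")
```

With query-time expansion the two idf values differ, which is the skew the wiki describes; with index-time expansion both terms end up with the same value, but only for documents indexed after the synonym list changed, which is why a frequently updated list forces a reindex.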