Grant, the Solr wiki recommends doing expansion at index time and gives reasons:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

Query-time expansion doesn't work for multi-word synonyms.  For everyone's
convenience, I'll quote the rest of the wiki's explanation:


Even when you aren't worried about multi-word synonyms, idf differences still 
make index time synonyms a good idea. Consider the following scenario:

    *  An index with a "text" field, which at query time uses the SynonymFilter
       with the synonym TV, Television and expand="true"
    *  Many thousands of documents containing the term "text:TV"
    *  A few hundred documents containing the term "text:Television"

A query for text:TV will expand into (text:TV text:Television) and the lower 
docFreq for text:Television will give the documents that match "Television" a 
much higher score than comparable docs that match "TV" -- which may be somewhat 
counterintuitive to the client. Index time expansion (or reduction) will 
result in the same idf for all documents regardless of which term the original 
text contained.
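
To put rough numbers on the wiki's scenario, here is a minimal sketch in Python.
It assumes Lucene's classic idf formula, 1 + ln(numDocs / (docFreq + 1)); the
document counts are made up for illustration, and the names (classic_idf,
NUM_DOCS, and so on) are hypothetical. It also assumes no document originally
contained both terms, so after index-time expansion the shared docFreq is
simply the sum.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index statistics, chosen only to mirror the scenario above.
NUM_DOCS = 100_000
DF_TV = 20_000          # "many thousands" of docs contain text:TV
DF_TELEVISION = 300     # "a few hundred" docs contain text:Television

# Query-time expansion: each term keeps its own docFreq, so the rarer
# term gets a larger idf and its matches are boosted.
idf_tv = classic_idf(DF_TV, NUM_DOCS)
idf_television = classic_idf(DF_TELEVISION, NUM_DOCS)

# Index-time expansion: every doc that had either term now has both,
# so both terms share one docFreq (assumes the two sets were disjoint).
DF_EXPANDED = DF_TV + DF_TELEVISION
idf_tv_expanded = classic_idf(DF_EXPANDED, NUM_DOCS)
idf_television_expanded = classic_idf(DF_EXPANDED, NUM_DOCS)

print("query-time idf:", idf_tv, idf_television)
print("index-time idf:", idf_tv_expanded, idf_television_expanded)
```

With query-time expansion the two idf values differ, which is the skew the wiki
describes; with index-time expansion they are identical by construction.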

~ David Smiley

On 12/30/08 4:33 PM, "Grant Ingersoll" <gsing...@apache.org> wrote:



On Dec 30, 2008, at 11:05 AM, Alexander Ramos Jardim wrote:

> Hey Grant,
>
> Thanks for the info!
>
> 2008/12/30 Grant Ingersoll <gsing...@apache.org>
>
>> I'd probably write a new TokenFilter that was aware of the reload policy
>> (in a generic way) such that I didn't have to go through a whole core
>> reload every time.  Are you just using them during query time or also
>> during indexing?
>>
>
> I am using it at indexing time.

I think that is a bit more problematic.  How do you deal with new
documents having the new synonyms while old docs don't?

Any particular reason you use syns at indexing and not search?  Not
saying there aren't reasons to do it, just query side usually works
better for this very reason.
