On Thu, Jul 22, 2010 at 4:01 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> I think the Synonym filter should actually do exactly what you want, no?
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> Hmm, maybe not exactly what you want as you describe it. It comes close,
> maybe good enough. Do you REALLY need to support "I Business M" or "I B
> Machines" as source/query? Your spec suggests yes, synonym filter won't
> easily do that. But if you just want "International Business Machines" ==
> "IBM", keeping positions intact for subsequent terms, I think synonym filter
> will do it.
> If not, I suppose you could look at its source to write your own. Or maybe
> there's some way to combine the PositionFilter with something else to do it,
> but I can't figure one out.

The synonym approach won't work for me, because it requires the synonyms to be provided in a file. The variants may be more dynamic and not known in advance. The process that creates the documents to index does have that logic, and it could easily put the variants into the document in a format that a tokenizer could pull apart later.

--Paul
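
The idea of having the indexing process embed the variants in a format a tokenizer can pull apart might look roughly like the sketch below. Everything in it is an assumption made for illustration: the inline format (variants of one term joined with `|`, terms separated by whitespace), the class name `VariantTokenizer`, and the `Token` holder are hypothetical, and the code is plain Java rather than an actual Lucene or Solr class. It only shows the position bookkeeping: the first variant of a term advances the position by one, and every further variant is stacked on the same position with an increment of 0, so later terms keep their positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java, no Lucene/Solr classes involved.
final class VariantTokenizer {

    /** A term plus its position increment (0 = same position as the previous term). */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Splits the input on whitespace, then splits each chunk on '|'.
     * The first variant advances the position by one; the remaining
     * variants are stacked on the same position (increment 0).
     * The '|' delimiter is an assumed format, not something Solr defines.
     */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            boolean first = true;
            for (String variant : chunk.split("\\|")) {
                if (variant.isEmpty()) {
                    continue; // skip empty variants such as "a||b"
                }
                tokens.add(new Token(variant, first ? 1 : 0));
                first = false;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("International|I Business|B Machines|M announced"));
    }
}
```

With the input shown in `main`, "International" and "I" share one position, "Business" and "B" the next, and so on, which is what would let a phrase such as "I Business M" match. In an actual Lucene analysis chain the same effect would presumably come from a custom TokenFilter that sets the increment through `PositionIncrementAttribute`; that wiring is not shown here.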