Yeah, this is a pretty ugly problem. You have two problems, neither of which is all that amenable to simple solutions.
1> Context at index time. St, in your example, is either Saint or Street. Solr has nothing built in to it to distinguish this. So you need to do some processing "somewhere else" to get the proper substitutions.

2> Query time. Same issue, but you have virtually no context to figure this out...

But, it is NOT the case that "Synonyms only work with Whitespace tokenizer". Synonyms will work with any tokenizer, the problem is that the tokens produced have to match when they get to the SynonymFilter. Even KeywordTokenizer will "work with synonyms", with the caveat that you'd have to have single-word input....

The admin/analysis page will help you see how all this fits together. For instance, if you have the stemmer _before_ the synonym filter, and your original input contains, say, "story", by the time it gets to the synonym filter, the word being matched will be something like "stori".

But even getting synonyms working with other tokenizers won't help you with the context problem....

Best
Erick

On Thu, Apr 19, 2012 at 4:25 AM, Daniel Persson <mailto.wo...@gmail.com> wrote:
> Hi solr users.
>
> I'm trying to create an index of geographic data to search with solr.
>
> And I get a problem with searches with abbreviations.
>
> At the moment I use an index filter with
>
> <analyzer type="index">
>   <tokenizer class="solr.KeywordTokenizerFactory"/>
>   <filter class="solr.ICUFoldingFilterFactory" />
> </analyzer>
>
> This is because my searches at the moment are need to be full Keywords to
> enable correct hits and ranking.
>
> I have other tokenizers for other types of searches.
>
> The problem I got now is with a streets with names like
>
> East Saint James Street.
>
> This could be abbreviated as
>
> E St James St
>
> Anyone got a suggestion what to try?
>
> My guess was to use synonyms but that seems to work only with
> WhitespaceTokenizer. I've thought about PatternReplaceCharFilter but that
> will be a lot of rules to cover all abbreviations.
>
> Best regards
>
> Daniel
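
P.S. To make the filter-ordering point from the reply concrete, here is a rough, untested sketch of an analysis chain in which the synonym filter sees tokens before the stemmer changes them. The field type name "text_syn", the file name synonyms.txt, and the sample entry "story, tale" are placeholders I made up for illustration, so adjust them to your schema. Note that this only addresses token matching; it does nothing for the Saint/Street context problem.

```xml
<!-- Hypothetical field type; "text_syn" and synonyms.txt are placeholder names. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Synonyms run on unstemmed tokens, so an entry such as "story, tale" can still match "story". -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Stemming comes last; placed before the synonym filter it would turn "story" into "stori". -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Running a sample value through the admin/analysis page with a chain like this should show the token at each stage, which is the quickest way to check whether the synonym entries line up with what the filter actually receives.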