No, I don't know of a tokenizer that will do what you want. Really, I suspect you need to back up a bit and rethink the problem. It looks to me like you've taken a path that's going to cause you endless grief when, as Jack says, phrase searches are built into the tokenization process.
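To make the point concrete, here is a toy sketch of why whitespace tokenization plus synonyms plus phrase matching covers the "fort st john" case. It is plain Python, not Solr or Lucene code, and it does not reflect how either is implemented. The names `SYNONYMS`, `analyze`, and `phrase_match` are made up for illustration, and the synonym map is an assumption ("st" could just as well mean "street" in real data).

```python
# Toy illustration only: NOT how Lucene/Solr implements analysis or phrase queries.
# Hypothetical synonym map; each variant is mapped to one canonical form.
SYNONYMS = {"ft": "fort", "st": "saint"}


def analyze(text):
    """Lowercase, split on whitespace, and replace each token with its canonical synonym."""
    return [SYNONYMS.get(token, token) for token in text.lower().split()]


def phrase_match(query, document):
    """Return True if the analyzed query tokens appear as a contiguous run in the analyzed document."""
    q = analyze(query)
    d = analyze(document)
    if not q:
        return False
    # Slide a window of len(q) tokens across the document tokens.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


docs = ["Ft Saint John", "Marina Former Fort Ord", "New Westminster"]
hits = [doc for doc in docs if phrase_match("fort st john", doc)]
print(hits)
```

Because the query is matched as a whole phrase, searching for "ft st john" only hits documents containing all three tokens in order. The single-token flood ("new", "st") only happens when the user actually searches for a single token.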
Best,
Erick

On Wed, Apr 2, 2014 at 12:58 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Query by phrase is a core feature of tokenized text in Lucene and Solr, so
> there is no need to use a pattern token filter for that purpose. And yes,
> doing so pretty much breaks most token filters that would assume that the
> text is tokenized.
>
> -- Jack Krupansky
>
> -----Original Message----- From: solr-user
> Sent: Wednesday, April 2, 2014 12:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: how do I get search for "fort st john" to match "ft saint john"
>
> Hi Eric.
>
> No, that doesn't fix the problem either (I have tested this previously and
> did so again just now).
>
> Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
> since I want the user to search by phrase), the phrase "marina former fort
> ord" (for example) does not get turned into four tokens ("marina", "former",
> "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms
> for them (by design).
>
> The original question remains: is there a tokenizer/plugin that will allow
> me to synonym words in an unbroken phrase?
>
> Note: the reason I don't want to tokenize the data by whitespace is that it
> would cause way too many results to get returned if I, for example, search on
> "new" or "st" ... However, I still want to be able to include "fort saint
> john" in the results if the user searches for "ft st john" or "fort st john"
> or ...
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
> Sent from the Solr - User mailing list archive at Nabble.com.