Hi Mark,

Are you familiar with shingles, a.k.a. token n-grams?
http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html

Use the empty string for the tokenSeparator to get "wordstogether"-style tokens in your index.

I think you'll want to apply this filter only at index time, since the users will supply the shingles all by themselves :).

Steve

> -----Original Message-----
> From: Mark Mandel [mailto:mark.man...@gmail.com]
> Sent: Thursday, June 09, 2011 8:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Tokenising based on known words?
>
> Synonyms really wouldn't work for every possible combination of words in
> our index.
>
> Thanks for the idea though.
>
> Mark
>
> On Thu, Jun 9, 2011 at 3:42 PM, Gora Mohanty <g...@mimirtech.com> wrote:
>
> > On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel <mark.man...@gmail.com> wrote:
> > > Not sure if this is possible, but figured I would ask the question.
> > >
> > > Basically, we have some users who do some pretty ridiculous things ;o)
> > >
> > > Rather than writing "red jacket", they write "redjacket", which
> > > obviously returns no results.
> > [...]
> >
> > Have you tried using synonyms,
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> > It seems like they should fit your use case.
> >
> > Regards,
> > Gora
>
> --
> E: mark.man...@gmail.com
> T: http://www.twitter.com/neurotic
> W: www.compoundtheory.com
>
> cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
> http://www.cfobjective.com.au
>
> Hands-on ColdFusion ORM Training
> www.ColdFusionOrmTraining.com
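
P.S. For reference, here is a rough sketch of what the index-time-only shingle setup suggested in the reply above might look like in schema.xml. Only ShingleFilterFactory and its tokenSeparator setting come from the reply; the field type name "text_shingle", the whitespace tokenizer, the lowercase filter, and maxShingleSize="2" are assumptions chosen for illustration, so adapt them to your existing analysis chain.

```xml
<!-- Hypothetical field type: the name and the tokenizer/lowercase choices are assumptions -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: emit single words plus adjacent word pairs joined with no separator -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="2"
            outputUnigrams="true"
            tokenSeparator=""/>
  </analyzer>
  <!-- Query time: no shingle filter, since users type the joined form themselves -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, "red jacket" should be indexed as the tokens red, redjacket, and jacket, so a query for "redjacket" has something to match while "red jacket" keeps working through the unigrams. Existing documents would need to be reindexed for the new tokens to appear, and the extra tokens will increase index size somewhat.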