Hi Mark,

Are you familiar with shingles aka token n-grams?

http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html

Set the filter's tokenSeparator to the empty string to get "wordstogether"-style
tokens in your index, so that "red jacket" is also indexed as "redjacket".

I think you'll want to apply this filter only at index-time, since the users 
will supply the shingles all by themselves :).
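
To make the idea concrete, here is a rough sketch in plain Python of what shingling with an empty separator produces. It is only an illustration, not Lucene's implementation: the function name `shingles` and its parameters are made up for this example, loosely modeled on the filter's options.

```python
def shingles(tokens, min_size=2, max_size=2, separator="", output_unigrams=True):
    """Return word-level n-grams ("shingles") for a list of tokens.

    Hypothetical helper for illustration only; it does not reproduce
    Lucene's ShingleFilter position or offset handling.
    """
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)  # keep the original single-word token
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                # join n adjacent tokens; an empty separator glues them together
                out.append(separator.join(tokens[i:i + n]))
    return out


print(shingles(["red", "jacket"]))  # ['red', 'redjacket', 'jacket']
```

With tokens like these in the index, a query for "redjacket" matches a document containing "red jacket" without any query-time shingling.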

Steve

> -----Original Message-----
> From: Mark Mandel [mailto:mark.man...@gmail.com]
> Sent: Thursday, June 09, 2011 8:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Tokenising based on known words?
> 
> Synonyms really wouldn't work for every possible combination of words in
> our index.
> 
> Thanks for the idea though.
> 
> Mark
> 
> On Thu, Jun 9, 2011 at 3:42 PM, Gora Mohanty <g...@mimirtech.com> wrote:
> 
> > On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel <mark.man...@gmail.com> wrote:
> > > Not sure if this is possible, but figured I would ask the question.
> > >
> > > Basically, we have some users who do some pretty ridiculous things ;o)
> > >
> > > Rather than writing "red jacket", they write "redjacket", which
> > > obviously returns no results.
> > [...]
> >
> > Have you tried using synonyms,
> >
> >
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> > It seems like they should fit your use case.
> >
> > Regards,
> > Gora
> >
> 
> 
> 
> --
> E: mark.man...@gmail.com
> T: http://www.twitter.com/neurotic
> W: www.compoundtheory.com
> 
> cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
> http://www.cfobjective.com.au
> 
> Hands-on ColdFusion ORM Training
> www.ColdFusionOrmTraining.com
