Ah I see. Thanks for the explanation. Could you set the defaultOperator to "AND"? That way both "Bill" and "Cl" must be a match and that would exclude "Clyde Phillips".
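
To illustrate the difference, here is a rough Python sketch of the matching logic. It is a toy model, not Solr: it only imitates the whitespace split, lowercasing and front edge n-grams from the posted fieldtype, and it leaves out the stop filter and the pattern replace filter. The function names (edge_ngrams, index_terms, query_terms, matches, search) are made up for this example.

```python
# Toy model of the "edgytext" analysis chain from the thread; not Solr itself.
# Assumption: stop words and the pattern-replace filter are ignored here.

def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the front prefixes of a token, e.g. "c", "cl", "cly", ..."""
    longest = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, longest + 1)}

def index_terms(text):
    """Index-time analysis: whitespace split, lowercase, edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms |= edge_ngrams(token)
    return terms

def query_terms(text):
    """Query-time analysis: whitespace split and lowercase only."""
    return text.lower().split()

def matches(doc, query, operator="OR"):
    """OR needs any query token to hit; AND needs every token to hit."""
    indexed = index_terms(doc)
    hits = [term in indexed for term in query_terms(query)]
    return all(hits) if operator == "AND" else any(hits)

NAMES = ["Clyde Phillips", "Clay Rogers", "Roger Cloud", "Bill Clinton"]

def search(query, operator):
    """Return the names that match, in their original order."""
    return [name for name in NAMES if matches(name, query, operator)]

if __name__ == "__main__":
    print(search("Bill Cl", "OR"))   # all four names match on "cl"
    print(search("Bill Cl", "AND"))  # only "Bill Clinton" matches both tokens
```

With OR, every name matches because each one contains a token starting with "cl". With AND, only "Bill Clinton" survives, since it is the only name that also has a token starting with "bill". As far as I know, the operator can be set per request with q.op=AND, or as the default in schema.xml with <solrQueryParser defaultOperator="AND"/>.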

--- On Thu, 11/11/10, Robert Gründler <rob...@dubture.com> wrote:

> From: Robert Gründler <rob...@dubture.com>
> Subject: Re: EdgeNGram relevancy
> To: solr-user@lucene.apache.org
> Date: Thursday, November 11, 2010, 3:51 PM
>
> according to the fieldtype i posted previously, i think it's because of:
>
> 1. WhitespaceTokenizer splits the String "Clyde Phillips" into 2 tokens: "Clyde" and "Phillips"
> 2. EdgeNGramFilter gets the 2 tokens, and creates an EdgeNGram for each token: "C" "Cl" "Cly" ... AND "P" "Ph" "Phi" ...
>
> The Query String "Bill Cl" gets split up into 2 tokens "Bill" and "Cl" by the WhitespaceTokenizer.
>
> This creates a match between the 2nd token "Cl" of the query and one of the "sub"tokens the EdgeNGramFilter created: "Cl".
>
> -robert
>
> On Nov 11, 2010, at 21:34 , Andy wrote:
>
> > Could anyone help me understand why "Clyde Phillips" appears in the results for "Bill Cl"?
> >
> > "Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so why is it even in the results?
> >
> > Thanks.
> >
> > --- On Thu, 11/11/10, Ahmet Arslan <iori...@yahoo.com> wrote:
> >
> >> You can add an additional field, using KeywordTokenizerFactory instead of WhitespaceTokenizerFactory, and query both fields with an OR operator.
> >>
> >> edgytext:(Bill Cl) OR edgytext2:"Bill Cl"
> >>
> >> You can even apply a boost so that begins-with matches come first.
> >>
> >> --- On Thu, 11/11/10, Robert Gründler <rob...@dubture.com> wrote:
> >>
> >>> From: Robert Gründler <rob...@dubture.com>
> >>> Subject: EdgeNGram relevancy
> >>> To: solr-user@lucene.apache.org
> >>> Date: Thursday, November 11, 2010, 5:51 PM
> >>>
> >>> Hi,
> >>>
> >>> consider the following fieldtype (used for autocompletion):
> >>>
> >>> <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
> >>>   <analyzer type="index">
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> >>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
> >>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
> >>>   </analyzer>
> >>>   <analyzer type="query">
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> >>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
> >>>   </analyzer>
> >>> </fieldType>
> >>>
> >>> This works fine as long as the query string is a single word. For multiple words, the ranking is weird though.
> >>>
> >>> Example:
> >>>
> >>> Query String: "Bill Cl"
> >>>
> >>> Result (in that order):
> >>>
> >>> - Clyde Phillips
> >>> - Clay Rogers
> >>> - Roger Cloud
> >>> - Bill Clinton
> >>>
> >>> "Bill Clinton" should have the highest rank in that case.
> >>>
> >>> Has anyone an idea how to configure this fieldtype to make matches in both tokens rank higher than those that match in either token?
> >>>
> >>> thanks!
> >>>
> >>> -robert