Re: Using Shingles to Increase Phrase Search Performance

Norberto Meijome Wed, 24 Sep 2008 07:45:31 -0700

On Sat, 16 Aug 2008 15:39:44 -0700
"Chris Harris" <[EMAIL PROTECTED]> wrote:


[...]
> So finally I modified the Lucene ShingleFilter class to add an
> "outputUnigramIfNoNgram" option. Basically, if you set that option,
> and also set outputUnigrams=false, then the filter will tokenize just
> as in Exhibit B, except that if the query is only one word long, it
> will return a corresponding single token, rather than zero tokens. In
> other words,
> 
> [Exhibit C]
> "please" ->
>   "please"
> 
> Things were still zippy. And, so far, I think I have seriously
> improved my phrase search performance without ruining anything.
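To make the behaviour described above concrete, here is a minimal Java sketch of the shingling logic. It is not Lucene's ShingleFilter source. The class name `ShingleSketch`, the method `shingles`, and the single-space separator between words are hypothetical choices made for illustration, and the `outputUnigramIfNoNgram` flag only mirrors Chris's description of his modification. With unigrams disabled, only shingles of two or more words are produced, so a one-word query yields zero tokens unless the fallback is switched on.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's ShingleFilter) of shingling with an
 * optional "emit the single word if no shingle could be built" fallback.
 */
public class ShingleSketch {

    public static List<String> shingles(List<String> tokens, int maxShingleSize,
                                        boolean outputUnigrams,
                                        boolean outputUnigramIfNoNgram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            // Shingles of 2..maxShingleSize words beginning at this position.
            for (int size = 2; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        // Unigrams disabled plus a one-word input means no shingles at all;
        // the fallback emits that single word instead of nothing.
        if (out.isEmpty() && outputUnigramIfNoNgram && tokens.size() == 1) {
            out.add(tokens.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("please"), 4, false, false));
        System.out.println(shingles(List.of("please"), 4, false, true));
        System.out.println(shingles(List.of("please", "divide", "this"), 4, false, true));
    }
}
```

Running `main` prints `[]`, then `[please]`, then `[please divide, please divide this, divide this]`: the fallback only changes the one-word case, and multi-word input is shingled exactly as before. (`List.of` needs Java 9 or later.)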

Hi Chris,
Is this change part of 1.3?

I've tried 
        <fieldType name="shingle4_mark2" class="solr.TextField">
          <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.ShingleFilterFactory"
                    maxShingleSize="4"
                    outputUnigrams="false"
                    outputUnigramIfNoNgram="true" />
            <filter class="solr.LowerCaseFilterFactory" />
          </analyzer>
        </fieldType>


but analysis.jsp shows no tokens generated when the input is only one word.

thanks!
B

_________________________
{Beto|Norberto|Numard} Meijome