On Sat, 16 Aug 2008 15:39:44 -0700
"Chris Harris" <[EMAIL PROTECTED]> wrote:

[...]
> So finally I modified the Lucene ShingleFilter class to add an
> "outputUnigramIfNoNgram" option. Basically, if you set that option,
> and also set outputUnigrams=false, then the filter will tokenize just
> as in Exhibit B, except that if the query is only one word long, it
> will return a corresponding single token, rather than zero tokens. In
> other words,
> 
> [Exhibit C]
> "please" ->
>   "please"
> 
> Things were still zippy. And, so far, I think I have seriously
> improved my phrase search performance without ruining anything.
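To make the fallback described above concrete, here is a minimal sketch in plain Java. It assumes that shingles are space-joined word n-grams of size 2 up to maxShingleSize when unigrams are off. The class and method names (ShingleSketch, shingles) are hypothetical. This is not Lucene's ShingleFilter code and not Chris's actual patch. It only models the rule that a one-word input falls back to a single token.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {

    // Hypothetical helper, NOT the Lucene ShingleFilter API: builds word
    // n-grams ("shingles") joined by a single space, in position order.
    public static List<String> shingles(List<String> tokens, int maxShingleSize,
                                        boolean outputUnigrams,
                                        boolean outputUnigramIfNoNgram) {
        List<String> out = new ArrayList<>();
        // Size 1 is a unigram; start at 2 when unigrams are switched off.
        int minSize = outputUnigrams ? 1 : 2;
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize;
                 size <= maxShingleSize && start + size <= tokens.size();
                 size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        // The option discussed in the thread: with unigrams off, a one-word
        // input yields no n-grams, so emit that single word instead of nothing.
        if (!outputUnigrams && outputUnigramIfNoNgram
                && out.isEmpty() && tokens.size() == 1) {
            out.add(tokens.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        // One word: falls back to the single token.
        System.out.println(shingles(List.of("please"), 4, false, true));
        // Several words: only n-grams of size 2..4 are produced.
        System.out.println(shingles(List.of("please", "divide", "this"), 4, false, true));
    }
}
```

With outputUnigramIfNoNgram set to false, the one-word call in this sketch returns an empty list. That is the "zero tokens" case the option is meant to avoid.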

Hi Chris,
Is this change part of 1.3?

I've tried the following field type:
        <fieldType name="shingle4_mark2" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.ShingleFilterFactory"
                        maxShingleSize="4"
                        outputUnigrams="false" outputUnigramIfNoNgram="true" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>


but analysis.jsp shows that no tokens are generated when the input is a single word.

Thanks!
B

_________________________
{Beto|Norberto|Numard} Meijome

 I sense much NT in you.
 NT leads to Bluescreen.
 Bluescreen leads to downtime.
 Downtime leads to suffering.
 NT is the path to the darkside.
 Powerful Unix is.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.
