On Sat, 16 Aug 2008 15:39:44 -0700 "Chris Harris" <[EMAIL PROTECTED]> wrote:
[...]
> So finally I modified the Lucene ShingleFilter class to add an
> "outputUnigramIfNoNgram option". Basically, if you set that option,
> and also set outputUnigrams=false, then the filter will tokenize just
> as in Exhibit B, except that if the query is only one word long, it
> will return a corresponding single token, rather than zero tokens. In
> other words,
>
> [Exhibit C]
> "please" ->
>   "please"
>
> Things were still zippy. And, so far, I think I have seriously
> improved my phrase search performance without ruining anything.

hi Chris,
is this change part of 1.3 ?

I've tried

<fieldType name="shingle4_mark2" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="4"
            outputUnigrams="false"
            outputUnigramIfNoNgram="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

but analysis.jsp shows no tokens generated when there is only 1 word.

thanks!
B

_________________________
{Beto|Norberto|Numard} Meijome

I sense much NT in you.
NT leads to Bluescreen.
Bluescreen leads to downtime.
Downtime leads to suffering.
NT is the path to the darkside.
Powerful Unix is.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
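
For reference, the behaviour described in the quoted text can be sketched in a few lines of Python. This is only an illustration of the expected token output, not the Lucene ShingleFilter code or Chris's patch. The function name `shingles` and its parameters are made up for this sketch, and it assumes shingles are word n-grams of 2 up to maxShingleSize words emitted in position order.

```python
def shingles(tokens, max_size=4, output_unigrams=False, unigram_if_no_ngram=True):
    """Hypothetical model of the described behaviour, not Lucene's implementation.

    Emits word n-grams of 2..max_size tokens starting at each position.
    If no n-gram could be formed and unigrams are otherwise disabled,
    falls back to the single tokens when unigram_if_no_ngram is set.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(2, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    # Fallback: a one-word input yields no shingles, so return the word itself.
    if not out and unigram_if_no_ngram and not output_unigrams:
        out = list(tokens)
    return out


print(shingles(["please"]))                             # the Exhibit C case
print(shingles(["please"], unigram_if_no_ngram=False))  # what analysis.jsp is showing me
```

With the fallback enabled, a one-word input comes back as that one word. With it disabled, the result is empty, which matches the empty output reported above for a single word.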