On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley <[EMAIL PROTECTED]> wrote:
> also, check the LukeRequestHandler
>
> if there is a document you think *should* match, you can see what
> tokens it has actually indexed...
>

Hi Ryan,

I can't see the generated tokens using LukeRequestHandler. I can get to the document I want:

http://localhost:8983/solr/_test_/admin/luke/?id=Jay%20Rock

and for the field I am interested in, I only get:

[...]
<lst name="artist_ngram">
  <str name="type">ngram</str>
  <str name="schema">ITS----------</str>
  <str name="flags">ITS----------</str>
  <str name="value">Jay Rock</str>
  <str name="internal">Jay Rock</str>
  <float name="boost">1.0</float>
  <int name="docFreq">0</int>
</lst>
[...]

(All the other fields look pretty much identical; none of them show the generated tokens.)

Using the Luke tool itself (lukeall.jar, source #0.8.1, linked against the Lucene 2.4 libs bundled with the nightly build), I see the following tokens for this document and field:

ja, ay, y , r, ro, oc, ck, jay, ay , y r, ro, roc, ock, jay , ay r, y ro, roc, rock, jay r, ay ro, y roc, rock, jay ro, ay roc, y rock, jay roc, ay rock, jay rock

This is precisely what I expect, given that my 'ngram' type is defined as:

<!-- n-gram tokenization -->
<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory" minGramSize="2" maxGramSize="15" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.solr.analysis.NGramTokenizerFactory" minGramSize="2" maxGramSize="15" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

My question now is: was I supposed to get any more information from LukeRequestHandler?
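To make the index-side tokenization concrete, here is a small Python sketch that reproduces the Luke token list for 'Jay Rock' from the minGramSize/maxGramSize settings. It is a hypothetical, simplified re-implementation, not Lucene's NGramTokenizer source, and `ngram_tokens` is an illustrative name. It assumes grams are emitted size-major (all 2-grams, then all 3-grams, and so on, which is the order Luke shows) and that each gram advances the token position by one. Grams that start at the space, such as ' r' and ' ro', carry a leading space in the sketch; that space is hard to see in the comma-separated list.

```python
# Hypothetical, simplified model of the 'ngram' fieldType analyzer
# (NGramTokenizerFactory minGramSize=2 maxGramSize=15 + LowerCaseFilterFactory).
# Assumptions: size-major emission order, position increment of 1 per gram.
# RemoveDuplicatesTokenFilterFactory only drops identical tokens at the same
# position; with one token per position here, it has nothing to remove.

def ngram_tokens(text, min_gram=2, max_gram=15):
    """Return (position, token) pairs for all character n-grams of text."""
    tokens = []
    position = 0
    # Grams longer than the input cannot exist, so cap the size at len(text).
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            # Lowercasing mirrors LowerCaseFilterFactory after the tokenizer.
            tokens.append((position, text[start:start + size].lower()))
            position += 1
    return tokens


if __name__ == "__main__":
    for pos, tok in ngram_tokens("Jay Rock"):
        print(pos, repr(tok))
    # The query analyzer is configured identically, so a query string is
    # split the same way.
    print([tok for _, tok in ngram_tokens("roc")])  # ['ro', 'oc', 'roc']
```

Under the same assumptions, 'ro' and 'oc' end up at positions 4 and 5 in the indexed field, while 'roc' is at position 11, so the grams of 'roc' are not adjacent to each other in the index.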
Furthermore, if I run the following query on this same core with exactly this data:

http://localhost:8983/solr/_test_/select?q=artist_ngram:ro

I get this document back (along with many others). But if I search for 'roc' instead of 'ro':

http://localhost:8983/solr/_test_/select?q=artist_ngram:roc

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">48</int>
    <lst name="params">
      <str name="q">artist_ngram:roc</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">artist_ngram:roc</str>
    <str name="querystring">artist_ngram:roc</str>
    <str name="parsedquery">PhraseQuery(artist_ngram:"ro oc roc")</str>
    <str name="parsedquery_toString">artist_ngram:"ro oc roc"</str>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
    [...]

Is searching on nGram-tokenized fields limited to minGramSize?

Thanks for any pointers you can provide,
B
_________________________
{Beto|Norberto|Numard} Meijome

"I didn't attend the funeral, but I sent a nice letter saying I approved of it."
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.