RE: Using terms and N-gram

Bob Sandiford Thu, 03 Feb 2011 10:17:05 -0800

I don't suppose it's something silly like the fact that your indexing chain 
includes 'words="stopwords.txt"', and your query chain does not?
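
To see why a mismatch in the common-word list matters, here is a simplified Python sketch of the common-grams idea. Each token is kept, and an extra token joined with "_" is emitted for each adjacent pair in which at least one word is "common". It is an approximation for illustration, not Solr's implementation. The function name and both word sets are hypothetical stand-ins: the actual contents of stopwords.txt, and whatever list the query side ends up using, are not shown in this thread.

```python
def common_grams(tokens, common_words):
    """Simplified common-grams sketch: keep every token, and after it add a
    '_'-joined bigram whenever it or the next token is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


# Hypothetical word sets: one standing in for the index-side stopwords.txt,
# and a different (here: smaller) one for the query side.
INDEX_COMMON = {"the", "is", "and", "it"}
QUERY_COMMON = {"the"}

tokens = "the cat is black and it rains".split()
index_terms = common_grams(tokens, INDEX_COMMON)
query_terms = common_grams(tokens, QUERY_COMMON)

# Bigrams produced at index time that the query side never generates.
print(sorted(set(index_terms) - set(query_terms)))
# prints ['and_it', 'black_and', 'cat_is', 'is_black', 'it_rains']
```

In this toy example, terms such as cat_is exist only on the index side, so the same text analyzed with the other list yields a different set of tokens. That is the kind of index/query mismatch the question above points at.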


Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 

> -----Original Message-----
> From: openvictor Open [mailto:openvic...@gmail.com]
> Sent: Thursday, February 03, 2011 12:02 AM
> To: solr-user@lucene.apache.org
> Subject: Using terms and N-gram
> 
> Dear all,
> 
> I am trying to implement an autocomplete system for search, but I am stuck
> on some problems that I can't solve.
> 
> Here is my problem:
> I give it text like "the cat is black", and I want to generate all 1-grams
> to 8-grams for all the text that is passed in:
> the, cat, is, black, the cat, cat is, is black, etc.
> 
> In order to do that, I have defined the following fieldType in my schema:
> 
>     <!--Custom fieldtype-->
>     <fieldType name="ngram_field" class="solr.TextField">
>       <analyzer type="index">
>         <tokenizer class="solr.LowerCaseTokenizerFactory" />
>         <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
>                 ignoreCase="true" maxGramSize="8" minGramSize="1"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.LowerCaseTokenizerFactory" />
>         <filter class="solr.CommonGramsFilterFactory" ignoreCase="true"
>                 maxGramSize="8" minGramSize="1"/>
>       </analyzer>
>     </fieldType>
> 
> 
> Then the following field:
> 
>     <field name="p_title_ngram" type="ngram_field" indexed="true" stored="true"/>
> 
> Then I fed Solr some phrases, and I was really surprised to see that Solr
> didn't behave as expected.
> I went to the schema browser to see the result for the very profound query:
> "the cat is black and it rains"
> 
> The results are quite disappointing. First, 1-grams are not found. Some
> 2-grams are found, like "the_cat" and "and_it", but not what I expected.
> Is there something I am missing here? (By the way, I also tried removing
> minGramSize and maxGramSize, and even the words attribute.)
> 
> Thank you,
> Victor Kabdebon
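
For reference, the expansion described in the question (every run of 1 to 8 consecutive words) can be sketched in plain Python. This only illustrates the desired output and is not Solr code. The function name word_ngrams is hypothetical, and whitespace splitting plus lowercasing is a simplification of what a Solr tokenizer does.

```python
def word_ngrams(text, min_n=1, max_n=8):
    """Return every run of min_n..max_n consecutive words, shortest first."""
    # Whitespace split plus lowercasing is a simplification of Solr's tokenizer.
    words = text.lower().split()
    grams = []
    for n in range(min_n, max_n + 1):
        # range() is empty when n exceeds the number of words.
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


print(word_ngrams("the cat is black"))
# prints ['the', 'cat', 'is', 'black', 'the cat', 'cat is', 'is black',
#         'the cat is', 'cat is black', 'the cat is black']
```

A sentence of k words (k <= 8) yields k(k+1)/2 such grams, so a four-word sentence gives 10 and the term count grows quickly with sentence length.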
