Re: NGram with words

Walter Underwood Fri, 14 Dec 2012 09:02:09 -0800

I specified "edge ngrams" because that is the one I've investigated. --wunder


On Dec 14, 2012, at 8:30 AM, Jack Krupansky wrote:

> I can believe it.
> 
> Note: He's using "ngrams", not "edge" ngrams.
> 
> -- Jack Krupansky
> -----Original Message----- From: Walter Underwood
> Sent: Friday, December 14, 2012 11:21 AM
> To: solr-user@lucene.apache.org
> Cc: ark...@smartbit.be
> Subject: Re: NGram with words
> 
> Positions for edge ngrams are wrong. They should be handled like synonyms. 
> This breaks phrase matching with ngrams. Not sure if there is a bug filed for 
> this.
> 
> wunder
> 
> On Dec 14, 2012, at 8:16 AM, Jack Krupansky wrote:
> 
>> Yeah, the positions for ngrams have a good chance of not being what you want.
>> 
>> But do try the Solr Admin Analysis web page for that index text and see what 
>> positions it generates for the sub-words. The two generated words used in 
>> your query may not have adjacent positions.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Arkadi Colson
>> Sent: Friday, December 14, 2012 9:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: NGram with words
>> 
>> Hi
>> 
>> When "abcdefg 123456" is indexed in Solr, I would like it to match:
>> 
>> - abcd
>> - cdef
>> - abcdefg 123456
>> - "abcdefg 123456"
>> - "defg 1234"
>> 
>> The last one is actually not working.
>> What am I doing wrong?
>> My config looks like this.
>> 
>> <field name="smsc_description" type="text" indexed="true"
>> stored="false" multiValued="true" omitNorms="true" omitPositions="false"
>> omitTermFreqAndPositions="false"/>
>>  <field name="smsc_description_ngram" type="text_ngram"
>> indexed="true" stored="false" multiValued="true" omitNorms="true"
>> omitPositions="false" omitTermFreqAndPositions="false"/>
>> 
>> <copyField source="smsc_description" dest="smsc_description_ngram"/>
>> 
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>     <analyzer type="index">
>>       <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>     <analyzer type="query">
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <!--<filter class="solr.SynonymFilterFactory"
>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>> 
>>   <fieldType name="text_ngram" class="solr.TextField"
>> positionIncrementGap="100">
>>     <analyzer type="index">
>>       <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.NGramFilterFactory" minGramSize="2"
>> maxGramSize="8"/>
>>     </analyzer>
>>     <analyzer type="query">
>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>       <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>> 
>> BR,
>> Arkadi
>> 
> 
> --
> Walter Underwood
> wun...@wunderwood.org
> 
> 
> 

--
Walter Underwood
wun...@wunderwood.org
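
The position problem discussed in this thread can be illustrated with a small model. This is only a sketch, not Lucene or Solr code. It assumes that the n-gram filter gives every generated gram its own position (one possible reading of "positions are wrong"), and compares that with stacking all grams of a word at the word's position, the way synonyms are handled. The function names (ngrams, index_positions, phrase_matches) and the gram emission order are hypothetical, chosen for the example. The real positions for an index are best checked on the Solr Admin Analysis page, as suggested above.

```python
def ngrams(word, min_size=2, max_size=8):
    """Return all character n-grams of `word`, shortest grams first."""
    grams = []
    for n in range(min_size, max_size + 1):
        for i in range(len(word) - n + 1):
            grams.append(word[i:i + n])
    return grams


def index_positions(text, stacked):
    """Map each gram to the set of token positions it occupies.

    stacked=False: every gram advances the position by one
                   (assumed model of the problematic behaviour).
    stacked=True:  all grams of a word share the word's position,
                   like synonyms.
    """
    positions = {}
    pos = 0
    for word in text.lower().split():
        if stacked:
            pos += 1  # one position per source word
        for gram in ngrams(word):
            if not stacked:
                pos += 1  # one position per gram
            positions.setdefault(gram, set()).add(pos)
    return positions


def phrase_matches(positions, phrase):
    """True if the phrase terms occur at consecutive positions.

    The query side only splits on whitespace and lowercases,
    mirroring the query analyzer in the posted config (no n-gram filter).
    """
    terms = phrase.lower().split()
    starts = positions.get(terms[0], set())
    return any(
        all(start + k in positions.get(term, set())
            for k, term in enumerate(terms))
        for start in starts
    )


sequential = index_positions("abcdefg 123456", stacked=False)
stacked = index_positions("abcdefg 123456", stacked=True)

print(sequential["defg"], sequential["1234"])    # {15} {31}
print(phrase_matches(sequential, "defg 1234"))   # False
print(phrase_matches(stacked, "defg 1234"))      # True
```

In this model the single-term queries (abcd, cdef) still match either way, because a term query ignores positions. Only the phrase query depends on "defg" and "1234" sitting at adjacent positions.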
