I can believe it.
Note: He's using "ngrams", not "edge" ngrams.
-- Jack Krupansky
-----Original Message-----
From: Walter Underwood
Sent: Friday, December 14, 2012 11:21 AM
To: solr-user@lucene.apache.org
Cc: ark...@smartbit.be
Subject: Re: NGram with words
The positions assigned to edge ngrams are wrong. They should be handled like
synonyms, with every gram at the same position as the word it came from.
As it is, this breaks phrase matching with ngrams. I'm not sure whether a bug
has been filed for this.
wunder
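A toy Python model of the synonym-style positions described above may make the point concrete. It assumes the ngram field's index chain reduces to whitespace splitting, lowercasing and 2-8 character grams (stopwords and offsets are ignored), and that every gram is stacked at the position of the word it came from. `stacked_gram_positions` and `phrase_matches` are hypothetical helpers for illustration, not Solr or Lucene APIs.

```python
# Toy model, NOT Lucene source: every gram is stacked at the position of
# the word it came from, the way synonyms share a single position.
def stacked_gram_positions(text, min_gram=2, max_gram=8):
    positions = {}  # gram -> set of word positions where it occurs
    for pos, word in enumerate(text.lower().split()):  # whitespace + lowercase
        for size in range(min_gram, max_gram + 1):
            for start in range(len(word) - size + 1):
                positions.setdefault(word[start:start + size], set()).add(pos)
    return positions


def phrase_matches(positions, query):
    # Exact phrase: query term i must occur at position start + i.
    terms = query.lower().split()
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(terms))
        for start in positions.get(terms[0], ())
    )


index = stacked_gram_positions("abcdefg 123456")
print(phrase_matches(index, "defg 1234"))  # True
print(phrase_matches(index, "1234 defg"))  # False
```

Under this assumption the query terms "defg" and "1234" sit at adjacent positions 0 and 1, so the phrase from the original question would match, while the reversed phrase still would not.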
On Dec 14, 2012, at 8:16 AM, Jack Krupansky wrote:
Yeah, the positions for ngrams have a good chance of not being what you
want.
But do try the Solr Admin Analysis web page with that index text and see
what positions it generates for the sub-words. The two grams used in your
phrase query may not have adjacent positions.
-- Jack Krupansky
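To see how the phrase can fail, here is a toy Python model of the opposite behavior. It assumes (this is not confirmed in the thread) that the filter gives every gram its own position, emitting all grams of one size before moving to the next size, and it ignores stopwords and offsets. `sequential_gram_positions` is a hypothetical helper, not a Lucene API; the Analysis page is the place to check the real positions.

```python
# Toy model, NOT Lucene source: every gram gets the next position,
# shortest grams first, then left to right within each gram size.
def sequential_gram_positions(text, min_gram=2, max_gram=8):
    positions = {}  # gram -> list of positions where it was emitted
    pos = 0
    for word in text.lower().split():  # whitespace tokenizer + lowercase
        for size in range(min_gram, max_gram + 1):
            for start in range(len(word) - size + 1):
                positions.setdefault(word[start:start + size], []).append(pos)
                pos += 1
    return positions


positions = sequential_gram_positions("abcdefg 123456")
print(positions["defg"], positions["1234"])  # [14] [30]
```

In this model "defg" and "1234" end up 16 positions apart, so a phrase query for "defg 1234" finds no adjacent pair.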
-----Original Message-----
From: Arkadi Colson
Sent: Friday, December 14, 2012 9:10 AM
To: solr-user@lucene.apache.org
Subject: NGram with words
Hi,
When "abcdefg 123456" is indexed in Solr, I would like it to match the following queries:
- abcd
- cdef
- abcdefg 123456
- "abcdefg 123456"
- "defg 1234"
The last one is actually not working.
What am I doing wrong?
My config looks like this.
<field name="smsc_description" type="text" indexed="true"
stored="false" multiValued="true" omitNorms="true" omitPositions="false"
omitTermFreqAndPositions="false"/>
<field name="smsc_description_ngram" type="text_ngram"
indexed="true" stored="false" multiValued="true" omitNorms="true"
omitPositions="false" omitTermFreqAndPositions="false"/>
<copyField source="smsc_description" dest="smsc_description_ngram"/>
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_en.txt,stopwords_du.txt"
enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!--<filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_en.txt,stopwords_du.txt"
enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_ngram" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_en.txt,stopwords_du.txt"
enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="2"
maxGramSize="8"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords_en.txt,stopwords_du.txt"
enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
BR,
Arkadi
--
Walter Underwood
wun...@wunderwood.org