I specified "edge ngrams" because that is the one I've investigated.

--wunder

On Dec 14, 2012, at 8:30 AM, Jack Krupansky wrote:

> I can believe it.
>
> Note: He's using "ngrams", not "edge" ngrams.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Walter Underwood
> Sent: Friday, December 14, 2012 11:21 AM
> To: solr-user@lucene.apache.org
> Cc: ark...@smartbit.be
> Subject: Re: NGram with words
>
> Positions for edge ngrams are wrong. They should be handled like synonyms.
> This breaks phrase matching with ngrams. Not sure if there is a bug filed for
> this.
>
> wunder
>
> On Dec 14, 2012, at 8:16 AM, Jack Krupansky wrote:
>
>> Yeah, the positions for ngrams have a good chance of not being what you want.
>>
>> But do try the Solr Admin Analysis web page for that index text and see what
>> positions it generates for the sub-words. The two generated words used in
>> your query may not have adjacent positions.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Arkadi Colson
>> Sent: Friday, December 14, 2012 9:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: NGram with words
>>
>> Hi
>>
>> When "abcdefg 123456" is in Solr I would like to have match with
>>
>> - abcd
>> - cdef
>> - abcdefg 123456
>> - "abcdefg 123456"
>> - "defg 1234"
>>
>> The last one is actually not working.
>> What am I doing wrong?
>> My config looks like this.
>>
>> <field name="smsc_description" type="text" indexed="true"
>>        stored="false" multiValued="true" omitNorms="true" omitPositions="false"
>>        omitTermFreqAndPositions="false"/>
>> <field name="smsc_description_ngram" type="text_ngram"
>>        indexed="true" stored="false" multiValued="true" omitNorms="true"
>>        omitPositions="false" omitTermFreqAndPositions="false"/>
>>
>> <copyField source="smsc_description" dest="smsc_description_ngram"/>
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory"
>>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <fieldType name="text_ngram" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer type="index">
>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.NGramFilterFactory" minGramSize="2"
>>             maxGramSize="8"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_en.txt,stopwords_du.txt"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> BR,
>> Arkadi
>>
>
> --
> Walter Underwood
> wun...@wunderwood.org

--
Walter Underwood
wun...@wunderwood.org
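
For anyone reading this thread later, the position problem discussed above can be illustrated with a small toy model. This is only a sketch and not Lucene's or Solr's actual code: the function names (ngrams, index_incrementing, index_stacked, phrase_match) are made up for illustration, and the exact order and position increments a given Solr version emits may differ, so the Analysis page remains the place to check real output. The sketch assumes, as in the posted config, that grams are generated only at index time (sizes 2 to 8) and that the query terms are matched as whole tokens.

```python
def ngrams(word, min_n=2, max_n=8):
    """All character n-grams of word, mirroring minGramSize/maxGramSize."""
    return [word[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(word) - n + 1)]


def index_incrementing(text):
    """Toy model (assumption): every emitted gram advances the position by one."""
    postings, pos = {}, 0
    for word in text.lower().split():
        for gram in ngrams(word):
            pos += 1
            postings.setdefault(gram, set()).add(pos)
    return postings


def index_stacked(text):
    """Toy model (assumption): all grams of a word share that word's
    position, the way synonyms are stacked on one position."""
    postings = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        for gram in ngrams(word):
            postings.setdefault(gram, set()).add(pos)
    return postings


def phrase_match(postings, phrase):
    """True if the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    starts = postings.get(terms[0], set())
    return any(all(p + k in postings.get(t, set()) for k, t in enumerate(terms))
               for p in starts)


doc = "abcdefg 123456"
print(phrase_match(index_incrementing(doc), "defg 1234"))  # False
print(phrase_match(index_stacked(doc), "defg 1234"))       # True
```

In the first model "defg" and "1234" both exist in the index but sit many positions apart, so the phrase query finds nothing even though single-term queries such as abcd or cdef match. In the second model the grams inherit the position of the word they came from, so the two terms are adjacent and the phrase matches.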