Internals of Analysis and Token Matching

Pritesh Patel Mon, 17 Nov 2014 13:46:28 -0800

Hi Community.

Hoping someone can help explain this ...


Once analysis has run on a field, the resulting tokens are stored in the
index for that field.  Beyond a simple token match and the frequency of the
matching terms, what else affects whether a query matches the document?

All the searches I tried produce the same tokens (verified with the
analysis screen in the admin UI, and by looking at the terms indexed in Solr
through the schema browser for the field).  But some match and some don't
when I actually run the search.  I don't know why some of the searches don't
match even though everything in the analysis tells me they have the same
tokens.  What am I missing?

*Descriptions*

*Indexed in a field*: "4048860461"

*Searches that Match*
"4048860461"
"(404)8860461"

*Searches that don't match*
"404-886-0461"
"404)8860461"
"404)886)0461"

*Field analysis*
Field analysis is pretty simple: I used the "text_en_splitting_tight"
field type and added an "ngram" filter to it.  See below.

<fieldType name="text_en_splitting_tight_ngram" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
    <!-- this filter can remove any duplicate tokens that appear at the same
         position - sometimes possible with WordDelimiterFilter in
         conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
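
The "same tokens" observation can be reproduced outside Solr with a rough Python sketch. It is only an approximation of the chain above, not Solr's implementation: it assumes the number-splitting step amounts to pulling out runs of digits plus one catenated token, and it skips synonyms, stop words, protected words and stemming, which should not change all-digit input. The helper names (ngrams, approx_tokens) are made up for illustration. It compares token text only, so it says nothing about token positions or about how the query parser assembles the final query.

```python
import re

def ngrams(token, min_size=3, max_size=20):
    """Return every substring of token whose length is in [min_size, max_size]."""
    grams = set()
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams

def approx_tokens(text):
    """Approximate the token text the field type would emit for numeric input.

    Assumption: number parts are the runs of digits, and catenateNumbers
    adds one extra token that joins all of those runs together.
    """
    parts = re.findall(r"\d+", text)
    tokens = set(parts)
    if len(parts) > 1:
        tokens.add("".join(parts))
    grams = set()
    for token in tokens:
        grams |= ngrams(token)
    return grams

indexed = approx_tokens("4048860461")
queries = ["4048860461", "(404)8860461", "404-886-0461",
           "404)8860461", "404)886)0461"]

# True means every token text produced for the query also exists for the
# indexed value.
results = {q: approx_tokens(q) <= indexed for q in queries}
for query, covered in results.items():
    print(query, covered)
```

Under these assumptions every query, including the three that do not match in Solr, yields only token text that is also present for the indexed value, which is consistent with what the analysis screen shows.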
