Are you trying to match phone numbers despite the spaces/dashes/brackets? By prefix? Suffix?
If so, you may look at something more like:

<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])" replacement="" replace="all"/>

And remember, if you are using ngrams, you probably want them in the
index-chain of the analyzer, but not in the query-chain. Otherwise, you
will be matching on anything that has 3 characters overlapping.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 17 November 2014 16:43, Pritesh Patel <priteshpate...@gmail.com> wrote:
> Hi Community.
>
> Hoping someone can help explain this ...
>
> Once all the analysis is done on a field all the tokens to identify that
> field are stored. What else is affecting a match to the document beyond a
> simple token match and frequency of terms that match?
>
> All the searches I did produce the same tokens (verified by using the
> analysis screen in the admin, and looking at the terms indexed in solr
> through the schema browser for field). But some match and some don't when
> I actually do the search. I don't know why some of the searches don't
> match even though everything in the analysis tells me they have the same
> tokens. What am I missing?
>
> *Descriptions*
>
> *Indexed in a field*: "4048860461"
>
> *Searches that Match*
> "4048860461"
> "(404)8860461"
>
> *Searches that don't match*
> "404-886-0461"
> "404)8860461"
> "404)886)0461"
>
> *Field analysis*
> Field analysis is pretty simple, just used the "text_en_splitting_tight"
> field but added an "ngram" filter to it. See below.
>
> <fieldType name="text_en_splitting_tight_ngram" class="solr.TextField"
>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="false"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_en.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
>     <!-- this filter can remove any duplicate tokens that appear at the same
>          position - sometimes possible with WordDelimiterFilter in
>          conjunction with stemming. -->
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
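
P.S. To make the index-chain vs. query-chain point at the top of this message concrete, here is a rough sketch of what a split analyzer could look like. The field type name "phone_digits_ngram" is made up for illustration, the gram sizes are simply copied from your config, and I have not tested this against your setup, so treat it as a starting point rather than a drop-in fix.

```xml
<!-- Hypothetical field type name; not part of any stock Solr schema. -->
<fieldType name="phone_digits_ngram" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: keep the whole value as one token, strip every
       non-digit character, then emit ngrams so partial matches are possible. -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^0-9])" replacement="" replace="all"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
  </analyzer>
  <!-- Query side: same digit-only normalization, but no ngram filter,
       so the query stays a single term. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^0-9])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

With something like this, "404-886-0461", "404)8860461" and "(404)8860461" should all be reduced to the single query term "4048860461", which is then looked up against the grams produced at index time. Because the query itself is not broken into grams, an unrelated number that merely shares three digits should no longer match.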