At first glance, you're replacing the apostrophe with a space, so INT'L becomes INT L, two separate tokens. Why not replace with ""? I.e., remove the apostrophe?
I also suspect you actually want WhitespaceTokenizerFactory. KeywordTokenizerFactory will cause "my dog has fleas" to be indexed as exactly one token consisting of 4 words. Unless this is a very specialized field, you'd usually want to index 4 tokens, but you know your problem space better than I do.

The Admin UI's Analysis page is your friend.

You could also consider WordDelimiterFilterFactory with catenateWords="1".

Best
Erick

On Fri, Jul 12, 2013 at 5:11 AM, [email protected] <[email protected]> wrote:
> Hi,
>
> Scenario:
>
> User who perform search forget to put punctuation mark (apostrophe) for ex,
> when user wants to search for a value like INT'L, they just key in INTL
> (with no punctuation). In this scenario, I wish to return both values with
> INTL and INT'L that currently are indexed on SOLR instance. Currently, if I
> search for INTL it wont return the row having value INT'L.
>
> Schema Configuration entry for the field type:
>
> <fieldType name="customStr" class="solr.TextField"
>            positionIncrementGap="100" sortMissingLast="true">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="\s*[,.]\s*" replacement=" " replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
>             replacement=" " replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="[';]"
>             replacement="" replace="all" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="\s*[,.]\s*" replacement=" " replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
>             replacement=" " replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="[';]"
>             replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
>
> Please suggest as to what mechanism should I use to fetch both the values
> like INTL and INT'L, when the search is performed for INTL. Also, does the
> reg-ex look correct for the analyzers? What all different filters/ tokenizer
> can be used to overcome this issue.
>
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Search-with-punctuations-tp4077510.html
> Sent from the Solr - User mailing list archive at Nabble.com.
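
A rough, untested sketch of the WhitespaceTokenizerFactory plus WordDelimiterFilterFactory suggestion above, in case it helps. The field type name "text_punct" is made up for illustration, and the attribute values other than catenateWords="1" are assumptions rather than anything stated in this thread. Check the result on the Analysis page before relying on it.

```xml
<!-- Hypothetical field type; the name "text_punct" is illustrative only. -->
<fieldType name="text_punct" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- One token per whitespace-separated word instead of one token per field value -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits INT'L at the apostrophe into INT and L; catenateWords="1"
         additionally emits the joined form INTL -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: no catenation at query time, since the joined form
         is already in the index -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this, a query for INTL should be able to match the catenated token produced from an indexed INT'L, while a query for INT'L should still match through the split parts. Exact behavior depends on your Solr version and query parser settings.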
