Hello, I am using Solr 6.6's Suggester functionality to power an autosuggest widget that returns lists of people's names.
One requirement that we have is that the suggester be punctuation-insensitive. For example, entering 'Dr Joh' should provide the suggestion 'Dr. John', despite the fact that the user omitted the period after 'Dr'. 'Hank Williams Jr' should provide the suggestion 'Hank Williams, Jr.', despite the omission of both the comma and the period.

This functionality is present, but the punctuation-stripping appears to be causing highlighting offsets to be miscalculated: we end up with '<b>Dr Jo</b>hn' for the first query and '<b>Hank Williams, J</b>r.' for the second.

Here are the relevant parts of the solrconfig.xml and schema.xml configurations:

<!-- solrconfig.xml -->
<searchComponent class="solr.SuggestComponent" name="suggestEntity">
  <lst name="suggester">
    <str name="name">suggestEntity</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">skos_prefLabel</str>
    <str name="weightField">derived_score</str>
    <str name="payloadField">payload</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="minPrefixChars">2</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">true</str>
    <str name="contextField">suggest_filters</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" startup="lazy" name="/suggestEntity">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.highlight">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggestEntity</str>
  </lst>
  <arr name="components">
    <str>suggestEntity</str>
  </arr>
</requestHandler>

<!-- schema.xml -->
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100"
           termVectors="true" termPositions="true" termOffsets="true"
           storeOffsetsWithPositions="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[!'#%'()*+,-./:;=>?@[/]^{|}~]" replacement=""/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="accent-map.txt"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="_"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

As you can see from the schema.xml excerpt, I've tried storing term vectors, offsets, etc., but the Suggester highlighter doesn't seem to take advantage of them.

Does anyone know what I'm doing wrong here? Or is this a bug in the highlighter?

Thanks,
Tim Hill
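
P.S. To make the symptom concrete, here is a minimal Python sketch of what might be going on. It is a guess at the mechanism, not Solr's or Lucene's actual code: it assumes the highlighter takes the length of the punctuation-stripped query and applies it directly to the original (unstripped) suggestion text. The names PUNCT, naive_highlight and offset_aware_highlight are hypothetical, and PUNCT is a simplified stand-in for the charFilter pattern in the schema above. The second function shows the kind of output one would presumably want instead.

```python
import re

# Simplified, hypothetical stand-in for the punctuation-stripping charFilter.
PUNCT = re.compile(r"[.,]")


def naive_highlight(surface: str, query: str) -> str:
    """Apply the stripped query's length directly to the original text."""
    prefix_len = len(PUNCT.sub("", query))
    return "<b>" + surface[:prefix_len] + "</b>" + surface[prefix_len:]


def offset_aware_highlight(surface: str, query: str) -> str:
    """Walk the original text, counting only non-punctuation characters."""
    remaining = len(PUNCT.sub("", query))
    end = 0
    while end < len(surface) and remaining > 0:
        if not PUNCT.match(surface[end]):
            remaining -= 1
        end += 1
    return "<b>" + surface[:end] + "</b>" + surface[end:]


print(naive_highlight("Hank Williams, Jr.", "Hank Williams Jr"))
print(offset_aware_highlight("Hank Williams, Jr.", "Hank Williams Jr"))
```

Under these assumptions the first call reproduces the '<b>Hank Williams, J</b>r.' result reported above, while the second highlights through 'Jr' and leaves only the trailing period unhighlighted.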