Hi Steve,

Sorry for the late reply, I've been quite busy. I had second thoughts immediately after sending the question, in line with what you said: I meant the source token start and end offset positions.
When MCFF is removed, the $ disappears after ST, and the start and end offsets of all the terms are correct. Is MCFF's behaviour correct? Should I raise a JIRA for retaining the start and end offsets of the original tokens?

On Thu, Jun 18, 2015 at 10:06 PM, Steve Rowe <sar...@gmail.com> wrote:

> Hi Dmitry,
>
> It’s weird that start and end offsets are the same - what do you see for
> the start/end of ‘$’, i.e. if you take out MCFF? (I think it should be
> start:5, end:6.)
>
> As far as offsets “respecting the remapped token”, are you asking for
> offsets to be set as if ‘dollarsign’ were part of the original text? If
> so, there is no setting that would do that - the intent is for offsets to
> map to the *original* text. You can work around this by performing the
> substitution prior to Solr analysis, e.g. in an update processor like
> RegexReplaceProcessorFactory.
>
> Steve
> www.lucidworks.com
>
> > On Jun 18, 2015, at 3:07 AM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >
> > Hi,
> >
> > It looks like MappingCharFilter sets the start and end offset to the same
> > value. Can this be affected by some setting?
> >
> > For the string: test $ test2 and mapping "$" => " dollarsign " (we insert
> > extra space to separate $ into its own token)
> >
> > we get: http://snag.gy/eJT1H.jpg
> >
> > Ideally, we would like to have start and end offsets respecting the
> > remapped token. Can this be achieved with settings?
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
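For anyone following the thread: Steve's workaround can be sketched outside Solr. If the "$" => " dollarsign " substitution happens *before* analysis (as RegexReplaceProcessorFactory would do in an update chain), the tokenizer sees the substituted text and offsets are computed against it directly, so no char-filter offset correction is involved. This is a minimal Python sketch with hypothetical function names, not Solr code; a plain whitespace tokenizer stands in for ST:

```python
import re

def pre_analysis_substitute(text: str) -> str:
    # Replace "$" with " dollarsign " before tokenization, mirroring
    # what a regex-replace update processor would do. (Hypothetical
    # helper name; not part of any Solr API.)
    return re.sub(r"\$", " dollarsign ", text)

def whitespace_tokens(text: str):
    # Whitespace tokenization yielding (term, startOffset, endOffset),
    # with offsets relative to the text actually tokenized.
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\S+", text)]

substituted = pre_analysis_substitute("test $ test2")
print(whitespace_tokens(substituted))
# 'dollarsign' gets real start/end offsets (6, 16) in the substituted
# text. By contrast, with MappingCharFilter the intent is that offsets
# map back to the *original* text, where "$" spans start:5, end:6.
```

The trade-off is that stored/highlighted offsets now refer to the substituted text rather than the original input, which may or may not matter depending on whether the original string is also rewritten at index time.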