Hi Steve,

Sorry for the late reply; I have been quite busy. I had second thoughts
right after sending the question, in line with what you said: I meant
the start and end offsets of the source token.

When MCFF is removed, the $ is dropped by ST, and the start and end
offsets of all the remaining terms are correct.

Is MCFF's behaviour correct? Should I raise a JIRA issue for retaining the
start and end offsets of the original tokens?
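
The expected behaviour can be sketched as follows. This is a toy model in
Python, not Lucene's MappingCharFilter: the function names, the whitespace
tokenizer, and the per-character span table are all assumptions made for the
example. Each character of the filtered text remembers the span of the
original text it came from, so a token built from the replacement reports the
offsets of the original '$'.

```python
import re


def apply_mapping(text, mapping):
    """Apply `mapping` to `text` and record, for every output character,
    the (start, end) span of the ORIGINAL text it was produced from.
    Hypothetical helper for illustration only."""
    out_chars = []
    spans = []
    i = 0
    while i < len(text):
        for src, dst in mapping.items():
            # Skip empty keys so the scan always advances.
            if src and text.startswith(src, i):
                out_chars.extend(dst)
                # Every replacement character points at the matched source span.
                spans.extend([(i, i + len(src))] * len(dst))
                i += len(src)
                break
        else:
            # Unmapped character: it maps to itself.
            out_chars.append(text[i])
            spans.append((i, i + 1))
            i += 1
    return "".join(out_chars), spans


def tokenize_with_original_offsets(text, mapping):
    """Whitespace-tokenize the filtered text, reporting offsets that refer
    to the original text (a stand-in for a real tokenizer)."""
    filtered, spans = apply_mapping(text, mapping)
    tokens = []
    for m in re.finditer(r"\S+", filtered):
        start = spans[m.start()][0]    # original start of the first character
        end = spans[m.end() - 1][1]    # original end of the last character
        tokens.append((m.group(), start, end))
    return tokens


tokens = tokenize_with_original_offsets("test $ test2", {"$": " dollarsign "})
for term, start, end in tokens:
    print(term, start, end)
# test 0 4
# dollarsign 5 6
# test2 7 12
```

In this model 'dollarsign' gets start 5 and end 6, the position of '$' in the
original string, and the offsets of the surrounding terms are unchanged.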

On Thu, Jun 18, 2015 at 10:06 PM, Steve Rowe <sar...@gmail.com> wrote:

> Hi Dmitry,
>
> It’s weird that start and end offsets are the same - what do you see for
> the start/end of ‘$’, i.e. if you take out MCFF?  (I think it should be
> start:5, end:6.)
>
> As far as offsets “respecting the remapped token”, are you asking for
> offsets to be set as if ‘dollarsign’ were part of the original text?  If
> so, there is no setting that would do that - the intent is for offsets to
> map to the *original* text.  You can work around this by performing the
> substitution prior to Solr analysis, e.g. in an update processor like
> RegexReplaceProcessorFactory.
>
> Steve
> www.lucidworks.com
>
> > On Jun 18, 2015, at 3:07 AM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >
> > Hi,
> >
> > It looks like MappingCharFilter sets start and end offset to the same
> > value. Can this be changed by some setting?
> >
> > For a string: test $ test2 and mapping "$" => " dollarsign " (we insert
> > extra space to separate $ into its own token)
> >
> > we get: http://snag.gy/eJT1H.jpg
> >
> > Ideally, we would like to have start and end offsets respecting the
> > remapped token. Can this be achieved with settings?
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
>
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
