Re: Position increment in WordDelimiterFilter.

Modassar Ather Mon, 18 Jan 2016 21:42:07 -0800

Thanks Shawn for your explanation.

> Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.


Here the phrase "WiFi device" will not match, because Fi is at position 2 and
leaves a gap between WiFi (position 1) and device (position 3). A document
containing WiFi device is a phrase with no word in between, so it should
match the phrase query "WiFi device", but it does not, whereas
"WiFi device"~1 does match.
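
To make the gap concrete, here is a minimal Python sketch of position-based phrase matching using the token positions reported in this thread. It is only an illustration under simplifying assumptions: the function name phrase_matches is made up for this example, it handles in-order terms only, and it is not Lucene's actual phrase scoring code.

```python
from itertools import product

# Token positions as reported in this thread for the text "WiFi device".
positions = {
    "WiFi": [1],
    "Wi": [1],
    "Fi": [2],
    "device": [3],
}

def phrase_matches(positions, terms, slop=0):
    """Simplified check: do the terms occur in order within the given slop?"""
    candidates = [positions.get(term, []) for term in terms]
    for combo in product(*candidates):
        # Subtract each term's offset within the phrase. For an exact
        # phrase, every shifted value is identical (spread of 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

print(phrase_matches(positions, ["WiFi", "device"]))          # False
print(phrase_matches(positions, ["WiFi", "device"], slop=1))  # True
print(phrase_matches(positions, ["Wi", "Fi", "device"]))      # True
```

With WiFi at position 1 and device at position 3, the two terms are one position further apart than an exact phrase allows, which is why a slop of 1 is needed in this sketch.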

Best,
Modassar

On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > Can you please send us tokens you get (and positions) when you analyze
> > *WiFi device*
> >
> > Tokens generated and their respective positions.
> >
> > WiFi                1
> > Wi                  1
> > WiFi                1
> > Fi                  2
> > device              3
>
> It seems very odd to me that the original value would show up twice with
> the preserveOriginal parameter set, but I am seeing the same behavior on
> 4.7 and 5.3.  Because both copies are at the same position, this will
> not affect search, but will slightly affect relevance if you are not
> specifying a sort parameter.  Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.
>
> I have seen working configurations where preserveOriginal is set on the
> index analysis but NOT set on query analysis.  This is how my own schema
> is configured.  One of the reasons for this configuration is to reduce
> the number of terms in the query so it is faster than it would be if
> preserveOriginal were present and generated additional terms.  The
> preserveOriginal on the index side ensures a match whether mixed case is
> used or not.
>
> Thanks,
> Shawn
>
>
