Thanks, Shawn, for your explanation.

> Everything else about the analysis looks correct to me, and the positions you
> see are needed for a phrase query to work correctly.

Here, the phrase "WiFi device" will not match, because there is a gap between the terms: Fi is at position 2. A document containing WiFi device has no word between the two terms, so it should match the phrase "WiFi device", but it does not, whereas "WiFi device"~1 does match.

Best,
Modassar

On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > Can you please send us tokens you get (and positions) when you analyze
> > *WiFi device*
> >
> > Tokens generated and their respective positions.
> >
> > WiFi 1
> > Wi 1
> > WiFi 1
> > Fi 2
> > device 3
>
> It seems very odd to me that the original value would show up twice with
> the preserveOriginal parameter set, but I am seeing the same behavior on
> 4.7 and 5.3. Because both copies are at the same position, this will
> not affect search, but will slightly affect relevance if you are not
> specifying a sort parameter. Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.
>
> I have seen working configurations where preserveOriginal is set on the
> index analysis but NOT set on query analysis. This is how my own schema
> is configured. One of the reasons for this configuration is to reduce
> the number of terms in the query so it is faster than it would be if
> preserveOriginal were present and generated additional terms. The
> preserveOriginal on the index side ensures a match whether mixed case is
> used or not.
>
> Thanks,
> Shawn
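
To illustrate the position gap described above, here is a minimal Python sketch of positional phrase matching. It is a simplified model and not Lucene's actual algorithm: the function name phrase_matches and the displacement-based slop rule are assumptions made for illustration, and the query is assumed to treat WiFi and device as two consecutive terms. Only the token positions are taken from the thread.

```python
from itertools import product

# Token positions reported in the thread for the analyzed text "WiFi device".
INDEXED = {"WiFi": [1], "Wi": [1], "Fi": [2], "device": [3]}


def phrase_matches(index, terms, slop=0):
    """Simplified phrase check (hypothetical helper, not a Lucene API).

    terms[i] is expected i positions after the first term. A candidate
    set of positions matches when the total displacement from those
    expected offsets is at most `slop`.
    """
    if not terms:
        return False
    position_lists = [index.get(term, []) for term in terms]
    if not all(position_lists):
        return False  # at least one term is not in the index at all
    for combo in product(*position_lists):
        base = combo[0]
        displacement = sum(abs((pos - base) - i) for i, pos in enumerate(combo))
        if displacement <= slop:
            return True
    return False


print(phrase_matches(INDEXED, ["WiFi", "device"]))          # False
print(phrase_matches(INDEXED, ["WiFi", "device"], slop=1))  # True
print(phrase_matches(INDEXED, ["Wi", "Fi", "device"]))      # True
```

In this model, WiFi at position 1 and device at position 3 are two positions apart, so the exact phrase fails and a slop of 1 is enough to bridge the gap, which mirrors the "WiFi device"~1 behaviour. The split terms Wi, Fi, device sit at consecutive positions and match without any slop.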