On 19 January 2016 at 05:41, Modassar Ather <modather1...@gmail.com> wrote:
> Thanks Shawn for your explanation.
>
> Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.
>
> Here the "WiFi device" will not be searched as there is a gap in between
> because Fi is at position 2. The document containing WiFi device will be
> seen as a phrase with no word in between hence it should match phrase "WiFi
> device" but it will not whereas "WiFi device"~1 will match.
>
>

Let's try to summarise in detail, as this is quite confusing:

1) Index time: "WiFi device" is tokenized as you described
   [ WiFi 1, Wi 1, WiFi 1, Fi 2, device 3 ]

2) Query time, with simple whitespace tokenization: "WiFi device"
   [ WiFi(0) device(1) ]

Only the relative positions matter for a phrase query. In the index,
"WiFi" and "device" are two positions apart (1 and 3), while the query
expects them to be adjacent (0 and 1). So exactly what you described
happens: the exact phrase does not match, while a slop of 1 does.

I should look up an old message on the mailing list; I am pretty sure we
have had this very same discussion before.
The problem with word expansion is that whatever you do, you are going
to get some side effects.

Cheers

> Best,
> Modassar
>
> On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > > Can you please send us tokens you get (and positions) when you analyze
> > > *WiFi device*
> > >
> > > Tokens generated and their respective positions.
> > >
> > > WiFi 1
> > > Wi 1
> > > WiFi 1
> > > Fi 2
> > > device 3
> >
> > It seems very odd to me that the original value would show up twice with
> > the preserveOriginal parameter set, but I am seeing the same behavior on
> > 4.7 and 5.3. Because both copies are at the same position, this will
> > not affect search, but will slightly affect relevance if you are not
> > specifying a sort parameter. Everything else about the analysis looks
> > correct to me, and the positions you see are needed for a phrase query
> > to work correctly.
> >
> > I have seen working configurations where preserveOriginal is set on the
> > index analysis but NOT set on query analysis. This is how my own schema
> > is configured. One of the reasons for this configuration is to reduce
> > the number of terms in the query so it is faster than it would be if
> > preserveOriginal were present and generated additional terms. The
> > preserveOriginal on the index side ensures a match whether mixed case is
> > used or not.
> >
> > Thanks,
> > Shawn
> >
>
>

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
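
To make the position mismatch discussed in this thread concrete, here is a minimal Python sketch. It is not Lucene's actual phrase scorer. It assumes a simplified slop model in which each document position is shifted by the term's query position, and a phrase matches when the spread of the shifted positions is at most the slop. The function names `build_positions` and `phrase_matches` are hypothetical, chosen only for this illustration. The token positions are the ones quoted above.

```python
from itertools import product


def build_positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for term, pos in tokens:
        index.setdefault(term, []).append(pos)
    return index


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Simplified phrase match (an assumption, not Lucene's real scorer).

    Each document position is shifted by the term's query position. With
    an exact phrase all shifted positions coincide, so the spread is 0.
    """
    doc = build_positions(doc_tokens)
    candidates = []
    for term, qpos in query_tokens:
        if term not in doc:
            return False  # a query term that was never indexed cannot match
        candidates.append([dpos - qpos for dpos in doc[term]])
    # Try every combination of occurrences, one per query term.
    return any(max(combo) - min(combo) <= slop
               for combo in product(*candidates))


# Index-time tokens and positions as reported in the thread.
indexed = [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]
# Query-time tokens from plain whitespace tokenization.
query = [("WiFi", 0), ("device", 1)]

print(phrase_matches(indexed, query, slop=0))  # exact phrase "WiFi device"
print(phrase_matches(indexed, query, slop=1))  # "WiFi device"~1
```

Under this model the exact phrase fails because "device" sits two positions after "WiFi" in the index, and a slop of 1 absorbs the extra position taken by "Fi".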