On 10/23/2012 8:16 AM, Jay Luker wrote:
> From looking at the analysis debugger I can see that the WDF is
> getting the term "Telescope,SALT" and correctly splitting on the
> comma. The problem seems to be that the original term is given the 1st
> position, e.g.:
>
> Pos Term
> 1 Southern
> 2 African
> 3 Large
> 4 Telescope,SALT <-- original term
> 5 Telescope
> 6 SALT
Jay, I have WDF with preserveOriginal turned on. I get the following
from WDF on the analysis page on both 3.5 and 4.1-SNAPSHOT, and the
analysis shows all four of the query words at consecutive positions.
On the new version, I had to slide a scrollbar to the right to see the
last term. Visually the terms were not in consecutive columns on the
new version (they were on 3.5), but the position numbers say
otherwise.
Pos Term
1 Southern
2 African
3 Large
4 Telescope,SALT
4 Telescope
5 SALT
5 TelescopeSALT
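To illustrate why the position numbers matter, here is a minimal Python sketch of phrase matching over (position, term) pairs. It is a toy model and not Lucene's actual implementation: it ignores slop, lowercasing, and any other filters in the chain. The function name phrase_matches is hypothetical, and the four-word query is an assumption, since the query text is not shown in this message.

```python
def phrase_matches(tokens, phrase):
    """Return True if the phrase terms occur at consecutive positions.

    tokens: list of (position, term) pairs, as shown on the analysis page.
    phrase: list of query terms, in order.
    """
    # Map each position to the set of terms stacked at that position.
    terms_at = {}
    for pos, term in tokens:
        terms_at.setdefault(pos, set()).add(term)

    # Try every known position as a possible start of the phrase.
    for start in terms_at:
        if all(term in terms_at.get(start + i, set())
               for i, term in enumerate(phrase)):
            return True
    return False


# Positions as quoted from Jay's report.
reported = [(1, "Southern"), (2, "African"), (3, "Large"),
            (4, "Telescope,SALT"), (5, "Telescope"), (6, "SALT")]

# Positions from the analysis output listed above.
observed = [(1, "Southern"), (2, "African"), (3, "Large"),
            (4, "Telescope,SALT"), (4, "Telescope"),
            (5, "SALT"), (5, "TelescopeSALT")]

# Assumed query; the actual query text is not shown in this message.
query = ["Southern", "African", "Large", "Telescope"]

print(phrase_matches(reported, query))  # False
print(phrase_matches(observed, query))  # True
```

In this model the reported positions put "Telescope" one slot too late, so a phrase query for the four words would miss, while the stacked positions (original term and first part both at 4) would match.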
My full WDF parameters:
index: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1,
catenateWords=1, splitOnNumerics=1, stemEnglishPossessive=1,
luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0,
catenateNumbers=1}
query: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1,
catenateWords=0, splitOnNumerics=1, stemEnglishPossessive=1,
luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0,
catenateNumbers=0}
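The two parameter sets are easy to misread side by side. As a quick check, this small Python sketch copies the values above into dictionaries (the variable names are just for illustration) and lists the settings that differ between index and query:

```python
# Values copied from the index and query parameter lists above.
index_params = {
    "preserveOriginal": 1, "splitOnCaseChange": 1, "generateNumberParts": 1,
    "catenateWords": 1, "splitOnNumerics": 1, "stemEnglishPossessive": 1,
    "luceneMatchVersion": "LUCENE_35", "generateWordParts": 1,
    "catenateAll": 0, "catenateNumbers": 1,
}
query_params = {
    "preserveOriginal": 1, "splitOnCaseChange": 1, "generateNumberParts": 1,
    "catenateWords": 0, "splitOnNumerics": 1, "stemEnglishPossessive": 1,
    "luceneMatchVersion": "LUCENE_35", "generateWordParts": 1,
    "catenateAll": 0, "catenateNumbers": 0,
}

# Collect every setting whose index value differs from its query value.
differing = {
    name: (index_params[name], query_params.get(name))
    for name in index_params
    if index_params[name] != query_params.get(name)
}

print(differing)
# {'catenateWords': (1, 0), 'catenateNumbers': (1, 0)}
```

So the only differences are catenateWords and catenateNumbers, which are on at index time and off at query time; preserveOriginal is on for both.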
I understand from other messages on the mailing list that I should not
have preserveOriginal on the query side, but I have not yet changed it.
If your position numbers really are what you indicated, you may have
found a bug. I have not tried the released 4.0.0 version; I expect to
deploy from the 4.x branch under development.
Thanks,
Shawn