On 10/23/2012 8:16 AM, Jay Luker wrote:
> From looking at the analysis debugger I can see that the WDF is
> getting the term "Telescope,SALT" and correctly splitting on the
> comma. The problem seems to be that the original term is given the 1st
> position, e.g.:
>
> Pos  Term
> 1      Southern
> 2      African
> 3      Large
> 4      Telescope,SALT  <-- original term
> 5      Telescope
> 6      SALT

Jay, I have WDF with preserveOriginal turned on. On both 3.5 and 4.1-SNAPSHOT, the analysis page gives me the WDF output below, and the analyzer shows all four of the query words matching in consecutive fields. On the new version I had to slide a scrollbar to the right to see the last term, and visually the matches were not in consecutive fields (they were on 3.5), but the position numbers say otherwise.

1    Southern
2    African
3    Large
4    Telescope,SALT
4    Telescope
5    SALT
5    TelescopeSALT

My full WDF parameters:
index: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, splitOnNumerics=1, stemEnglishPossessive=1, luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0, catenateNumbers=1}
query: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1, catenateWords=0, splitOnNumerics=1, stemEnglishPossessive=1, luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0, catenateNumbers=0}

I understand from other messages on the mailing list that I should not have preserveOriginal on the query side, but I have not yet changed it.

If your position numbers really are what you indicated, you may have found a bug. I have not tried the released 4.0.0 version; I expect to deploy from the 4.x branch under development.
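
To illustrate why the position numbers matter, here is a rough Python sketch of an exact (slop 0) phrase check run against the two position listings above. It is only a simplified model, not how Lucene actually implements phrase matching. The function names are made up for the example, and I am assuming the query is the phrase "Southern African Large Telescope" with no lowercasing or other filters applied.

```python
from collections import defaultdict


def build_position_index(tokens):
    """Map each term to the set of positions it occupies."""
    index = defaultdict(set)
    for position, term in tokens:
        index[term].add(position)
    return index


def phrase_matches(tokens, phrase):
    """Return True if the phrase terms sit at consecutive positions (slop 0)."""
    index = build_position_index(tokens)
    first, rest = phrase[0], phrase[1:]
    for start in index.get(first, ()):
        # Every following term must appear exactly one position further along.
        if all(start + offset in index.get(term, ())
               for offset, term in enumerate(rest, start=1)):
            return True
    return False


# Positions as reported in the quoted message: the original term takes
# position 4 by itself and pushes the split parts to 5 and 6.
reported = [(1, "Southern"), (2, "African"), (3, "Large"),
            (4, "Telescope,SALT"), (5, "Telescope"), (6, "SALT")]

# Positions from my analysis page: the original term shares position 4
# with the first split part.
observed = [(1, "Southern"), (2, "African"), (3, "Large"),
            (4, "Telescope,SALT"), (4, "Telescope"),
            (5, "SALT"), (5, "TelescopeSALT")]

# Assumed query phrase (not stated explicitly in this thread).
phrase = ["Southern", "African", "Large", "Telescope"]

print(phrase_matches(reported, phrase))  # False: "Telescope" is at 5, not 4
print(phrase_matches(observed, phrase))  # True: "Telescope" shares position 4
```

In this model the phrase fails against the reported positions only because the preserved original occupies position 4 alone, which leaves a gap between "Large" and "Telescope".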

Thanks,
Shawn
