Hi,

I'm having an issue with the WordDelimiterFilter (WDF) setting
preserveOriginal="1" and the matching of a phrase query. Here's an
example of the text that is being indexed:

"...obtained with the Southern African Large Telescope,SALT..."

A lot of our text is extracted from PDFs, so this kind of formatting
junk is very common.

The phrase query that is failing is:

"Southern African Large Telescope"

From looking at the analysis debugger I can see that the WDF is
getting the term "Telescope,SALT" and correctly splitting on the
comma. The problem seems to be that the original term is given the
first position, with the split parts following it, e.g.:

Pos  Term
1      Southern
2      African
3      Large
4      Telescope,SALT  <-- original term
5      Telescope
6      SALT

Only by adding a phrase slop of "~1" do I get a match.
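
To make the mismatch concrete, here is a small Python sketch of what I
think is happening. It is not Solr or Lucene code: the phrase_matches
function and its slop model (total positional displacement) are my own
simplification, and the positions are copied from the debugger output
above.

```python
from itertools import product

# Token positions copied from the analysis debugger output above.
index = {
    "Southern": [1],
    "African": [2],
    "Large": [3],
    "Telescope,SALT": [4],  # original term kept by preserveOriginal="1"
    "Telescope": [5],
    "SALT": [6],
}


def phrase_matches(index, terms, slop=0):
    """Simplified phrase check (an assumption, not Lucene's algorithm).

    Term i is expected at position start + i, where start is the position
    of the first term. The phrase matches if the total displacement from
    those expected positions is at most `slop`.
    """
    candidates = [index.get(term, []) for term in terms]
    if not all(candidates):
        return False  # at least one query term is not indexed at all
    for combo in product(*candidates):
        start = combo[0]
        displacement = sum(abs(pos - (start + i)) for i, pos in enumerate(combo))
        if displacement <= slop:
            return True
    return False


query = ["Southern", "African", "Large", "Telescope"]
print(phrase_matches(index, query))          # False
print(phrase_matches(index, query, slop=1))  # True
```

In this model "Telescope" sits one position later than the phrase
expects, which is the gap that "~1" covers.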

I realize that the WDF is behaving correctly in this case (or at least
I can't imagine a rational alternative). But I'm curious if anyone can
suggest a way to work around this issue that doesn't involve adding
phrase query slop.

Thanks,
--jay
