Your "query" analyzer should not have preserveOriginal="1". You should have
separate "index" and "query" analyzers; they may be almost identical, but
the "query" analyzer must not have preserveOriginal="1" so that it generates
a clean sequence of terms that were indexed in that exact order.
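
For example, a field type along these lines. This is only a sketch: the
name "text_wdf", the whitespace tokenizer, the lowercase filter, and the
other WordDelimiterFilterFactory settings are assumptions, not taken from
your schema. The one difference that matters is preserveOriginal.

```xml
<!-- Hypothetical field type; name, tokenizer, and most settings are assumed. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: keep the original token alongside the split parts. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: same chain, but without preserving the original token. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the analyzers, it is worth re-checking both the index and
query sides in the analysis debugger to confirm the positions line up.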
-- Jack Krupansky
-----Original Message-----
From: Jay Luker
Sent: Tuesday, October 23, 2012 10:16 AM
To: solr-user
Subject: WordDelimiterFilter preserveOriginal & position increment
Hi,
I'm having an issue with the WDF preserveOriginal="1" setting and the
matching of a phrase query. Here's an example of the text that is
being indexed:
"...obtained with the Southern African Large Telescope,SALT..."
A lot of our text is extracted from PDFs, so this kind of formatting
junk is very common.
The phrase query that is failing is:
"Southern African Large Telescope"
From looking at the analysis debugger I can see that the WDF is
getting the term "Telescope,SALT" and correctly splitting on the
comma. The problem seems to be that the original term is given the first
of the resulting positions, ahead of the split parts, e.g.:
Pos  Term
1    Southern
2    African
3    Large
4    Telescope,SALT  <-- original term
5    Telescope
6    SALT
Only by adding a phrase slop of "~1" do I get a match.
I realize that the WDF is behaving correctly in this case (or at least
I can't imagine a rational alternative). But I'm curious if anyone can
suggest a way to work around this issue that doesn't involve adding
phrase query slop.
Thanks,
--jay