Re: WordDelimiterFilter preserveOriginal & position increment

Jack Krupansky Tue, 23 Oct 2012 07:40:48 -0700

Your "query" analyzer should not have preserveOriginal="1". You should have separate "index" and "query" analyzers; they may be almost identical, but the "query" analyzer must not have preserveOriginal="1" so that it generates a clean sequence of terms that were indexed in that exact order.
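
As a rough sketch of the split described above, a field type in schema.xml might look like the following. The field type name, the tokenizer, the lowercase filter, and every WordDelimiterFilterFactory setting other than preserveOriginal are assumptions for illustration only; keep whatever your existing analyzer chain already uses and change only preserveOriginal on the query side.

```xml
<!-- Hypothetical field type; "text_wdf" is a made-up name. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: keep the original token alongside the split parts. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: same chain, but without the original token. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After a change like this, the Analysis page can be used to compare the positions produced by the index and query chains for the same input.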


-- Jack Krupansky

-----Original Message-----
From: Jay Luker
Sent: Tuesday, October 23, 2012 10:16 AM
To: solr-user
Subject: WordDelimiterFilter preserveOriginal & position increment

Hi,

I'm having an issue with the WDF preserveOriginal="1" setting and the
matching of a phrase query. Here's an example of the text that is
being indexed:

"...obtained with the Southern African Large Telescope,SALT..."

A lot of our text is extracted from PDFs, so this kind of formatting
junk is very common.

The phrase query that is failing is:

"Southern African Large Telescope"

From looking at the analysis debugger I can see that the WDF is
getting the term "Telescope,SALT" and correctly splitting on the
comma. The problem seems to be that the original term is given the 1st
position, e.g.:

Pos  Term
1    Southern
2    African
3    Large
4    Telescope,SALT  <-- original term
5    Telescope
6    SALT

Only by adding a phrase slop of "~1" do I get a match.

I realize that the WDF is behaving correctly in this case (or at least
I can't imagine a rational alternative). But I'm curious if anyone can
suggest a way to work around this issue that doesn't involve adding
phrase query slop.

Thanks,

--jay
