Chris Hostetter wrote:
A couple of things make your question really hard to answer ... first off,
you can specify different analyzer chains for index time and query time --
when dealing with the WordDelim filter (or the synonym filter) this is
frequently necessary -- so the answers to your questions really depend on
whether you use WordDelim at both index time and query time (or, if you do
use it in both cases, whether you configure it differently).
To clarify, I'm using the filter at both index and query time.
Have you by any chance played with the "Analysis" page on your Solr index?
http://localhost:8983/solr/admin/analysis.jsp?name=&verbose=on&highlight=on&qverbose=on&
...it makes it really easy to see exactly how your various fields will get
parsed at index time and query time. I would also suggest you use the
"debugQuery=on" option when doing some searches -- even if there aren't
any documents in your index, that will help you see how your query is
getting parsed and what Query structure the QueryParser is building based on
the tokens it gets from each of the Analyzers.
Will try that. I've played with it in the past, but not for this particular
problem. Good idea :)
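For reference, such a debug request can also be built by hand. The sketch below assumes the default `/solr/select` handler on the same host and port as the analysis page mentioned above; `build_debug_query_url` is a hypothetical helper, and `rows=0` is only there to keep the response small.

```python
from urllib.parse import urlencode

# Assumption: default select handler on the same host/port as the analysis page.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(user_query, base_url=SOLR_SELECT_URL):
    """Hypothetical helper: build a select URL with debugQuery=on.

    The debug section of the response shows how the query-time analyzer
    chain and the QueryParser turned the raw input into a Query.
    """
    params = {
        "q": user_query,
        "debugQuery": "on",
        "rows": "0",  # only the parse matters here, not the matching docs
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_query_url("HIV 1 hepatitis"))
```

Opening the printed URL in a browser should show a `parsedquery` entry in the debug section, i.e. what `HIV 1 hepatitis` actually became after analysis.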
: My thought is to parse the user query and rephrase it to do phrase
: searches on nearby terms containing single letters / numbers. If a user
: searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR
: ("1 hepatitis" AND hiv). Is this a sensible solution?
That's kind of a strange behavior for a search application to have ... you
might just want to trust that your users will be smart, and if they find
that 'HIV 1 hepatitis' is matching docs where "1" doesn't appear near
"HIV" or "hepatitis", then they will start entering '"HIV 1" hepatitis' (or
'HIV "1 hepatitis"' if that's what they meant).
Sadly, I can't rely on users' smartness for this :) My concern is that
for queries like Hepatitis A, it will match just about every document
containing "hepatitis" and the very common word 'a' anywhere in the
document. I can't treat single letters as stopwords, because then there
would be no way to find documents about 'hepatitis c' but not about
'hepatitis b', for example. I will test my solution and report back; if
you have any other ideas, please let me know.
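The rewrite proposed in my original message could be sketched roughly like this. It is only a sketch under assumptions: the rule for a "short" token (one character, or all digits) and the names `is_short` and `rewrite_query` are hypothetical, the input is split on whitespace, and Lucene special characters are not escaped.

```python
import itertools


def is_short(token):
    """Hypothetical rule: a single character, or a run of digits."""
    return len(token) == 1 or token.isdigit()


def rewrite_query(user_query):
    """Require every short token to appear in a two-word phrase with a neighbour.

    Sketch only: no escaping of Lucene query syntax, whitespace tokenization.
    """
    tokens = user_query.split()
    short_positions = [i for i, t in enumerate(tokens) if is_short(t)]
    if not short_positions or len(tokens) < 2:
        return user_query  # nothing to glue, or nothing to glue it to

    # For each short token, the (left, right) index pairs it may be phrased with.
    choices = []
    for i in short_positions:
        options = []
        if i > 0:
            options.append((i - 1, i))  # phrase with the previous token
        if i < len(tokens) - 1:
            options.append((i, i + 1))  # phrase with the next token
        choices.append(options)

    clauses = []
    # One AND clause per combination of neighbour choices.
    for combo in itertools.product(*choices):
        pairs = list(dict.fromkeys(combo))  # drop duplicate pairs, keep order
        covered = {k for pair in pairs for k in pair}
        parts = ['"%s %s"' % (tokens[a], tokens[b]) for a, b in pairs]
        # Tokens not used in any phrase stay as plain required terms.
        parts += [t for k, t in enumerate(tokens) if k not in covered]
        clauses.append("(" + " AND ".join(parts) + ")")

    # OR the alternatives together, skipping duplicate clauses.
    return " OR ".join(dict.fromkeys(clauses))


if __name__ == "__main__":
    print(rewrite_query("HIV 1 hepatitis"))
    # ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND HIV)
```

Note that the number of clauses can double with every short token that sits between two other words, so a real version would probably cap it.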
And thanks for the help! :)