Hello again,
- Let's say I index "HIV-1" with <filter
class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1"/>. Would a search on HIV AND 1 (or even HIV-1, which
the above filter would turn into HIV1 or HIV 1) also match documents
that contain HIV and the number "1" somewhere, but not directly after
HIV? If so, how should I fix this? I could boost the score by
proximity, but I sort by date anyway, so I guess that would be
pointless.
- Somewhat related: let's say I index "Polymyxin B". If I treat single
letters as stopwords, would a phrase search ("Polymyxin B") still find
the right documents (I don't think so, but still)? If not, I'll have to
index single letters; how do I then prevent the same problem as in the
first question (i.e., a search on Polymyxin B returning documents that
contain Polymyxin and B, but not close to one another)?
My thought is to parse the user query and rewrite it to do phrase
searches on adjacent terms that are single letters or numbers. If a
user searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND
hepatitis) OR ("1 hepatitis" AND HIV). Is this a sensible solution?
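
A minimal sketch of that rewrite, in Python, could look like the
following. The function names (is_short, rewrite_query and the two
helpers) are made up for illustration. It assumes the query is plain
whitespace-separated terms with no quotes or operators already in it,
and it does not escape Solr special characters.

```python
def is_short(token):
    """True for single characters and pure numbers, e.g. 'B' or '1'."""
    return len(token) == 1 or token.isdigit()


def _merge_left(tokens):
    """Group each short token with the term on its left.

    A short token at the very start has no left neighbour, so the
    following term is pulled into its group instead.
    """
    groups = []
    for tok in tokens:
        if groups and (is_short(tok) or all(is_short(t) for t in groups[-1])):
            groups[-1].append(tok)
        else:
            groups.append([tok])
    return groups


def _merge_right(tokens):
    """Same as _merge_left, but short tokens attach to the term on their right."""
    mirrored = _merge_left(tokens[::-1])
    return [group[::-1] for group in mirrored][::-1]


def _render(groups):
    """Turn multi-term groups into quoted phrases and AND everything together."""
    parts = []
    for group in groups:
        if len(group) > 1:
            parts.append('"' + " ".join(group) + '"')
        else:
            parts.append(group[0])
    return " AND ".join(parts)


def rewrite_query(query):
    """Rewrite a plain query so short terms are only matched inside phrases."""
    tokens = query.split()
    left = _render(_merge_left(tokens))
    right = _render(_merge_right(tokens))
    if left == right:
        return left
    return "(" + left + ") OR (" + right + ")"


print(rewrite_query("HIV 1 hepatitis"))
print(rewrite_query("Polymyxin B"))
```

For HIV 1 hepatitis this produces the two alternatives described above
(the short term attached to its left neighbour, OR attached to its right
neighbour); for Polymyxin B both alternatives collapse into the single
phrase "Polymyxin B".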
Thanks,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212