Hello again,

- Let's say I index "HIV-1" with <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>. Would a search on HIV AND 1 (or even HIV-1, which after parsing by the above filter would yield HIV1 or HIV 1) also find documents that contain HIV and the number "1" somewhere in the document, but not directly after HIV? If so, how should I fix this? I could boost the score by proximity, but I'm sorting on date anyway, so I guess that would be pointless.

- Somewhat related: Let's say I index "Polymyxin B". If I stopword single letters, would a phrase search ("Polymyxin B") still find the right documents (I don't think so, but still)? If not, I'll have to index single letters; how do I then prevent the same problem as in the first question (i.e., a search on Polymyxin B yielding documents that contain Polymyxin and B, but not close to one another)?
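
To make the concern in both questions concrete, here is a toy sketch in plain Python (not Solr; it assumes simple lowercased whitespace tokenization, and the function names and sample documents are made up for illustration). It only shows the difference between "all terms present" and "terms adjacent and in order":

```python
def tokenize(text):
    # Assumption: lowercased whitespace tokens, nothing like a real Solr analyzer.
    return text.lower().split()

def matches_all_terms(doc, terms):
    # Boolean AND: every term occurs somewhere in the document.
    doc_tokens = tokenize(doc)
    return all(term.lower() in doc_tokens for term in terms)

def matches_phrase(doc, terms):
    # Phrase match: the terms occur next to each other, in order.
    doc_tokens = tokenize(doc)
    wanted = [term.lower() for term in terms]
    n = len(wanted)
    return any(doc_tokens[i:i + n] == wanted
               for i in range(len(doc_tokens) - n + 1))

# Made-up sample documents.
relevant = "HIV 1 infection in adults"
unrelated = "HIV infection in 1 patient"

print(matches_all_terms(relevant, ["HIV", "1"]), matches_all_terms(unrelated, ["HIV", "1"]))
print(matches_phrase(relevant, ["HIV", "1"]), matches_phrase(unrelated, ["HIV", "1"]))
```

The second document is the kind of false hit I'm worried about: it passes the AND check but not the phrase check.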

My thought is to parse the user query and rewrite it to do phrase searches on adjacent terms whenever one of them is a single letter or a number. If a user searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND hiv). Is this a sensible solution?
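
A rough sketch of the rewriting I have in mind, in Python. The names `is_short` and `rewrite_query` are hypothetical, "short" is assumed to mean a single letter or a run of digits, and the query is assumed to be plain whitespace-separated terms with no operators or special characters that would need escaping:

```python
import re

def is_short(token):
    # Assumed rule: a single ASCII letter or a run of digits.
    return bool(re.fullmatch(r"[A-Za-z]|\d+", token))

def rewrite_query(query):
    tokens = query.split()
    alternatives = []
    for i, token in enumerate(tokens):
        if not is_short(token):
            continue
        # Pair the short token with its left and its right neighbour in turn.
        for j in (i - 1, i + 1):
            if 0 <= j < len(tokens):
                lo, hi = min(i, j), max(i, j)
                phrase = '"%s %s"' % (tokens[lo], tokens[hi])
                rest = tokens[:lo] + tokens[hi + 1:]
                clause = " AND ".join([phrase] + rest)
                if rest:
                    clause = "(" + clause + ")"
                if clause not in alternatives:
                    alternatives.append(clause)
    if not alternatives:
        # No short tokens (or nothing to pair them with): plain AND query.
        return " AND ".join(tokens)
    return " OR ".join(alternatives)

print(rewrite_query("HIV 1 hepatitis"))
print(rewrite_query("Polymyxin B"))
```

For HIV 1 hepatitis this produces the two-alternative query above (without lowercasing), and for Polymyxin B it produces a single phrase query. With several short tokens in one query, the remaining short tokens still end up as loose AND terms in each alternative, so this would need more thought.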

Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
