Index & search questions; special cases

Michael Imbeault Sun, 12 Nov 2006 15:55:26 -0800

Hello again,

- Let's say I index "HIV-1" with <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>. Would a search on HIV AND 1 (or even HIV-1, which after parsing by the above filter would yield HIV1 or HIV 1) also find documents which have HIV and the number "1" somewhere in the document, but not directly after HIV? If so, how should I fix this? I could boost score by proximity, but I'm doing a sort on date anyway, so I guess it would be pointless to do so.

- Somewhat related: Let's say I index "Polymyxin B". If I stopword single letters, would a phrase search ("Polymyxin B") still find the right documents (I don't think so, but still)? If not, I'll have to index single letters; how do I then prevent the same problem as in the first question (i.e., a search on Polymyxin B yielding documents with Polymyxin and B, but not close to one another)?
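To make the concern in both questions concrete, here is a minimal Python sketch of the difference between an AND match (both terms anywhere in the document) and a phrase match (terms adjacent and in order). The tokenize function is only a simplified stand-in for an analyzer, not the actual behaviour of WordDelimiterFilterFactory, and all names here are made up for illustration.

```python
import re

def tokenize(text):
    # Simplified stand-in for an analyzer (assumption): lowercase the text
    # and split on anything that is not a letter or digit, so "HIV-1"
    # becomes ["hiv", "1"].
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def matches_and(doc_tokens, terms):
    # Boolean AND: every term occurs somewhere in the document.
    return all(t in doc_tokens for t in terms)

def matches_phrase(doc_tokens, terms):
    # Phrase match: the terms occur next to each other, in order.
    n = len(terms)
    return any(doc_tokens[i:i + n] == terms
               for i in range(len(doc_tokens) - n + 1))

wanted = tokenize("HIV-1 infection in adults")
unwanted = tokenize("HIV prevalence rose by 1 percent")
query = tokenize("HIV-1")

print(matches_and(wanted, query), matches_and(unwanted, query))
print(matches_phrase(wanted, query), matches_phrase(unwanted, query))
```

In this toy model the AND match accepts both documents, while the phrase match accepts only the one where "1" directly follows "HIV", which is the behaviour I am after.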

My thought is to parse the user query and rephrase it to do phrase searches on nearby terms containing single letters / numbers. If a user searches for HIV 1 hepatitis, I'd rewrite it as ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND hiv). Is it a sensible solution?
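A rough Python sketch of the rewrite I have in mind, assuming the query is plain whitespace-separated terms with no operators or quotes of its own. The function names are hypothetical, and this only builds the query string; it does not talk to Solr.

```python
def is_short(token):
    # A "special" token: a single character or a purely numeric token,
    # e.g. "B" or "1".
    return len(token) == 1 or token.isdigit()

def rewrite(query):
    tokens = query.split()
    alternatives = []
    for i, tok in enumerate(tokens):
        if not is_short(tok):
            continue
        # Pair the short token with its left neighbour, then its right one.
        for j in (i - 1, i + 1):
            if not 0 <= j < len(tokens):
                continue
            lo, hi = min(i, j), max(i, j)
            phrase = '"%s %s"' % (tokens[lo], tokens[hi])
            # Every term outside the phrase is still required.
            rest = tokens[:lo] + tokens[hi + 1:]
            clause = " AND ".join([phrase] + rest)
            if rest:
                clause = "(%s)" % clause
            if clause not in alternatives:
                alternatives.append(clause)
    if not alternatives:
        # No short tokens: plain AND of all terms.
        return " AND ".join(tokens)
    return " OR ".join(alternatives)

print(rewrite("HIV 1 hepatitis"))
# ("HIV 1" AND hepatitis) OR ("1 hepatitis" AND HIV)
print(rewrite("Polymyxin B"))
# "Polymyxin B"
```

The sketch keeps the original case of each term; lowercasing would be left to the analyzer.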


Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
