Hello everyone,

I'm trying to do some sentence-level searching with Solr; basically, I want to find out whether two words occur in the same sentence. As I read on the Lucene mailing list, there are many ways to do this, including but not limited to:

- Inserting special boundary terms to denote the start and end of a sentence. It is unclear to me what kind of query should be used to fetch results from within one sentence (something like: start_sentence_token word1 word2 end_sentence_token?).
- Increasing the token position at a sentence boundary by a large amount (1000?) so that "x y"~500 (or more) won't match across sentence boundaries.
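
To make the second option concrete, here is a minimal sketch in plain Python (not Solr or Lucene code) of what I have in mind. The names assign_positions, within_slop and SENTENCE_GAP are my own inventions, the sentence splitter is deliberately naive, and the distance check is only a rough stand-in for how a sloppy phrase query really matches.

```python
import re

SENTENCE_GAP = 1000  # assumed position jump at each sentence boundary


def assign_positions(text, gap=SENTENCE_GAP):
    """Return (token, position) pairs, jumping `gap` positions between sentences."""
    positions = []
    pos = -1
    first_sentence = True
    # Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"\w+", sentence.lower())
        if not tokens:
            continue
        for i, tok in enumerate(tokens):
            if i == 0 and not first_sentence:
                pos += gap  # first token of a later sentence: large jump
            else:
                pos += 1    # ordinary token: advance by one
            positions.append((tok, pos))
        first_sentence = False
    return positions


def within_slop(positions, word1, word2, slop):
    """True if word1 and word2 occur within `slop` positions of each other.

    Simplified distance test, not Lucene's actual slop computation.
    """
    p1 = [p for t, p in positions if t == word1]
    p2 = [p for t, p in positions if t == word2]
    return any(abs(a - b) <= slop for a in p1 for b in p2)


if __name__ == "__main__":
    doc = assign_positions("The cat sat. The dog ran.")
    print(within_slop(doc, "cat", "sat", 500))  # same sentence
    print(within_slop(doc, "cat", "dog", 500))  # different sentences
```

As long as the gap is larger than the slop, and no sentence is longer than the slop, two words should only match when they share a sentence.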

Is there an existing filter class that I could use to do this, or should I first parse my text fields with PHP and some NLP tool, and index the result (for the first case)? For the second case (increment token position), how should I do this within Solr?

Are there any plans to implement such functionality as standard?

Thanks for the help,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
