Hello everyone,
I'm trying to do some sentence-level searching with Solr; basically, I
want to find out whether two words are in the same sentence. As I read on
the Lucene mailing list, there are several ways to do this, including:
- Inserting special boundary terms to denote the start and end of a
sentence. It is unclear to me what kind of query should be used to fetch
results from within one sentence (something like: start_sentence_token
word1 word2 end_sentence_token)?
- Increasing the token position at each sentence boundary by a large
amount (1000?) so that "x y"~500 (or any slop smaller than the gap) won't
match across sentence boundaries.
Is there an existing filter class that I could use to do this, or should
I first parse my text fields with PHP and some NLP tool, and index the
result (for the first case)? For the second case (increment token
position), how should I do this within Solr?
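To make the second case concrete, here is a rough sketch of the idea in
plain Python. It is not Lucene or Solr code: the function names, the
naive regex sentence splitter, and the simplified slop check (plain
position distance, which only approximates Lucene's sloppy phrase
matching) are all assumptions made for illustration.

```python
import re

SENTENCE_GAP = 1000  # assumed gap, taken from the "(1000?)" guess above


def assign_positions(text, gap=SENTENCE_GAP):
    """Return (token, position) pairs, jumping by `gap` at each sentence boundary."""
    positions = []
    pos = 0
    # Naive split on '.', '!' or '?'; a real NLP sentence splitter would do better.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    for i, sentence in enumerate(sentences):
        if i > 0:
            pos += gap  # push the next sentence far away from the previous one
        for token in re.findall(r"\w+", sentence.lower()):
            positions.append((token, pos))
            pos += 1
    return positions


def within_slop(positions, word1, word2, slop):
    """Simplified stand-in for "word1 word2"~slop: true if any pair of
    occurrences is at most `slop` positions apart."""
    p1 = [p for t, p in positions if t == word1]
    p2 = [p for t, p in positions if t == word2]
    return any(abs(a - b) <= slop for a in p1 for b in p2)


text = "Solr indexes documents quickly. Lucene powers the search."
positions = assign_positions(text)
same_sentence = within_slop(positions, "solr", "quickly", 500)
cross_sentence = within_slop(positions, "quickly", "lucene", 500)
```

With these assumptions, the slop has to stay smaller than the gap, and no
sentence may be longer than the slop, otherwise words at opposite ends of
one sentence would be missed.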
Are there any plans to implement such functionality as standard?
Thanks for the help,
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212