Re: Sentence level searching

Yonik Seeley Sun, 12 Nov 2006 18:32:38 -0800

On 11/12/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:

> I'm trying to do some sentence-level searching with Solr; basically, I
> want to find out whether two words are in the same sentence. As I read on the
> Lucene mailing list, there are many ways to do this, including but not
> limited to:


> - inserting special boundary terms to denote the start and end of a
> sentence. It is unclear to me what kind of query should be used to fetch
> results from within one sentence (something like: start_sentence_token
> word1 word2 end_sentence_token)?


Span queries could do that, but there isn't really query parser support for them.
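To make the boundary-term idea concrete, here is a minimal plain-Java sketch (not Lucene code) of the check such a query has to perform. It emits a hypothetical marker `_EOS_` after every token ending in '.', '!' or '?'. Two words count as being in the same sentence when some pair of their occurrences has no marker between them. In Lucene itself, one way to express this is a `SpanNotQuery` that wraps an unordered `SpanNearQuery` over the two words and excludes a `SpanTermQuery` on the boundary term. The punctuation-based sentence detection below is a naive assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the boundary-term approach; BOUNDARY, tokenize and sameSentence are hypothetical names. */
public class BoundaryTokenSketch {
    // Hypothetical boundary term; any string that never occurs in real text would do.
    static final String BOUNDARY = "_EOS_";

    /** Splits on whitespace, strips trailing sentence punctuation, lowercases,
     *  and emits BOUNDARY after each token that ended in '.', '!' or '?'. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            boolean endsSentence = raw.endsWith(".") || raw.endsWith("!") || raw.endsWith("?");
            String word = raw.replaceAll("[.!?]+$", "").toLowerCase();
            if (!word.isEmpty()) tokens.add(word);
            if (endsSentence) tokens.add(BOUNDARY);
        }
        return tokens;
    }

    /** True if some occurrence of w1 and some occurrence of w2 have no BOUNDARY between them. */
    static boolean sameSentence(List<String> tokens, String w1, String w2) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(w1)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (j == i || !tokens.get(j).equals(w2)) continue;
                int lo = Math.min(i, j), hi = Math.max(i, j);
                boolean crosses = false;
                for (int k = lo + 1; k < hi; k++) {
                    if (tokens.get(k).equals(BOUNDARY)) { crosses = true; break; }
                }
                if (!crosses) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The cat sat on the mat. The dog barked!");
        System.out.println(tokens);
        System.out.println(sameSentence(tokens, "cat", "mat"));
        System.out.println(sameSentence(tokens, "cat", "dog"));
    }
}
```

With the sample text, "cat" and "mat" share a sentence. "cat" and "dog" do not, because a marker sits between them.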

> - increasing the token position at a sentence boundary by a large amount
> (1000?) so that "x y"~500 (or more) won't match across sentence boundaries.


That's probably the simplest.
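A minimal plain-Java sketch of why the position gap works, assuming a gap of 1000 as in the question. It is not Lucene code, and `index` and `sloppyPhrase` are hypothetical names. The first token of each new sentence gets a position increment of 1000 instead of 1. The phrase check then approximates Lucene's sloppy phrase matching for two terms, where, as I understand it, the required slop is |pos(x) - (pos(y) - 1)|. That gives 0 for adjacent in-order terms and 2 for adjacent reversed terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the position-gap approach; not Lucene code. */
public class PositionGapSketch {
    /** A term and the position it would be indexed at. */
    record Token(String term, int position) {}

    // Assumed gap, taken from the "1000?" figure in the question.
    static final int SENTENCE_GAP = 1000;

    /** Assigns positions starting at 0. The first token after a word ending in
     *  '.', '!' or '?' gets an increment of SENTENCE_GAP instead of 1. */
    static List<Token> index(String text) {
        List<Token> out = new ArrayList<>();
        int pos = -1;
        boolean newSentence = false;
        for (String raw : text.trim().split("\\s+")) {
            boolean endsSentence = raw.matches(".*[.!?]$");
            String word = raw.replaceAll("[.!?]+$", "").toLowerCase();
            if (!word.isEmpty()) {
                pos += newSentence ? SENTENCE_GAP : 1;
                out.add(new Token(word, pos));
                newSentence = false;
            }
            if (endsSentence) newSentence = true;
        }
        return out;
    }

    /** Approximates "x y"~slop for two terms: required slop is |px - (py - 1)|. */
    static boolean sloppyPhrase(List<Token> tokens, String x, String y, int slop) {
        for (Token a : tokens) {
            if (!a.term().equals(x)) continue;
            for (Token b : tokens) {
                if (b == a || !b.term().equals(y)) continue;
                if (Math.abs(a.position() - (b.position() - 1)) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = index("The cat sat on the mat. The dog barked!");
        System.out.println(tokens);
        System.out.println(sloppyPhrase(tokens, "cat", "mat", 500));
        System.out.println(sloppyPhrase(tokens, "cat", "dog", 500));
    }
}
```

For this to work, the slop has to cover the longest sentence you care about, and the gap has to be larger than the slop. That is why a gap of 1000 pairs well with ~500.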

> Is there an existing filter class that I could use to do this, or should
> I first parse my text fields with PHP and some NLP tool, and index the
> result (for the first case)? For the second case (incrementing the token
> position), how should I do this within Solr?


Solr puts a configurable position gap between values of the same field, so you
could index every sentence as a separate value of a multi-valued
field.
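In Solr, the gap between values is the `positionIncrementGap` attribute on the field's `<fieldType>` in schema.xml, and the field itself needs `multiValued="true"`. The remaining work is splitting each text into sentences before sending it. Here is a minimal client-side sketch using the JDK's `java.text.BreakIterator` sentence instance, one readily available alternative to an external NLP tool. Each resulting string would be sent as a separate value of a hypothetical multi-valued field. How well `BreakIterator` handles abbreviations and other edge cases is not guaranteed.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Splits text into sentences so each can be indexed as one value of a multi-valued field. */
public class SentenceSplitSketch {
    /** Returns the trimmed, non-empty sentences found by the locale's sentence BreakIterator. */
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        // Each printed line would become one value of the (hypothetical) multi-valued sentence field.
        for (String s : sentences("The cat sat on the mat. The dog barked! Did it?", Locale.ENGLISH)) {
            System.out.println(s);
        }
    }
}
```

With the values indexed separately, a sloppy phrase query whose slop is smaller than `positionIncrementGap` should not match across two sentences.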

A better solution would be to have a tokenizer that could detect the
end of sentences and either insert a gap or a special token that
another filter could act on.
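The tokenizer-plus-filter design above can be sketched in plain Java. This is not the Lucene `Tokenizer`/`TokenFilter` API; `Token`, `tokenize`, `gapFilter`, the `_EOS_` marker and the gap of 1000 are all hypothetical. The first stage emits a marker token at each sentence end. The second stage consumes the marker and turns it into a larger position increment on the next real token.

```java
import java.util.ArrayList;
import java.util.List;

/** Two-stage analysis sketch: a sentence-aware tokenizer followed by a gap-inserting filter. */
public class SentenceGapFilterSketch {
    /** A term plus its position increment relative to the previous emitted token. */
    record Token(String term, int positionIncrement) {}

    static final String EOS = "_EOS_"; // hypothetical end-of-sentence marker
    static final int GAP = 1000;       // assumed gap size

    /** Stage 1: whitespace tokenizer that emits EOS after words ending in '.', '!' or '?'. */
    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String word = raw.replaceAll("[.!?]+$", "").toLowerCase();
            if (!word.isEmpty()) out.add(new Token(word, 1));
            if (raw.matches(".*[.!?]$")) out.add(new Token(EOS, 1));
        }
        return out;
    }

    /** Stage 2: drops EOS markers and adds GAP to the increment of the next real token. */
    static List<Token> gapFilter(List<Token> in) {
        List<Token> out = new ArrayList<>();
        int pending = 0;
        for (Token t : in) {
            if (t.term().equals(EOS)) {
                pending = GAP;
                continue;
            }
            out.add(new Token(t.term(), t.positionIncrement() + pending));
            pending = 0;
        }
        return out;
    }

    /** Converts increments to absolute positions, with the first token at 0. */
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement();
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> filtered = gapFilter(tokenize("The cat sat. The dog barked!"));
        System.out.println(filtered);
        System.out.println(positions(filtered));
    }
}
```

Skipping the second stage and keeping the `_EOS_` tokens in the index would instead support the boundary-term approach with span queries. Splitting the work this way lets the same sentence detector serve either strategy.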

-Yonik
