On 11/13/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Hello everyone,
Thanks for all your answers; synonym-based approaches won't work
because the medical / research field is evolving way too fast; it would
Another approach is to extract the term explicitly. An
easy-to-implement approach
Hello everyone,
Thanks for all your answers; synonym-based approaches won't work
because the medical / research field is evolving way too fast; it would
become unmaintainable very quickly, and the list would be huge. Anyway,
I can't rely on score because I'm sorting by date, so I need to
eli
Indeed. CommonGrams.java in Nutch is the place to look.
Otis
----- Original Message -----
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 13, 2006 2:08:51 PM
Subject: Re: Index & search questions; special cases
On Nov 13, 2006, at 1:51 PM, Chris Hostetter wrote:
That reminds me ... I seem to remember someone saying once that
Nutch also builds word-based n-grams out of its stop words, so
searches on "the" or "on" won't match anything because those words
are never indexed as single tokens, but if a
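
To make the stop-word n-gram idea described above concrete, here is a minimal Python sketch. It is not Nutch's CommonGrams.java; the function name, the `_` joiner, and the stop-word list are assumptions chosen for illustration. It only models the behaviour as described in the message: stop words are never emitted alone, but each is joined to its neighbours as a bigram.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "the", "on", "of"}


def common_grams(tokens, stop_words=STOP_WORDS):
    """Emit ordinary tokens as-is and stop words only inside bigrams.

    A simplified model of the idea in the message, not Nutch's code.
    """
    out = []
    for i, tok in enumerate(tokens):
        # Ordinary words are indexed as single tokens.
        if tok not in stop_words:
            out.append(tok)
        # Any adjacent pair involving a stop word yields a joined bigram.
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in stop_words or nxt in stop_words:
                out.append(f"{tok}_{nxt}")
    return out


print(common_grams(["hepatitis", "a"]))
print(common_grams(["the"]))
```

With these assumptions, "hepatitis a" produces the tokens `hepatitis` and `hepatitis_a`, while a lone "the" produces nothing at all, which matches the claim that a search on a bare stop word finds no documents.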
On 11/13/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
The SynonymFilter could have the following config:
hepatitis a, hepatitis_a
Oops, the synonyms should be reversed like so:
hepatitis_a, hepatitis a
so that when expand="false" for querying, hepatitis a is mapped to hepatitis_a
-Yonik
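
As a rough illustration of the mapping above, the following Python sketch simulates what the corrected synonym line is meant to do at query time. It is not Solr's SynonymFilter; the helper names are hypothetical, and it assumes that with expand="false" every entry on a line is rewritten to the first entry.

```python
def build_map(lines):
    """Map every entry on a synonym line to the first entry.

    Assumption: this mirrors the expand="false" behaviour described above.
    """
    mapping = {}
    for line in lines:
        terms = [tuple(t.split()) for t in line.split(",") if t.strip()]
        target = terms[0]
        for term in terms:
            mapping[term] = target
    return mapping


def apply_synonyms(tokens, mapping):
    """Rewrite the token stream, preferring the longest matching entry."""
    max_len = max((len(key) for key in mapping), default=0)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in mapping:
                out.extend(mapping[key])
                i += n
                break
        else:
            # No synonym entry starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


MAPPING = build_map(["hepatitis_a, hepatitis a"])
print(apply_synonyms("chronic hepatitis a infection".split(), MAPPING))
```

Under that assumption the two tokens `hepatitis a` collapse into the single token `hepatitis_a`, so the query no longer depends on the very common word "a" being indexed on its own.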
On 11/12/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
- Somewhat related: Let's say I index "Polymyxin B". If I stopword
single letters, would a phrase search ("Polymyxin B") still find the
right documents (I don't think so, but still)? If not, I'll have to
index single letters; how do I prev
: > Sadly I can't rely on users' smartness for this :) I have concerns that
: > for stuff like Hepatitis A, it will match just about every document
: > containing hepatitis and the very common 'a' word, anywhere in the
: > document. I can't stopword single letters, cause then there would be no
: >
On 11/13/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
Another approach is to implement protected phrases, similar to the
protected words in stemming. These would be protected from stopword
processing.
One could use the synonym filter (which can handle multi-token
synonyms) to get this effect
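
The protected-phrase idea can be sketched in a few lines of Python. This is only an illustration of the concept, not an existing Solr or Lucene filter; the function name, the stop-word list, and the phrase list are all hypothetical.

```python
# Hypothetical lists, for illustration only.
STOP_WORDS = {"a", "an", "the", "of"}
PROTECTED_PHRASES = {("hepatitis", "a"), ("vitamin", "a")}


def remove_stop_words(tokens, stop_words=STOP_WORDS, protected=PROTECTED_PHRASES):
    """Drop stop words unless they fall inside a protected phrase."""
    keep = [False] * len(tokens)
    for phrase in protected:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                # Mark every position of the matched phrase as protected.
                for j in range(i, i + n):
                    keep[j] = True
    return [t for t, k in zip(tokens, keep) if k or t not in stop_words]


print(remove_stop_words("a case of hepatitis a".split()))
```

Here the leading "a" and "of" are removed as usual, while the "a" in "hepatitis a" survives because it sits inside a protected phrase.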
On 11/12/06 8:52 PM, "Michael Imbeault" <[EMAIL PROTECTED]>
wrote:
> Sadly I can't rely on users' smartness for this :) I have concerns that
> for stuff like Hepatitis A, it will match just about every document
> containing hepatitis and the very common 'a' word, anywhere in the
> document. I can't