Re: Re: Index & search questions; special cases

2006-11-13 Thread Mike Klaas
On 11/13/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Hello everyone, Thanks for all your answers; synonym-based approaches won't work because the medical / research field is evolving way too fast; it would Another approach is to extract the term explicitly. An easy-to-implement approach

Re: Index & search questions; special cases

2006-11-13 Thread Michael Imbeault
Hello everyone, Thanks for all your answers; synonym-based approaches won't work because the medical / research field is evolving way too fast; it would become unmaintainable very quickly, and the list would be huge. Anyway, I can't rely on score because I'm sorting by date, so I need to eli

Re: Index & search questions; special cases

2006-11-13 Thread Otis Gospodnetic
Indeed. CommonGrams.java in Nutch is the place to look. Otis - Original Message From: Erik Hatcher <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Monday, November 13, 2006 2:08:51 PM Subject: Re: Index & search questions; special cases On Nov 13, 2006, at 1:51 PM, Chris Hos

Re: Index & search questions; special cases

2006-11-13 Thread Erik Hatcher
On Nov 13, 2006, at 1:51 PM, Chris Hostetter wrote: That reminds me ... I seem to remember someone saying once that Nutch also builds word-based n-grams out of its stop words, so searches on "the" or "on" won't match anything because those words are never indexed as single tokens, but if a
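
To make the stop-word n-gram idea above concrete, here is a minimal Python sketch. It is not Nutch's CommonGrams.java, and because the message above is cut off it reflects only one plausible reading: common words are never emitted alone, only joined to a neighbouring token. The names `COMMON` and `common_grams`, the word list, and the `_` separator are all assumptions made for illustration.

```python
# Hypothetical sketch of "common grams"; not Nutch's actual implementation.
COMMON = {"the", "on", "a"}  # assumed stop word list


def common_grams(tokens, common=COMMON):
    """Index common words only as part of a bigram with a neighbour."""
    out = []
    for i, tok in enumerate(tokens):
        # Ordinary words are still indexed on their own.
        if tok not in common:
            out.append(tok)
        # Emit a bigram whenever either side of the pair is a common word.
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append(tok + "_" + tokens[i + 1])
    return out
```

With this scheme a query for a bare common word finds nothing, because that word never appears as a token of its own, while a phrase containing it can still be matched through the bigram tokens.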

Re: Index & search questions; special cases

2006-11-13 Thread Yonik Seeley
On 11/13/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> The SynonymFilter could have the following config:
> hepatitis a, hepatitis_a
Oops, the synonyms should be reversed like so:
hepatitis_a, hepatitis a
so that when expand="false" for querying, hepatitis a is mapped to hepatitis_a
-Yonik
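
As a rough illustration of the mapping described above, here is a minimal Python sketch. It is not Solr's SynonymFilter; the names `PHRASES`, `STOPWORDS`, `collapse_phrases`, `remove_stopwords`, and `analyze` are hypothetical, and so is the stop word list. It assumes the phrase mapping runs before stop word removal, so the two-token sequence `hepatitis a` becomes the single token `hepatitis_a` and the `a` is not discarded.

```python
# Hypothetical illustration; not Solr's SynonymFilter implementation.
PHRASES = {("hepatitis", "a"): "hepatitis_a"}  # phrase -> single token
STOPWORDS = {"a", "the", "on"}                 # assumed stop word list


def collapse_phrases(tokens, phrases=PHRASES):
    """Replace each known multi-token phrase with its single-token form."""
    max_len = max(len(p) for p in phrases)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first.
        for n in range(max_len, 1, -1):
            if i + n <= len(tokens) and tuple(tokens[i:i + n]) in phrases:
                out.append(phrases[tuple(tokens[i:i + n])])
                i += n
                break
        else:
            # No phrase starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stop words that survived phrase collapsing."""
    return [t for t in tokens if t not in stopwords]


def analyze(text):
    """Lowercase, split on whitespace, collapse phrases, then remove stop words."""
    return remove_stopwords(collapse_phrases(text.lower().split()))
```

Applying the same `analyze` step to both documents and queries means a search for "Hepatitis A" looks for the token `hepatitis_a` rather than for `hepatitis` plus the very common `a`.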

Re: Index & search questions; special cases

2006-11-13 Thread Yonik Seeley
On 11/12/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: - Somewhat related : Let's say I index "Polymyxin B". If I stopword single letters, would a phrase search ("Polymyxin B") still find the right documents (I don't think so, but still)? If not, I'll have to index single letters; how do I prev

Re: Index & search questions; special cases

2006-11-13 Thread Chris Hostetter
: > Sadly I can't rely on users' smartness for this :) I have concerns that : > for stuff like Hepatitis A, it will match just about every document : > containing hepatitis and the very common 'a' word, anywhere in the : > document. I can't stopword single letters, because then there would be no : >

Re: Index & search questions; special cases

2006-11-13 Thread Yonik Seeley
On 11/13/06, Walter Underwood <[EMAIL PROTECTED]> wrote: Another approach is to implement protected phrases, similar to the protected words in stemming. These would be protected from stopword processing. One could use the synonym filter (which can handle multi-token synonyms) to get this effect

Re: Index & search questions; special cases

2006-11-13 Thread Walter Underwood
On 11/12/06 8:52 PM, "Michael Imbeault" <[EMAIL PROTECTED]> wrote: > Sadly I can't rely on users' smartness for this :) I have concerns that > for stuff like Hepatitis A, it will match just about every document > containing hepatitis and the very common 'a' word, anywhere in the > document. I can't