Adding to Mike's comments: for this specific query, one can see that
both words stem to "illeg":
http://localhost:8983/solr/select?q=illegible+illegal&debugQuery=on
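If you want to script that check, here is a minimal sketch. It assumes a Solr instance running at the same localhost URL as above. `debug_query_url` and `fetch_debug_response` are hypothetical helper names. The parsedquery entry in the debug section of the response is where the stemmed terms show up.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr instance at the example URL used in this thread
SOLR_SELECT = "http://localhost:8983/solr/select"

def debug_query_url(query: str, base: str = SOLR_SELECT) -> str:
    """Build a /select URL with debugQuery=on so Solr reports how it parsed the query."""
    # urlencode() uses quote_plus, so spaces become '+' as in the URL above
    return base + "?" + urlencode({"q": query, "debugQuery": "on"})

def fetch_debug_response(query: str) -> str:
    """Fetch the raw response body (requires the Solr server to be running)."""
    with urlopen(debug_query_url(query)) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    print(debug_query_url("illegible illegal"))
```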

You could fix this specific case either by configuring protected
words on the stemmer or by using the synonym filter to map one
of the alternatives to something that won't be stemmed (the former
is probably the better option).
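To illustrate what protected words do, here is a sketch in Python rather than a Solr configuration. `toy_stem` is a hypothetical suffix stripper, not the Porter algorithm that EnglishPorterFilterFactory uses. It does reproduce the same "illeg" collision, though. In the example schema.xml, the protected list is the protwords.txt file referenced by the stemmer; here `PROTECTED` is just an in-memory stand-in.

```python
# Toy suffix-stripping stemmer: NOT the Porter algorithm, only a stand-in
SUFFIXES = ("ible", "able", "al", "ing", "ed", "s")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_with_protection(word: str, protected: frozenset[str]) -> str:
    """Return protected words unchanged and stem everything else."""
    return word if word in protected else toy_stem(word)

# Hypothetical protected-words list (the role protwords.txt plays in the example schema)
PROTECTED = frozenset({"illegal"})

for w in ("illegal", "illegible"):
    print(w, "->", toy_stem(w), "| protected:", stem_with_protection(w, PROTECTED))
```

With "illegal" protected, the two words no longer reduce to the same term. Note that this only helps if the same protected list is applied at both index and query time, which is the case when both use the same field type.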

More generally, some have noted that Lucene (and hence Solr) would
benefit from the option of a "weaker" stemmer.
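As a rough idea of what "weaker" could mean, the following sketch only folds a few common plural forms and leaves other suffixes alone. `plural_stem` is a hypothetical illustration, not an existing Lucene or Solr filter, and real light stemmers cover many more cases.

```python
def plural_stem(word: str) -> str:
    """A deliberately weak stemmer that only folds a few English plural forms."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"            # policies -> policy
    if len(word) > 4 and word.endswith(("sses", "xes", "zes", "ches", "shes")):
        return word[:-2]                  # boxes -> box, classes -> class
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                  # documents -> document
    return word                           # everything else is left alone

for w in ("policies", "boxes", "documents", "class", "illegal", "illegible"):
    print(w, "->", plural_stem(w))
```

The trade-off is recall: a plural-only stemmer will not match "running" to "run", but it also never conflates "illegal" with "illegible".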

-Yonik

On 2/8/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
On 2/8/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:
> Hello Solr friends:
>
> Mr. Klaas - I've not tested your patch yet (will try to get to it soon) but
> I've found almost the opposite problem now and people are questioning
> how/why things are happening as they are.
>
> I'm searching for the word "illegal" and the query results are coming back
> with an entry that has "illegible" in it.  "illegible" is highlighted as
> well.  I'm strictly searching for "illegal" - no modifiers (well, a + to
> indicate I have to have it, but no ~ modifier).
>
> I'm a newb to all this, so please bear with me.  I'm using the standard
> 'text' field schema definition in the default 'schema.xml' to index this
> field data.  Does that account for partial and/or soundalike matches by
> default?

Yes, the default text field contains EnglishPorterFilterFactory, which
"stems" English words.  This normalizes pluralization and tense, but
can also result in confusion, as you have noted.
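The match happens because the same stemming runs on the documents at index time and on the query at search time. Here is a toy sketch of that flow. It uses a hypothetical suffix-stripping `toy_stem` (not Porter), a crude `analyze` chain, and made-up documents. The admin analysis screen is where you would check what the real field type produces.

```python
import re
from collections import defaultdict

# Toy stand-in for EnglishPorterFilterFactory: NOT the Porter algorithm,
# just enough suffix stripping to reproduce the collision from the thread.
SUFFIXES = ("ible", "able", "al", "ing", "ed", "s")

def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text: str) -> list[str]:
    """Lowercase, split into runs of letters, then stem each token."""
    return [toy_stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each analyzed term to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Require every query term (like +term); the query goes through the same analysis."""
    terms = analyze(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

# Hypothetical documents, for illustration only
docs = {
    "doc1": "The signature was illegible.",
    "doc2": "Parking here is illegal.",
}
index = build_index(docs)
print(sorted(search(index, "illegal")))     # ['doc1', 'doc2']
print(sorted(search(index, "signatures")))  # ['doc1']
```

The second query shows the intended benefit (a plural matching the singular), and the first shows the side effect described above.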

The "default" fields in the example schema.xml should be treated as
examples, not as canonical field definitions.  If you decide to use
one or all of them in your own application, it is important to understand
what they are doing (the comments in the file and the analysis
screen in the Solr admin UI are good tools for this).

-Mike
