Adding to Mike's comments, for this specific query, one can see that both words stem to "illeg": http://localhost:8983/solr/select?q=illegible+illegal&debugQuery=on
You could fix this specific case either by configuring protected words on the stemmer or by using the synonym filter to map one of the alternatives to something that won't be stemmed (the former is probably the better option). More generally, some have noted that Lucene (and hence Solr) would benefit from the option of a "weaker" stemmer.

-Yonik

On 2/8/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
On 2/8/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:
> Hello Solr friends:
>
> Mr. Klaas - I've not tested your patch yet (will try to get to it soon) but
> I've found almost the opposite problem now and people are questioning
> how/why things are happening as they are.
>
> I'm searching for the word "illegal" and the query results are coming back
> with an entry that has "illegible" in it. "illegible" is highlighted as
> well. I'm strictly searching for "illegal" - no modifiers (well, a + to
> indicate I have to have it, but no ~ modifier).
>
> I'm a newb to all this, so please bear with me. I'm using the standard
> 'text' field schema definition in the default 'schema.xml' to index this
> field data. Does that account for partial and/or soundalike matches by
> default?

Yes, the default text field contains EnglishPorterFilterFactory, which "stems" English words. This normalizes pluralization and tense, but can also result in confusion as you have noted.

The "default" fields in the example schema.xml should be treated as examples and not as canonical field definitions. If you decide to use one or all for your own application, it is important to understand what they are doing (the comments in the file as well as the analysis screen in the Solr admin UI are good tools for this).

-Mike
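
For reference, the protected-words fix suggested at the top of this thread might look roughly like the fragment below. This is only a sketch under assumptions: the file name protwords.txt, its location next to schema.xml, and the `protected` attribute on the stemmer filter are taken from the way the example schema is commonly set up and should be checked against the Solr version in use.

```xml
<!-- schema.xml (sketch): the stemmer line inside the "text" field type's
     index and query analyzers.
     Assumption: protwords.txt sits next to schema.xml and lists one word
     per line that the stemmer should leave untouched, here just:
         illegible
-->
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
```

With "illegible" protected, it should no longer be reduced to "illeg", so a search for "illegal" would stop matching it. Documents indexed before the change would need to be reindexed for this to take effect, and the analysis screen in the admin UI is a quick way to confirm the resulting tokens.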