I've closed the JIRA; see the comments there. The short form is that
it's not worth the effort because of the edge cases. Jack wrote
up some of them, and they boil down to "what does stemming
do with terms like organiz*?" Sure, it would produce one token (which is
the main restriction on a MultiTermAware filter), but the output
might not be anything equivalent to the stem of "organization", maybe
not even "organize". Better to avoid that rat-hole; it seems like one of those
problems that could suck up enormous amounts of time and _still_ not
do what's expected.

If you _really_ want to try this, you could always define your own
"multiterm" analysis component that includes the stemmer; see:
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
But don't say I didn't warn you <G>...
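
For what it's worth, a minimal sketch of such a field type in schema.xml
might look like the following. The field type name and the exact filter
chain are made up for illustration (assumptions, not taken from the JIRA
or the blog post). The relevant part is the explicit analyzer with
type="multiterm", which, as I understand it, Solr then applies to wildcard
and prefix terms in place of the chain it would otherwise build on its own.

```xml
<!-- Hypothetical field type: the name "text_stem_wild" and the filter
     choices are assumptions for illustration only. -->
<fieldType name="text_stem_wild" class="solr.TextField" positionIncrementGap="100">
  <!-- Normal analysis for indexing and ordinary (non-wildcard) queries. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <!-- Explicit multiterm chain: the keyword tokenizer keeps the term as a
       single token, and the stemmer is added deliberately, at your own risk. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

With something like this, the prefix in a query such as organiz* is run
through the stemmer before matching, which is exactly where the edge cases
above come in, so check the results against your own data.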

Best
Erick

On Sun, Jun 3, 2012 at 8:25 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Chiming in late here, just back from vacation. But off the top of my
> head, I don't see any reason SnowballPorterFilterFactory shouldn't
> be MultiTermAware.
>
> I've created https://issues.apache.org/jira/browse/SOLR-3503 as
> a placeholder.
>
> Erick
>
> On Fri, May 25, 2012 at 1:31 PM,  <spr...@gmx.eu> wrote:
>>> I don't know the specific rules in these specific stemmers,
>>> but generally a
>>> "less aggressive" stemming (e.g., "plural-only") of
>>> "paintings" would be
>>> "painting", while a "more aggressive" stemming would be
>>> "paint". For some
>>> "aggressive" stemmers the stemmed word is not even a word.
>>
>> Sounds logical :)
>>
>>> It would be nice to have doc with some example words for each stemmer.
>>
>> Absolutely!
>>
>> Thanks a lot!
>>
