I'm shooting a bit in the dark here, but I'd guess that these are
actually understandable results.

If you replace and then stem, the stemming algorithm
works on exactly the same word, and you get the
results you expect.

If you stem and then replace, the stemmer sees different inputs,
so it isn't a surprise that your outputs are
different.

Your implicit assumption, it seems to me, is that 'wärme' and
'waerme' should go through the stemmer and
become 'wärm' and 'waerm', on which you can then do the substitution
and produce the same output. I don't think that's a valid
assumption: the stemmer sees two different strings and has no
reason to strip them the same way.
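To make the ordering argument concrete, here is a minimal Python sketch that simulates an analyzer chain as a list of functions applied in order. The stemmer in it, toy_stem, is hypothetical and is NOT the Porter algorithm: it only drops a trailing 'e' when the remaining letters contain an ASCII vowel (a, e, i, o, u, y), so 'ä' does not count as a vowel. That rule is an assumption chosen to mimic a stemmer that treats 'ä' and 'ae' differently; the umlaut mapping stands in for the PatternReplaceFilterFactory entries.

```python
# Hypothetical toy stemmer, NOT Porter/Snowball: drop a trailing 'e'
# only if the rest of the word contains an ASCII vowel ('ä' doesn't count).
ASCII_VOWELS = set("aeiouy")


def toy_stem(token: str) -> str:
    if token.endswith("e") and any(c in ASCII_VOWELS for c in token[:-1]):
        return token[:-1]
    return token


# Stand-in for the PatternReplaceFilterFactory entries (one per umlaut).
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def replace_umlauts(token: str) -> str:
    for umlaut, digraph in UMLAUTS.items():
        token = token.replace(umlaut, digraph)
    return token


def analyze(token, filters):
    # Apply each filter in order, like the filters of an analyzer chain.
    for f in filters:
        token = f(token)
    return token


STEM_THEN_REPLACE = [str.lower, toy_stem, replace_umlauts]
REPLACE_THEN_STEM = [str.lower, replace_umlauts, toy_stem]

for word in ["Wärme", "waerme"]:
    print(f"{word}: stem->replace={analyze(word, STEM_THEN_REPLACE)!r}, "
          f"replace->stem={analyze(word, REPLACE_THEN_STEM)!r}")
```

With stemming first, 'Wärme' ends up as 'waerme' while 'waerme' ends up as 'waerm', so index and query terms no longer match even though both sides use the same chain. With the replacement first, both spellings reach the stemmer as 'waerme' and come out identical.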

You could probably check the actual contents of your index
with Luke and verify whether or not your assumptions are
correct.

Best
Erick

On Thu, Jul 2, 2009 at 9:27 AM, Michael Lackhoff <mich...@lackhoff.de> wrote:

> In Germany we have a strange habit of seeing some sort of equivalence
> between Umlaut letters and a two-letter representation. For example, 'ä' and
> 'ae' are expected to give the same search results. To achieve this I
> added this filter to the "text" fieldtype definition:
>        <filter class="solr.PatternReplaceFilterFactory"
>                pattern="ä" replacement="ae" replace="all"
>        />
> to both index and query analyzers (and more for the other umlauts).
>
> This works well when I search for a name (a word that is not stemmed) but
> not with, e.g., the word "Wärme".
> search for 'wärme' works
> search for 'waerme' does not work
> search for 'waerm' works if I move the EnglishPorterFilterFactory after
> the PatternReplaceFilterFactory.
>
> DebugQuery for "waerme" gives a parsedquery FS:waerm.
> What I don't understand is why the (existing) records are not found. If
> I understand it right, there should be 'waerm' in the index as well.
>
> By the way, the reason why I keep the EnglishPorterFilterFactory is that
> the records are in many languages and the English stemming gives good
> results in many cases and I don't want (yet) to multiply my fields to
> have language specific versions.
> But even if the stemming is not right because the language is not
> English I think records should be found as long as the analyzers are the
> same for index and query.
>
> This is with Solr 1.3.
>
> Can someone shed some light on what is going on and how I can achieve my
> goal?
>
> -Michael
>
