Also, check out MappingCharFilterFactory in Solr 1.4
and mapping-ISOLatin1Accent.txt in example/solr/conf
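
A minimal sketch of how that could look in schema.xml for Solr 1.4. The
file name mapping-umlauts.txt is made up for illustration, and the
tokenizer and filters are only assumptions about your "text" fieldtype:

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <!-- Char filters rewrite the raw text before the tokenizer runs. -->
    <!-- mapping-umlauts.txt is a hypothetical file in the conf directory. -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-umlauts.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

The mapping file takes one rule per line in the form "ä" => "ae". As far
as I recall, the stock mapping-ISOLatin1Accent.txt folds ä to a plain
"a", so the "ae" rules would have to go into a file of your own.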

-Yonik
http://www.lucidimagination.com


On Thu, Jul 2, 2009 at 9:27 AM, Michael Lackhoff <mich...@lackhoff.de> wrote:
> In Germany we have a strange habit of seeing some sort of equivalence
> between Umlaut letters and a two letter representation. Example 'ä' and
> 'ae' are expected to give the same search results. To achieve this I
> added this filter to the "text" fieldtype definition:
>        <filter class="solr.PatternReplaceFilterFactory"
>                pattern="ä" replacement="ae" replace="all"
>        />
> to both index and query analyzers (and more for the other umlauts).
>
> This works well when I search for a name (a word not stemmed) but not
> e.g. with the word "Wärme".
> search for 'wärme' works
> search for 'waerme' does not work
> search for 'waerm' works if I move the EnglishPorterFilterFactory after
> the PatternReplaceFilterFactory.
>
> DebugQuery for "waerme" gives a parsedquery FS:waerm.
> What I don't understand is why the (existing) records are not found. If
> I understand it right, there should be 'waerm' in the index as well.
>
> By the way, the reason why I keep the EnglishPorterFilterFactory is that
> the records are in many languages and the English stemming gives good
> results in many cases and I don't want (yet) to multiply my fields to
> have language specific versions.
> But even if the stemming is not right because the language is not
> English I think records should be found as long as the analyzers are the
> same for index and query.
>
> This is with Solr 1.3.
>
> Can someone shed some light on what is going on and how I can achieve my
> goal?
>
> -Michael
>
