Also, check out MappingCharFilterFactory in Solr 1.4 and the mapping-ISOLatin1Accent.txt file in example/solr/conf.
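A minimal sketch of a Solr 1.4 field type using MappingCharFilterFactory, assuming a hypothetical mapping file named mapping-GermanUmlaut.txt saved as UTF-8 next to schema.xml. The shipped mapping-ISOLatin1Accent.txt folds accents to the bare letter (e.g. ä to a), so for the German "ä = ae" convention a custom file with lines such as "ä" => "ae", "ö" => "oe", "ü" => "ue" (plus the uppercase forms "Ä" => "Ae" and so on) would be needed. The field type name and the tokenizer choice below are illustrative assumptions.

```xml
<!-- Hypothetical field type name; adapt it to the existing "text" type. -->
<fieldType name="text_umlaut" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> (no type attribute) is used for both index and query,
       so both sides are guaranteed to run the same chain in the same order. -->
  <analyzer>
    <!-- Char filters run on the raw character stream before tokenization,
         so every later filter, including the stemmer, sees "ae" and never "ä".
         mapping-GermanUmlaut.txt is a hypothetical file name. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-GermanUmlaut.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stemming now operates on already-folded text, e.g. "waerme". -->
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

Because this changes index-time analysis, existing documents have to be reindexed before queries against the new field type will match them.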
-Yonik
http://www.lucidimagination.com

On Thu, Jul 2, 2009 at 9:27 AM, Michael Lackhoff <mich...@lackhoff.de> wrote:
> In Germany we have a strange habit of seeing some sort of equivalence
> between Umlaut letters and a two-letter representation. For example, 'ä'
> and 'ae' are expected to give the same search results. To achieve this I
> added this filter to the "text" fieldtype definition:
> <filter class="solr.PatternReplaceFilterFactory"
> pattern="ä" replacement="ae" replace="all"
> />
> to both index and query analyzers (and more for the other umlauts).
>
> This works well when I search for a name (a word that is not stemmed) but
> not, e.g., with the word "Wärme":
> search for 'wärme' works
> search for 'waerme' does not work
> search for 'waerm' works if I move the EnglishPorterFilterFactory after
> the PatternReplaceFilterFactory.
>
> debugQuery for "waerme" gives a parsedquery FS:waerm.
> What I don't understand is why the (existing) records are not found. If
> I understand it right, there should be 'waerm' in the index as well.
>
> By the way, the reason why I keep the EnglishPorterFilterFactory is that
> the records are in many languages and the English stemming gives good
> results in many cases, and I don't want (yet) to multiply my fields to
> have language-specific versions.
> But even if the stemming is not right because the language is not
> English, I think records should be found as long as the analyzers are the
> same for index and query.
>
> This is with Solr 1.3.
>
> Can someone shed some light on what is going on and how I can achieve my
> goal?
>
> -Michael