Re: EnglishPorterFilterFactory and PatternReplaceFilterFactory

2009-07-02 Thread Walter Underwood
You might try a German stemmer. English gets a small benefit from stemming, maybe 5%. German is more heavily inflected than English, so it may get a bigger improvement. German search usually needs word breaking, so that Orgelmusik can be split into Orgel and Musik. To get that, you will probably need
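To make the Orgelmusik example concrete, here is a minimal sketch of dictionary-based compound splitting in Python. The function name `split_compound`, the `min_part` parameter, and the two-word lexicon are assumptions made up for illustration; this is not how any particular Solr or Lucene component works, and it ignores linking elements such as the German Fugen-s.

```python
def split_compound(word, lexicon, min_part=3):
    """Split `word` into lexicon entries, preferring the longest leading part.

    Returns the lowercased parts if the whole word can be covered by
    lexicon entries, otherwise a one-element list with the lowercased word.
    """
    lower = word.lower()

    def helper(start):
        # Reaching the end means everything before it was covered.
        if start == len(lower):
            return []
        # Try the longest candidate first, never shorter than min_part.
        for end in range(len(lower), start + min_part - 1, -1):
            part = lower[start:end]
            if part in lexicon:
                rest = helper(end)
                if rest is not None:
                    return [part] + rest
        return None  # no full cover from this position

    parts = helper(0)
    return parts if parts else [lower]


# Hypothetical lexicon; a real one would hold many thousands of entries.
LEXICON = {"orgel", "musik"}

print(split_compound("Orgelmusik", LEXICON))  # ['orgel', 'musik']
print(split_compound("Haus", LEXICON))        # ['haus']
```

Indexing both the full compound and its parts would let a query for Orgel match a document that only contains Orgelmusik.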

Re: EnglishPorterFilterFactory and PatternReplaceFilterFactory

2009-07-02 Thread Michael Lackhoff
On 02.07.2009 17:28 Erick Erickson wrote:
> I'm shooting a bit in the dark here, but I'd guess that these are
> actually understandable results.
Perhaps not too much in the dark
> That is your implicit assumption, it seems to me, is that 'wärme' and
> 'waerme' should go through the stemmer and
>

Re: EnglishPorterFilterFactory and PatternReplaceFilterFactory

2009-07-02 Thread Yonik Seeley
Also, check out MappingCharFilterFactory in Solr 1.4 and mapping-ISOLatin1Accent.txt in example/solr/conf
-Yonik
http://www.lucidimagination.com
On Thu, Jul 2, 2009 at 9:27 AM, Michael Lackhoff wrote:
> In Germany we have a strange habit of seeing some sort of equivalence
> between Umlaut lette

Re: EnglishPorterFilterFactory and PatternReplaceFilterFactory

2009-07-02 Thread Michael Lackhoff
On 02.07.2009 16:34 Walter Underwood wrote:
> First, don't use an English stemmer on German text. It will give some odd
> results.
I know but at the moment I only have the choice between no stemmer at all and one stemmer and since more than half of the records are English (about 60% English, 30%

Re: EnglishPorterFilterFactory and PatternReplaceFilterFactory

2009-07-02 Thread Erick Erickson
I'm shooting a bit in the dark here, but I'd guess that these are actually understandable results. If you replace then stem, the stemming algorithm works on the exact same word. And you got the results you expect. If you stem then replace, the inputs to the stemmer are different, so the fact that
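The ordering effect described here can be sketched in a few lines of Python. Note that `toy_stem` is a hypothetical, length-sensitive rule invented for this illustration; it is NOT the Porter algorithm, and `replace_umlauts` is only a stand-in for a pattern-replace step. The point is just that any stemmer whose rules depend on the exact characters or length of the token can treat 'wärme' and 'waerme' differently.

```python
def replace_umlauts(token: str) -> str:
    """Stand-in for a pattern-replace step: map umlauts to two-letter forms."""
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue")):
        token = token.replace(src, dst)
    return token


def toy_stem(token: str) -> str:
    """Hypothetical rule, NOT Porter: drop a trailing 'e' only if
    more than 4 characters remain afterwards."""
    if token.endswith("e") and len(token) - 1 > 4:
        return token[:-1]
    return token


def replace_then_stem(token: str) -> str:
    return toy_stem(replace_umlauts(token))


def stem_then_replace(token: str) -> str:
    return replace_umlauts(toy_stem(token))


for word in ("wärme", "waerme"):
    print(word, replace_then_stem(word), stem_then_replace(word))
# wärme waerm waerme
# waerme waerm waerm
```

With replace-then-stem both spellings reach the stemmer as the same string and end up as the same term. With stem-then-replace the stemmer sees two different strings, so the final terms can differ even though the replacement runs afterwards.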

Re: EnglishPorterFilterFactory and PatternReplaceFilterFactory

2009-07-02 Thread Walter Underwood
First, don't use an English stemmer on German text. It will give some odd results. Are you using the same conversions on the index and query side? The German stemmer might already handle "typewriter umlauts". If it doesn't, use the pattern replace factory. You will also need to convert "ß" to "ss
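As a rough illustration of using the same conversions on the index and query side, the Python sketch below runs one shared `normalize` function over both documents and queries. The names `TYPEWRITER_MAP`, `normalize`, `build_index`, and `search`, and the tiny in-memory index, are assumptions for this example only; in Solr the equivalent would be configuring matching analyzer chains for index and query.

```python
# One shared mapping for "typewriter umlauts" and ß, used on both sides.
TYPEWRITER_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def normalize(text: str) -> str:
    """Apply the character conversions, then lowercase."""
    return text.translate(TYPEWRITER_MAP).lower()


def build_index(docs: dict) -> dict:
    """Map each normalized whitespace-separated token to the ids of the docs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in normalize(text).split():
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Look up a single-term query after the SAME normalization."""
    return index.get(normalize(query), set())


index = build_index({1: "Wärme und Straße", 2: "Waerme"})
print(search(index, "wärme"))    # {1, 2}
print(search(index, "Strasse"))  # {1}
```

If the query side skipped `normalize`, a search for 'wärme' would find nothing, because the index only contains 'waerme'.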