You might try a German stemmer. English gets a small benefit from stemming,
maybe 5%. German is more heavily inflected than English, so it may get a bigger
improvement.
German search usually needs wordbreaking, so that Orgelmusik can be split
into Orgel and Musik. To get that, you will probably need
On 02.07.2009 17:28 Erick Erickson wrote:
> I'm shooting a bit in the dark here, but I'd guess that these are
> actually understandable results.
Perhaps not too much in the dark.
> That is, your implicit assumption, it seems to me, is that 'wärme' and
> 'waerme' should go through the stemmer and
>
Also, check out MappingCharFilterFactory in Solr 1.4
and mapping-ISOLatin1Accent.txt in example/solr/conf
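A char filter rewrites characters before the tokenizer ever sees them. The plain-Python stand-in below (not Solr's MappingCharFilterFactory itself) shows why the choice of mapping matters for the 'wärme'/'waerme' question. Both tables are hypothetical examples and are not the actual contents of mapping-ISOLatin1Accent.txt. This sketch also handles only single-character keys, whereas a real mapping file is not limited to that.

```python
# Hypothetical mapping tables for a char-filter-style pre-pass (assumptions,
# not copied from mapping-ISOLatin1Accent.txt).
ACCENT_FOLDING = {"ä": "a", "ö": "o", "ü": "u",
                  "Ä": "A", "Ö": "O", "Ü": "U", "ß": "ss"}
TYPEWRITER = {"ä": "ae", "ö": "oe", "ü": "ue",
              "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def char_filter(text: str, mapping: dict[str, str]) -> str:
    """Replace each mapped character before tokenization (single-char keys only)."""
    return text.translate(str.maketrans(mapping))


print(char_filter("Wärme", ACCENT_FOLDING))   # Warme
print(char_filter("Waerme", ACCENT_FOLDING))  # Waerme (unchanged)
print(char_filter("Wärme", TYPEWRITER))       # Waerme
```

With plain accent folding, "Wärme" becomes "Warme" and still does not match "Waerme". Only a mapping that expands ä to ae, or one applied to both spellings in some other consistent way, makes the two spellings identical.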
-Yonik
http://www.lucidimagination.com
On Thu, Jul 2, 2009 at 9:27 AM, Michael Lackhoff wrote:
> In Germany we have a strange habit of seeing some sort of equivalence
> between Umlaut lette
On 02.07.2009 16:34 Walter Underwood wrote:
> First, don't use an English stemmer on German text. It will give some odd
> results.
I know, but at the moment I only have the choice between no stemmer at
all and one stemmer, and since more than half of the records are English
(about 60% English, 30%
I'm shooting a bit in the dark here, but I'd guess that these are
actually understandable results.
If you replace then stem, the stemming algorithm
works on the exact same word. And you got the
results you expect.
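To make the ordering point concrete, here is a toy sketch. The "stemmer" is hypothetical: it strips "en" or "e" only when at least four characters would remain. It is not the Snowball German stemmer or any Solr filter. It is just enough to show how a stemmer can make different decisions on "Bären" and "Baeren" when it sees them before the umlaut replacement.

```python
# Hypothetical toy stemmer and umlaut replacement, for illustration only.
TYPEWRITER = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
SUFFIXES = ("en", "e")  # checked longest first (assumption)
MIN_STEM = 4            # minimum length left after stripping (assumption)


def replace_umlauts(word: str) -> str:
    return word.translate(str.maketrans(TYPEWRITER))


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def replace_then_stem(word: str) -> str:
    return toy_stem(replace_umlauts(word.lower()))


def stem_then_replace(word: str) -> str:
    return replace_umlauts(toy_stem(word.lower()))


print(replace_then_stem("Bären"), replace_then_stem("Baeren"))  # baer baer
print(stem_then_replace("Bären"), stem_then_replace("Baeren"))  # baeren baer
```

When replacement happens first, the stemmer sees the same string for both spellings and must produce the same stem. When stemming happens first, "bären" is one character shorter than "baeren", so the length rule blocks the strip for one spelling but not the other.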
If you stem then replace, the inputs to the stemmer are different, so the
fact that
First, don't use an English stemmer on German text. It will give some odd
results.
Are you using the same conversions on the index and query side?
The German stemmer might already handle "typewriter umlauts". If it doesn't,
use the pattern replace factory. You will also need to convert "ß" to "ss".
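The question about index and query conversions comes down to running one analysis chain on both sides. The sketch below is a plain-Python stand-in, not Solr's pattern replace filter. It makes a few assumptions: whitespace tokenization, lowercasing, a regex replacement of ä/ö/ü/ß with ae/oe/ue/ss, and no stemming. The `analyze` function is applied both when building a tiny inverted index and when parsing the query.

```python
import re
from collections import defaultdict

# Hypothetical pattern-replace step: typewriter umlauts plus ß -> ss.
UMLAUT_PATTERN = re.compile("[äöüß]")
REPLACEMENTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, then apply the pattern replacement."""
    tokens = text.lower().split()
    return [UMLAUT_PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], t)
            for t in tokens]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each analyzed term to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: the query goes through the same analyze() as the documents."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Wärme und Kälte", 2: "Waerme", 3: "Straße"}
index = build_index(docs)
print(search(index, "Wärme"))    # {1, 2}
print(search(index, "strasse"))  # {3}
```

If the query side skipped `analyze` and looked up the raw term "wärme", it would find nothing, because only the converted form "waerme" exists in the index.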