English and French are messy, so heuristic methods are the only option. Spanish is rigorously clean, so stemming should be driven by the declension rules and the irregular conjugation tables. This means large (fast) tables in RAM rather than small (slow) string-shuffling code.
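A minimal sketch of the table-driven idea, in Python. The table contents and the names IRREGULAR_FORMS, REGULAR_ENDINGS and lemma are made up for illustration; a real system would load complete conjugation and declension tables, and this is not how Snowball or Solr does it. It handles only a few verb forms.

```python
# Hypothetical, tiny tables for illustration only. A real system would load
# the full irregular conjugation tables into memory.
IRREGULAR_FORMS = {
    "soy": "ser",
    "voy": "ir",
    "tuvo": "tener",
    "hizo": "hacer",
}

# A few regular first-person-plural endings mapped to the infinitive ending.
REGULAR_ENDINGS = [
    ("amos", "ar"),
    ("emos", "er"),
    ("imos", "ir"),
]


def lemma(word):
    """Return the infinitive for a known verb form, else the word unchanged."""
    word = word.lower()
    # 1. Irregular forms: a single hash lookup, no string manipulation.
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    # 2. Regular forms: swap a known ending for the infinitive ending,
    #    requiring at least two characters of stem to remain.
    for ending, infinitive_ending in REGULAR_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending) + 1:
            return word[: -len(ending)] + infinitive_ending
    # 3. Unknown words pass through untouched.
    return word
```

The irregular forms cost one dictionary lookup each, which is the "large (fast) tables" half of the trade-off; only the regular forms fall through to the ending rules.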
Lance Norskog

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bertrand Delacretaz
Sent: Thursday, September 20, 2007 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behavior when searching with accents

On 9/20/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
> ...Bertrand, does the French Snowball work fine?...

I've seen some weirdnesses, like "tennis" and "tenir" (means to hold) both stemmed to "ten", but in all of our (simple) tests it was ok. The application where we're using it does not require high precision though, so it looked good enough and we didn't create very extensive tests for it.

-Bertrand