Hello, I am trying to configure Solr to index Spanish documents, and I have run into some problems with the Spanish stemmer. I have a basic Solr installation running on Tomcat.
I suspect that the Spanish stemmer isn't working correctly. The page http://snowball.tartarus.org/algorithms/spanish/stemmer.html lists a sample of Spanish vocabulary together with the stemmed form the algorithm produces for each word. I tried several of them and did not get the same results. For example, the site says that the term "chicas" is stemmed to "chic", but in my project "chicas" is stemmed to "chica" (I can see this with Luke, the Lucene Index Toolbox). I can't figure out where the problem is.

Here is the relevant fragment of my schema.xml:

    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- The attribute name is "language". Older Solr releases silently ignore a misspelled attribute such as "languange", and the factory then falls back to its default English stemmer. -->
      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    </analyzer>

If anyone can provide any information about this, I would be very grateful.

Thanks in advance,
Darien
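For reference, here is a minimal sketch of a complete Spanish field type for schema.xml, assuming a Solr 1.4-era installation like the one described above. The field type name "text_es", the field name "content" and the stop-word file "stopwords_es.txt" are hypothetical placeholders, not names from the original schema. The query analyzer follows the convention of Solr's example schema, which turns the WordDelimiterFilter catenate options off at query time. The key detail is the stemmer attribute spelling: with "languange", SnowballPorterFilterFactory uses its default English stemmer, which only strips the plural "s" and turns "chicas" into "chica", matching what Luke showed.

```xml
<!-- Hypothetical Spanish field type; goes inside the <types> section of schema.xml -->
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- stopwords_es.txt is an assumed file with one Spanish stop word per line -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_es.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <!-- Lowercase before stemming so the stemmer sees normalized tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Spelled "language"; any other spelling leaves the default (English) in effect -->
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_es.txt" enablePositionIncrements="true"/>
    <!-- Catenation disabled at query time, as in Solr's example schema -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query terms must go through the same stemmer as indexed terms -->
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above; goes inside the <fields> section -->
<field name="content" type="text_es" indexed="true" stored="true"/>
```

After changing the schema, Solr has to be restarted (or the core reloaded) and the documents reindexed before the new stems show up in Luke; terms indexed with the English stemmer stay in the index until then.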