Xavier Sanchez Loro created LUCENE-10248:
--------------------------------------------
             Summary: Add SpanishPluralStemFilter
                 Key: LUCENE-10248
                 URL: https://issues.apache.org/jira/browse/LUCENE-10248
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 9.0
            Reporter: Xavier Sanchez Loro


We propose a new Spanish stemmer that only reduces plural forms to singular while preserving gender: the SpanishPluralStemmer. Our goal is to provide a lightweight algorithmic approach with better precision and recall than current approaches.

In the following [article|https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373] we compare different Spanish stemmers across several use cases and describe the value our contribution adds.

Our solution is an algorithmic approach based on the Spanish rules for forming plurals, as defined in [wikilengua|http://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n)].

Some characteristics:
* Designed to stem only plural forms to their singular form
* Distinguishes between masculine and feminine forms
* It will increase recall, but precision can be reduced depending on the use case/information need
* Stems plural words of foreign origin, e.g. complots, bits, punks, robots
* Support for invariant words, where the plural and singular forms are the same or a plural does not make sense, e.g. crisis, jueves, lapsus, abrebotellas, etc.
* Support for special cases, e.g. yoes, clubes, itemes, faralaes
* Use it when the distinction between singular and plural is not relevant but gender is relevant
* Produces meaningful tokens in singular form
** No odd stems like "amig": stemmers are admittedly not required to generate grammatically correct tokens, but generating correct singular forms decreases the possibility of collisions with other words
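
To illustrate the behaviour described in the list above, here is a minimal sketch in Java. It is not the code proposed in this issue: the class name SpanishPluralSketch, the method stem, and the suffix heuristics are assumptions made for illustration only. The invariant words and special cases are the ones named above, with their singular forms filled in. The sketch assumes tokens are already lowercased and does not restore written accents that a singular form would regain.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the implementation proposed in LUCENE-10248. */
final class SpanishPluralSketch {

  // Invariant words named in the issue: singular and plural coincide.
  private static final Set<String> INVARIANT =
      Set.of("crisis", "jueves", "lapsus", "abrebotellas");

  // Special cases named in the issue, mapped to their singular forms.
  // \u00edtem is "item" with an accented i; faral\u00e1 ends in an accented a.
  private static final Map<String, String> SPECIAL =
      Map.of("yoes", "yo", "clubes", "club",
             "itemes", "\u00edtem", "faralaes", "faral\u00e1");

  // Plain vowels plus the five accented vowels.
  private static boolean isVowel(char c) {
    return "aeiou\u00e1\u00e9\u00ed\u00f3\u00fa".indexOf(c) >= 0;
  }

  /** Expects a lowercased token and returns a singular candidate. */
  static String stem(String word) {
    int len = word.length();
    if (len < 4 || !word.endsWith("s") || INVARIANT.contains(word)) {
      return word; // too short, not plural-looking, or invariant
    }
    String special = SPECIAL.get(word);
    if (special != null) {
      return special;
    }
    // vowel + "ces" -> "z" (luces -> luz). Heuristic: misfires on "enlaces".
    if (len > 4 && word.endsWith("ces") && isVowel(word.charAt(len - 4))) {
      return word.substring(0, len - 3) + "z";
    }
    // vowel + d/j/l/n/r + "es" -> drop "es" (ciudades -> ciudad).
    // Heuristic: misfires on words such as "cines", whose singular ends in "e".
    if (len > 4 && word.endsWith("es")
        && "djlnr".indexOf(word.charAt(len - 3)) >= 0
        && isVowel(word.charAt(len - 4))) {
      return word.substring(0, len - 2);
    }
    // Default: drop the final "s". Gender is kept (amigos -> amigo,
    // amigas -> amiga) and foreign plurals are covered (robots -> robot).
    return word.substring(0, len - 1);
  }

  public static void main(String[] args) {
    for (String w : new String[] {"amigos", "amigas", "robots", "crisis", "clubes"}) {
      System.out.println(w + " -> " + stem(w));
    }
  }
}
```

The sketch returns whole words such as "amigo" and "amiga" rather than a shared stem like "amig", which is the property the last bullet argues for. The two suffix heuristics show why plain suffix stripping is not enough: telling "coches" (coche + s) apart from "ciudades" (ciudad + es) takes more rules and word lists than fit here.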