Xavier Sanchez Loro created LUCENE-10248:
--------------------------------------------

             Summary: Add SpanishPluralStemFilter
                 Key: LUCENE-10248
                 URL: https://issues.apache.org/jira/browse/LUCENE-10248
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 9.0
            Reporter: Xavier Sanchez Loro


We propose a new Spanish stemmer just for stemming plural to singular whilst 
maintaining gender: the SpanishPluralStemmer. Our goal is to provide a 
lightweight algorithmic approach with better precision and recall than current 
approaches.

In the following 
[article|https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373]
 we compare different Spanish stemmers across several use cases and explain 
what value our contribution adds.

Our solution is an algorithmic approach based on the Spanish rules for forming 
plurals, as described in [wikilengua| 
http://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n)].

Some characteristics:
 * Designed to stem just plural to singular form
 * Distinguishes between masculine and feminine forms
 * It will increase recall but precision can be reduced depending on the use 
case/information need
 * Stems plural words of foreign origin: e.g. complots, bits, punks, robots
 * Supports invariant words, where the plural and singular forms are the same 
or the plural does not make sense: e.g. crisis, jueves, lapsus, abrebotellas, etc.
 * Supports special cases: e.g. yoes, clubes, itemes, faralaes
 * Use it when the distinction between singular and plural is not relevant but 
gender is relevant
 * Produces meaningful tokens in singular form
 ** No strange stems like “amig”: it’s true that stemmers need not generate 
grammatically correct tokens, but if we generate correct stems we reduce the 
chance of collisions with other words
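
As a rough illustration of the characteristics listed above, here is a minimal sketch in plain Java. It is not the SpanishPluralStemmer implementation: the class name PluralStemSketch, the word lists and the suffix rules are assumptions chosen for illustration, and the rules are deliberately incomplete. It only shows the general order of checks (invariant words, then special cases, then generic suffix rules) and that gender is preserved.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative toy only; NOT the SpanishPluralStemmer proposed in this issue.
final class PluralStemSketch {

    // Invariant words (examples taken from the list above): returned unchanged.
    private static final Set<String> INVARIANT =
        Set.of("crisis", "jueves", "lapsus", "abrebotellas");

    // Special cases that the generic suffix rules below would get wrong.
    private static final Map<String, String> SPECIAL =
        Map.of("yoes", "yo", "clubes", "club");

    // Assumption: final consonants after which a trailing "es" is treated
    // as the plural marker (ciudad-es, flor-es). Far from exhaustive.
    private static final String ES_CONSONANTS = "dlnrjy";

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (INVARIANT.contains(w)) {
            return w;
        }
        String special = SPECIAL.get(w);
        if (special != null) {
            return special;
        }
        int n = w.length();
        // Leave very short words and words not ending in "s" untouched.
        if (n < 4 || !w.endsWith("s")) {
            return w;
        }
        // luces -> luz
        if (w.endsWith("ces")) {
            return w.substring(0, n - 3) + "z";
        }
        // ciudades -> ciudad
        if (w.endsWith("es") && ES_CONSONANTS.indexOf(w.charAt(n - 3)) >= 0) {
            return w.substring(0, n - 2);
        }
        // amigos -> amigo, amigas -> amiga, robots -> robot
        return w.substring(0, n - 1);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"amigos", "amigas", "robots", "luces", "crisis", "clubes"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

In this sketch "amigos" and "amigas" stem to the distinct tokens "amigo" and "amiga", rather than collapsing to a shared stem such as "amig". A real stemmer needs many more rules and exceptions than the few shown here.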



