Dominik created LUCENE-10290:
--------------------------------

             Summary: analysis-stempel incorrect tokens generation for numbers
                 Key: LUCENE-10290
                 URL: https://issues.apache.org/jira/browse/LUCENE-10290
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 8.7
         Environment: **Elasticsearch version** 7.11.2:
**Plugins installed**: [analysis-stempel]
**OS version** CentOS
            Reporter: Dominik


{*}Actual{*}:

I observed unexpected behaviour: some numbers are modified by the stemmer, which causes wrong search results. For example, "2021" -> "20ć".

{*}Expected{*}:

Numeric strings should not be changed.

{*}Reproduce{*}:

The issue can be reproduced with Elasticsearch:

request:
{code:json}
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["polish_stem"],
  "text": "2021"
}
{code}
response:
{code:json}
{
  "tokens": [
    {
      "token": "20ć",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<NUM>",
      "position": 0
    }
  ]
}
{code}
I suspect that newer versions are also affected, but I am not able to verify this.
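
To make the expected behaviour checkable, here is a minimal sketch that scans an `_analyze` response for `<NUM>` tokens whose text no longer matches the input at the reported offsets. The function name `altered_numeric_tokens` is hypothetical and not part of Elasticsearch or Lucene. The sketch assumes the offsets can be used directly as Python string indices, which holds for input like "2021" but may not hold for text containing characters outside the Basic Multilingual Plane.

```python
import json


def altered_numeric_tokens(text, analyze_response):
    """Return (original, emitted) pairs for <NUM> tokens that were changed.

    Hypothetical helper for illustration: it compares each numeric token
    with the slice of the input text given by its start/end offsets.
    """
    altered = []
    for tok in analyze_response.get("tokens", []):
        if tok.get("type") != "<NUM>":
            continue  # only numeric tokens are of interest here
        original = text[tok["start_offset"]:tok["end_offset"]]
        if tok["token"] != original:
            altered.append((original, tok["token"]))
    return altered


# Response body copied from the report above.
response = json.loads("""
{
  "tokens": [
    {
      "token": "20ć",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<NUM>",
      "position": 0
    }
  ]
}
""")

print(altered_numeric_tokens("2021", response))  # [('2021', '20ć')]
```

An empty list would mean that every numeric token passed through the filter chain unchanged, which is the behaviour the report expects.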