Dominik created LUCENE-10290:
--------------------------------

             Summary: analysis-stempel incorrect tokens generation for numbers
                 Key: LUCENE-10290
                 URL: https://issues.apache.org/jira/browse/LUCENE-10290
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 8.7
         Environment: **Elasticsearch version** 7.11.2:

**Plugins installed**: [analysis-stempel]

**OS version** CentOS
            Reporter: Dominik


{*}Actual{*}:
I observed unexpected behaviour: some numeric tokens are altered by the Polish stemmer ({{polish_stem}}), which causes wrong search results.
For example, "2021" is stemmed to "20ć".

{*}Expected{*}:
Numeric tokens should be left unchanged by the stemmer.

{*}Reproduce{*}:

The issue can be reproduced with Elasticsearch:

request:
{code:json}
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["polish_stem"],
  "text": "2021"
}
{code}
response:
{code:json}
{
  "tokens": [
    {
      "token": "20ć",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<NUM>",
      "position": 0
    }
  ]
}
{code}

I suspect that newer versions are also affected, but I have not been able to verify this.
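A possible interim workaround, offered only as a sketch and not verified against 7.11.2: Lucene's {{StempelFilter}} leaves tokens flagged as keywords unstemmed, so marking purely numeric tokens with Elasticsearch's {{keyword_marker}} filter before {{polish_stem}} should keep them intact. The pattern {{\d+}} is an assumption about which tokens should count as numbers; tokens mixing digits and letters would still be stemmed.

{code:json}
POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keyword_marker",
      "keywords_pattern": "\\d+"
    },
    "polish_stem"
  ],
  "text": "2021"
}
{code}

If this behaves as intended, the response should contain the token "2021" unchanged. The same filter order would need to be used in the index's custom analyzer for the workaround to affect search results.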



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
