yin Lin created LUCENE-9308:
-------------------------------

             Summary: tokenizer supports preserving delimiters
                 Key: LUCENE-9308
                 URL: https://issues.apache.org/jira/browse/LUCENE-9308
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: yin Lin


Currently there is no way to preserve delimiters in a tokenizer, because the
basic tokenizers such as CharTokenizer discard them.

This change is to make the basic tokenizers more customizable.

e.g. "mac_book_pro" -> [mac_, book_, pro]
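
To make the requested behavior concrete, here is a minimal sketch in plain Java.
It does not use or modify any Lucene class; the class name
DelimiterPreservingSplit and the method tokenize are hypothetical and exist only
for illustration. It assumes a single delimiter character and attaches each
delimiter to the token that precedes it, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: splits text and keeps each delimiter. */
final class DelimiterPreservingSplit {

  /** Splits on the given delimiter, leaving the delimiter at the end of the preceding token. */
  static List<String> tokenize(String text, char delimiter) {
    List<String> tokens = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) == delimiter) {
        // End the token just after the delimiter so the delimiter is preserved.
        tokens.add(text.substring(start, i + 1));
        start = i + 1;
      }
    }
    // Emit the trailing token, if any, which has no delimiter attached.
    if (start < text.length()) {
      tokens.add(text.substring(start));
    }
    return tokens;
  }
}
```

Calling DelimiterPreservingSplit.tokenize("mac_book_pro", '_') yields the three
tokens mac_, book_ and pro. In this sketch, consecutive delimiters produce a
token that consists of the delimiter alone; a real tokenizer would also need to
track offsets, which is left out here.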



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

