rmuir commented on issue #11976:
URL: https://github.com/apache/lucene/issues/11976#issuecomment-1327969322

   Yes, normally composed vs. decomposed forms (NFC vs. NFD) do not change tokenization,
so you can normalize before or after; it doesn't matter.
   
   But compatibility characters like this don't really work well in Unicode text
processing: they exist purely for compatibility/round-trip purposes. You have to
apply NFKC/NFKD first before you can really do anything with them. Maybe for now,
normalize documents before you send them to elasticsearch.
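
   For illustration, here is a minimal sketch using the JDK's `java.text.Normalizer`
(not Lucene-specific) showing why NFC/NFD are not enough: they leave a compatibility
character like U+FB01 (the "fi" ligature) untouched, while NFKC expands it so that a
tokenizer sees ordinary letters. The ligature example is just an assumption for
demonstration, not the character from the original report.

   ```java
   import java.text.Normalizer;

   public class NormalizeExample {
       public static void main(String[] args) {
           // U+FB01 LATIN SMALL LIGATURE FI, a compatibility character
           String raw = "\uFB01le"; // "file" written with the ligature

           // NFC/NFD only handle canonical composition/decomposition,
           // so the ligature survives unchanged
           String nfc = Normalizer.normalize(raw, Normalizer.Form.NFC);

           // NFKC applies compatibility decomposition as well,
           // expanding the ligature into plain "f" + "i"
           String nfkc = Normalizer.normalize(raw, Normalizer.Form.NFKC);

           System.out.println(nfc);  // ligature preserved
           System.out.println(nfkc); // "file"
       }
   }
   ```

   Running NFKC (or NFKD) as a preprocessing step before indexing is what the
suggestion above amounts to: the documents reach the analyzer already in a form the
tokenizer can handle.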

