I hope this is the right list for this question. I'm not sure whether this
is a bug or whether I'm doing something wrong.

We're running some text containing emoji through this filter. If I'm
reading the code right, when it finds U+203C (:bangbang:, double
exclamation mark) it replaces it with the ASCII characters !!. But if the
emoji is "fully qualified", the input also has U+FE0F right after it, which
is the zero-width "VARIATION SELECTOR-16".

The issue we are running into is that the emoji is replaced with !! like it
should be, but the variation selector is left hanging directly after the
ASCII !! because it isn't matched or changed into anything. This causes
some weird behavior down the line in other filters, in particular when we
try to strip punctuation: for some reason the folded result doesn't seem to
be detected as punctuation anymore. Ultimately we are trying to get down to
an array of meaningful tokens from the content, but certain emoji are
making it all the way through the filters. We aren't sure why the ones that
get ASCII-folded make it through, while the ones that aren't folded are
filtered out like normal.
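
In case it helps, here is a minimal Python sketch of what I think is going
on. It is not the actual filter code: fold_double_exclamation and
strip_punctuation are hypothetical stand-ins for the folding step and for
our punctuation stripping, and I'm assuming the stripping is based on
Unicode general categories. Under that assumption, the selector U+FE0F
(VARIATION SELECTOR-16) survives because its category is Mn (nonspacing
mark) rather than one of the punctuation categories.

```python
import unicodedata

# Fully qualified "double exclamation" emoji: U+203C followed by U+FE0F.
EMOJI = "\u203c\ufe0f"


def fold_double_exclamation(text: str) -> str:
    """Hypothetical stand-in for the folding step: only U+203C is mapped."""
    return text.replace("\u203c", "!!")


def strip_punctuation(text: str) -> str:
    """Hypothetical stripper: drop characters whose category starts with 'P'."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


folded = fold_double_exclamation("wow" + EMOJI)  # "wow!!" plus U+FE0F
leftover = strip_punctuation(folded)             # "wow" plus U+FE0F

# Show what is still attached to the token after stripping.
for ch in leftover[len("wow"):]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
# prints: U+FE0F Mn VARIATION SELECTOR-16
```

If that matches what the real filters do, the token would look like plain
"wow" when printed but would still carry an invisible trailing character.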

Thanks,
Jarett
