I don't have any code lying around to do that, but I might write some. The name strings would require several megabytes of storage, but as long as you don't mind that...

In the meantime, I have perhaps the next best thing: a function escape-supp that replaces these supplementary characters with strings like <U+XXXXXX>, where XXXXXX is the hexadecimal code point of the supplementary character. It also does something similar if a string has an unpaired surrogate (i.e. a leading surrogate not followed by a trailing surrogate, or a trailing surrogate not preceded by a leading surrogate), which should never appear in a valid Unicode string encoded as UTF-16.

As long as your software can handle all of the Basic Multilingual Plane characters, it should be able to handle strings returned by escape-supp. It is pretty easy to modify it to escape some or all characters in the Basic Multilingual Plane, too, if you wish.

Once you have a code point, I've found that doing a Google search for "unicode code point 1f47c" often does the trick for looking up the name.

Link to code (search for escape-supp among the other functions there):

https://github.com/jafingerhut/text.unicode/blob/master/src/com/fingerhutpress/text/unicode.clj

Example:

user=> (def s1 (str "smily \ud83d\ude03 I like comedian Tim Hawkins \ud83d\udc7c"))
#'user/s1
user=> (escape-supp s1)
"smily <U+01F603> I like comedian Tim Hawkins <U+01F47C>"

Andy

(Tim Hawkins has a comedy routine where he describes eating Krispy Kreme donuts as "It's like eating baby angels." I recently happened across the Unicode character with code point 1F47C by accident.)

On Tue, Jan 17, 2012 at 2:20 AM, joachim <[email protected]> wrote:
> Thanks a lot Andy!
>
> I am using your function now to catch "bad" cases and it works. Not
> really a solution as I said, but I am very happy that at least I can
> continue now with this problem out of sight :-)
>
> Actually, from reading your response I did think of what would be even
> better, namely a function that replaces these UTF-16 characters in a
> string with their description (e.g. "SMILING FACE WITH SMILING EYES"
> etc.)
> I'll probably be able to figure out how to do that, but if you
> happen to have some code lying around for that it would definitely
> save me a lot of time.
>
> Anyway, thanks again!
> Joachim.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
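
P.S. In case it helps, here is a rough sketch of the idea behind escape-supp as described above. It is not the code from the library: escape-supp-sketch, hex-escape and name-escape are names made up for this example, and using the same <U+XXXXXX> form for unpaired surrogates is an assumption on my part. name-escape is one possible way to get descriptions instead of hex; it relies on java.lang.Character/getName, which needs Java 7 or later.

```clojure
;; Hypothetical sketch for illustration; NOT the escape-supp from the
;; linked library. All function names below are made up.

(defn hex-escape
  "Format code point cp as <U+XXXXXX> (six uppercase hex digits)."
  [cp]
  (format "<U+%06X>" (int cp)))

(defn escape-supp-sketch
  "Return s with every supplementary character and every unpaired
  surrogate replaced by (escape-fn code-point). Other BMP characters
  are copied through unchanged."
  ([s] (escape-supp-sketch s hex-escape))
  ([^String s escape-fn]
   (let [sb (StringBuilder.)
         n  (.length s)]
     (loop [i 0]
       (if (< i n)
         ;; codePointAt combines a valid surrogate pair into one code
         ;; point; for a lone surrogate it returns that surrogate's value.
         (let [cp (.codePointAt s i)]
           (if (or (Character/isSupplementaryCodePoint cp)
                   (<= 0xD800 cp 0xDFFF))
             (.append sb (str (escape-fn cp)))
             (.appendCodePoint sb cp))
           ;; advance by 2 code units for a pair, 1 otherwise
           (recur (+ i (Character/charCount cp))))
         (str sb))))))

(defn name-escape
  "Format cp as <NAME> using Character/getName (assumes Java 7+),
  falling back to the hex form when getName returns nil."
  [cp]
  (if-let [nm (Character/getName (int cp))]
    (str "<" nm ">")
    (hex-escape cp)))
```

With the s1 from the example above, (escape-supp-sketch s1) should give the same string that escape-supp returned there, and (escape-supp-sketch s1 name-escape) should substitute the Unicode character names (BABY ANGEL for 1F47C) in place of the hex code points.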
