Re: strange problem

Andy Fingerhut Wed, 18 Jan 2012 11:24:14 -0800

I don't have any code lying around to do that, but I might write some.  The
name strings would require several megabytes of storage, but as long as you
don't mind that...

In the meantime, I have perhaps the next best thing: a function
escape-supp that replaces these supplementary characters with strings like
<U+XXXXXX>, where XXXXXX is the hexadecimal code point for the
supplementary character.  It also does something similar if a string has an
unpaired surrogate character (i.e. a leading surrogate not followed by a
trailing surrogate, or a trailing surrogate not preceded by a leading
surrogate), which should never appear in a valid Unicode string encoded as
UTF-16.  As long as your software can handle all of the basic multilingual
plane characters, it should be able to handle strings returned by
escape-supp.  It is pretty easy to modify it to escape some or all
characters in the basic multilingual plane, too, if you wish.
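
To make the description above concrete, here is a rough sketch of the same
idea.  It is not the code from the library; the name escape-supp-sketch is
made up for illustration, and the way it formats unpaired surrogates (the
same <U+XXXXXX> form) is an assumption on my part.

```clojure
(defn escape-supp-sketch
  "Illustrative sketch only, not the library's escape-supp.
  Replaces every supplementary character, and every unpaired surrogate,
  with <U+XXXXXX>, where XXXXXX is the hexadecimal code point."
  [^String s]
  (let [n  (count s)
        sb (StringBuilder.)]
    (loop [i 0]
      (if (< i n)
        ;; codePointAt combines a valid surrogate pair into one code point,
        ;; and returns the bare surrogate value if the pair is broken.
        (let [cp (.codePointAt s (int i))]
          (if (or (Character/isSupplementaryCodePoint (int cp))
                  (<= 0xD800 cp 0xDFFF))
            (.append sb (format "<U+%06X>" cp))
            (.appendCodePoint sb (int cp)))
          ;; advance by 2 UTF-16 code units for a pair, otherwise by 1
          (recur (+ i (Character/charCount (int cp)))))
        (str sb)))))
```

To escape part of the basic multilingual plane too, widen the test in the
inner `if`, for example to anything above 0x7F.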

Once you have a code point, I've found that doing a Google search for
"unicode code point 1f47c" often does the trick for looking up the name.
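
If you are running on Java 7 or later, one possibility that avoids shipping
a table of names is java.lang.Character/getName, which looks a code point up
in the JDK's own Unicode data.  The sketch below is only an illustration
under that assumption; supp-names is a made-up name, and getName returns nil
for code points that your JDK's Unicode version does not assign.

```clojure
(defn supp-names
  "Illustrative sketch, assuming Java 7+ for Character/getName.
  Returns a vector of the Unicode names of the supplementary
  characters in s, in order of appearance."
  [^String s]
  (loop [i 0, names []]
    (if (< i (count s))
      (let [cp (.codePointAt s (int i))]
        (recur (+ i (Character/charCount (int cp)))
               (if (Character/isSupplementaryCodePoint (int cp))
                 ;; nil here means the JDK has no name for this code point
                 (conj names (Character/getName (int cp)))
                 names)))
      names)))
```

For example, (Character/getName (int 0x1F47C)) should give "BABY ANGEL" on a
JDK whose Unicode tables include that character.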

Link to code (search for escape-supp among the other functions there):

https://github.com/jafingerhut/text.unicode/blob/master/src/com/fingerhutpress/text/unicode.clj

Example:

user=> (def s1 (str "smily \ud83d\ude03  I like comedian Tim Hawkins \ud83d\udc7c"))
#'user/s1
user=> (escape-supp s1)
"smily <U+01F603>  I like comedian Tim Hawkins <U+01F47C>"
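
In case it is not obvious how \ud83d\ude03 becomes 01F603: that is the
standard UTF-16 surrogate pair arithmetic.  Here is a small sketch of it;
the function name surrogates->code-point is made up for this example.

```clojure
(defn surrogates->code-point
  "Combines a leading surrogate (0xD800-0xDBFF) and a trailing surrogate
  (0xDC00-0xDFFF) into the supplementary code point they encode."
  [lead trail]
  (+ 0x10000
     (bit-shift-left (- lead 0xD800) 10) ; high 10 bits of the offset
     (- trail 0xDC00)))                  ; low 10 bits of the offset

(format "%06X" (surrogates->code-point 0xD83D 0xDE03)) ; "01F603"
(format "%06X" (surrogates->code-point 0xD83D 0xDC7C)) ; "01F47C"
```

Java's Character/toCodePoint does the same calculation on two chars.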

Andy

(Tim Hawkins has a comedy routine where he describes eating Krispy Kreme
donuts as "It's like eating baby angels."  I recently happened across
the Unicode character with code point 1F47C by accident.)

On Tue, Jan 17, 2012 at 2:20 AM, joachim <[email protected]> wrote:

> Thanks a lot Andy!
>
> I am using your function now to catch "bad" cases and it works. Not
> really a solution as I said, but I am very happy that at least I can
> continue now with this problem out of sight :-)
>
> Actually, from reading your response I did think of what would be even
> better, namely a function that replaces these UTF-16 characters in a
> string with their description (e.g. "SMILING FACE WITH SMILING EYES"
> etc.) I'll probably be able to figure out how to do that, but if you
> happen to have some code lying around for that it would definitely
> save me a lot of time.
>
> Anyway, thanks again!
> Joachim.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>

