I don't have enough knowledge to tell you "Oh, just do this, and your Emacs issues will be solved," but I can give some hints as to what these characters are, so that others can chime in, or you can focus your Google searches better.
I believe those are Unicode characters of the kind called "supplementary" characters. Because Java and Clojure store strings in an encoding called UTF-16 [1], each of them requires 2 consecutive 16-bit Java chars to represent. In most software, these don't get tested as often as characters in the BMP (Basic Multilingual Plane; it covers the most commonly used characters, each of which requires only a single Java char to store). When it comes to copying and pasting them between applications, or sending them across debug sockets, every piece of software along the way gets its own chance to muck things up. Likely a day will come when software that doesn't handle these things properly will be rare, but I don't think we are there yet.

The particular characters you gave as an example appear to be Unicode characters with these code points:

    U+1F60A "SMILING FACE WITH SMILING EYES"
    U+1F60F "SMIRKING FACE"

I am not sure if these are considered Emoji [2] characters or not, but I have heard that these characters are getting popular on Twitter, in phone text messages, and in a few other places.

I found that out by saving the web page as an HTML file, opening that file in Emacs, moving the cursor over those characters, and pressing C-x =. Doing so shows this at the bottom of the window for the first character:

    Char: <empty rectangle> (128522, #o373012, #x1f60a, file ...) point=26267 of 54273 (48%) columns=91

The empty rectangle is because the font I was using didn't include a glyph for this character. The 3 numbers in parentheses are the decimal, octal, and hex values of the Unicode code point. I copied the hex value above and looked up its name in this file:

    http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

There are other web sites that let you search for these things, too, and see them graphically in the browser window even if you don't have the proper fonts installed.
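To make the "2 Java chars per supplementary character" point concrete, here is a small REPL sketch. The names cp and smile are just ones I made up for illustration; Character/toChars, codePointCount, and format are standard Java/Clojure calls. Treat it as a sketch of the idea, not as anything specific to your setup.

```clojure
;; cp and smile are hypothetical names used only for this illustration.
(def cp 0x1F60A)                                  ; U+1F60A
(def smile (String. ^chars (Character/toChars cp)))

;; One code point, but two UTF-16 code units (Java chars).
(count smile)                                     ;=> 2
(.codePointCount ^String smile 0 (count smile))   ;=> 1

;; The two chars are a high and a low surrogate.
(map #(format "%04X" (int %)) smile)              ;=> ("D83D" "DE0A")

;; The same three numbers Emacs shows for C-x =.
(format "%d #o%o #x%x" cp cp cp)                  ;=> "128522 #o373012 #x1f60a"
```

Any code that walks the string one char at a time sees D83D and DE0A separately, which is where software that only expects BMP characters tends to go wrong.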
I don't know if it is only supplementary characters that will cause you problems, but if so, you can detect strings containing them with a function like this:

    (defn contains-supp?
      "Returns true if the string or CharSequence s contains supplementary
      characters, outside the Basic Multilingual Plane. Returns false if
      the string only contains characters in the BMP.

      For Java/Clojure strings, which are encoded in UTF-16, a string
      contains supplementary characters only if the string contains at
      least one surrogate code unit in the range U+D800 through U+DFFF."
      [^CharSequence s]
      (if (first (filter #(<= (int Character/MIN_SURROGATE)
                              %
                              (int Character/MAX_SURROGATE))
                         (map int s)))
        true
        false))

You can leave out the "(if" and "true false)" if you don't mind getting back nil for false and a non-nil value for true. Clojure's if, when, etc. all treat nil and false as false, and any other value as true.

Andy

[1] http://en.wikipedia.org/wiki/UTF-16
[2] http://en.wikipedia.org/wiki/Emoji

On Mon, Jan 16, 2012 at 7:58 AM, joachim <[email protected]> wrote:
> Dear All,
>
> I'm not sure if this is the right place to ask, but I am experiencing
> a strange and rather annoying problem, probably in the interaction
> between clojure and emacs.
>
> Basically, I have to deal with strings. Sometimes the strings contain
> non-standard characters (I do not know the nature of these characters
> myself). Here is an example string, with the non-standard characters
> printed as squares:
>
> "Michelle Obama is a Capricorn !!! Jan 17th. 😊😏 --->
> http://t.co/1moZ4IUZ"
>
> When I run clojure from a terminal and input the above string there is
> no problem:
>
> joachim@joachim-HP-EliteBook-8440p:~/opt/clojure-1.3.0$ java -jar
> clojure-1.3.0.jar
> Clojure 1.3.0
> user=> "Michelle Obama is a Capricorn !!! Jan 17th. 😊😏 --->
> http://t.co/1moZ4IUZ"
> "Michelle Obama is a Capricorn !!! Jan 17th. 😊😏 --->
> http://t.co/1moZ4IUZ"
> user=>
>
> However, when I try the same in an emacs repl, I get "Lisp connection
> closed unexpectedly: connection broken by remote peer". I have no idea
> what is going on or how to deal with this problem. Sometimes during
> development I like to print the strings to see what is going on, but
> this also causes the connection to close.
>
> I would also be happy if I could recognize "problematic" strings, so
> that I can skip them when printing, thus avoiding the problem
> (although this would not really be a solution),
>
> Any ideas?
>
> Joachim.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
