In article <[email protected]>, rusi <[email protected]> wrote:
> On Apr 3, 6:43 pm, Roy Smith <[email protected]> wrote: > > This has to inspect the entire string, no? I posted (essentially) this > > a few days ago: > > > > if all(ord(c) <= 0xffff for c in s): > > return "it's all bmp" > > else: > > return "it's got astral crap in it" > > Astral crap? CRAP? > Verily sir I am offended! > [...] > You are American! This is true. But, to be fair, in the (I don't have the exact number here) roughly 200 million records in our recent big data import job, I found exactly FOUR strings with astral characters. Which boiled down to two versions of each of two different song titles. One had a Unicode Character 'BALLOON' (U+1F388). The other had some heart symbol (sorry, I don't remember the exact code point). These hardly seem a matter of national pride. And, if you don't believe there is astral crap, how do you explain U+1F4A9?
-- http://mail.python.org/mailman/listinfo/python-list
