Re: How to Split Chinese Character with backslash representation?

Fredrik Lundh Thu, 26 Oct 2006 23:20:44 -0700

Wijaya Edward wrote:

> Since there are separators that I need to treat as delimiters,
> especially for a case like this:
> 
>>>> str = '\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
>>>> field = list(str)
>>>> print field
> ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', '-', '-', 'F', 'O', 'O', '-', '-', 
> 'B', 'A', 'R']
> 
> What we want as the output is this instead:
> ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', 'FOO', 'BAR']


 >>> import re
 >>> s = '\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
 >>> re.findall("(?i)[a-z]+|[\xA0-\xFF]", s)
 ['\xc5', '\xeb', '\xc7', '\xd5', '\xbc', 'FOO', 'BAR']

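the RE matches either a run of ASCII letters (case-insensitive, thanks 
to the (?i) flag), *or* a single byte in the range \xA0-\xFF.  the "--" 
separators match neither alternative, so findall simply skips them.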
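
for readers on Python 3, here is a rough sketch of the same idea.  it 
assumes the data is still a raw byte string, so both the pattern and the 
input are bytes objects; the name TOKEN is only for illustration.

```python
import re

# bytes pattern: a run of ASCII letters, or one byte in \xA0-\xFF
# (TOKEN is an illustrative name, not something from the original post)
TOKEN = re.compile(rb"(?i)[a-z]+|[\xA0-\xFF]")

s = b'\xc5\xeb\xc7\xd5\xbc--FOO--BAR'
tokens = TOKEN.findall(s)   # the "--" separators match nothing and are dropped
print(tokens)
# [b'\xc5', b'\xeb', b'\xc7', b'\xd5', b'\xbc', b'FOO', b'BAR']
```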

you may want to adjust the character ranges to match the encoding you're 
using, and your definition of non-chinese words.
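
one way to adjust for the encoding, sketched here for Python 3, is to 
decode the bytes first and then match whole characters instead of single 
bytes.  the GBK encoding, the helper name split_mixed, and the CJK range 
\u4e00-\u9fff are all assumptions for the sake of the example; the 
original post does not say which encoding is in use.

```python
import re

def split_mixed(raw, encoding="gbk"):
    """Split encoded bytes into single CJK characters and ASCII words.

    'gbk' is an assumed default; pass whatever encoding the data really uses.
    """
    text = raw.decode(encoding)
    # a run of ASCII letters, or one character from the basic
    # CJK Unified Ideographs block (U+4E00-U+9FFF)
    return re.findall(r"[A-Za-z]+|[\u4e00-\u9fff]", text)

# "\u4e2d\u6587" is a two-character Chinese sample, encoded as GBK here
sample = "\u4e2d\u6587--FOO--BAR".encode("gbk")
print(split_mixed(sample) == ["\u4e2d", "\u6587", "FOO", "BAR"])   # True
```

working on decoded text means each list item is a complete character 
rather than half of a double-byte sequence, but it only works if the 
input is valid in the chosen encoding.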

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
