Hi,

I am looking for test data with accented and multibyte characters. I have found 
a good resource that I could use to cobble something together 
(http://www.inter-locale.com/whitepaper/learn/learn-to-test.html), but I was 
hoping somebody knows of a ready-made resource.

I also have some questions about encoding. In the code below, is there a 
difference between unicode() and str.decode()?
s = "§ÇǼÍÍ"
x = unicode(s, "utf-8")
y = s.decode("utf-8")
x == y # returns True
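
For comparison, here is a minimal sketch of the same check, assuming Python 3, 
where byte strings are bytes objects and str(b, encoding) takes the place of 
unicode(s, encoding). The variable names are only for illustration, and the 
sample text is written with escapes so that the byte count does not depend on 
how this message itself was encoded.

```python
# Assumption: Python 3. str(bytes, encoding) is the counterpart of
# Python 2's unicode(s, encoding); bytes.decode() is the method form.
text = "\u00a7\u00c7\u01fc\u00cd\u00cd"   # the sample string from above
raw = text.encode("utf-8")                # explicit UTF-8 byte string

x = str(raw, "utf-8")     # constructor form
y = raw.decode("utf-8")   # method form

same = (x == y)           # both forms run the same UTF-8 codec
print(same, len(raw), len(x))
```

Each of the five characters takes two bytes in UTF-8, so the byte string is 
twice as long as the decoded text.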

Also, is it, at least theoretically, possible to mix different encodings in 
one byte string? I'd say no, unless there are multiple BOMs or so. Not that I'd 
like to try this, but it'd improve my understanding of this rather obscure 
topic.
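
To make the question concrete, here is a small experiment, assuming Python 3 
and a made-up sample character. It only shows that nothing prevents gluing 
together bytes produced by two different encodings, and what happens when the 
result is decoded with a single codec. It is a sketch, not a claim about how 
any real file format handles this.

```python
# Assumption: Python 3; the sample character is arbitrary.
part_utf8 = "\u00e9".encode("utf-8")       # e-acute as UTF-8   -> b'\xc3\xa9'
part_latin1 = "\u00e9".encode("latin-1")   # e-acute as Latin-1 -> b'\xe9'

# A byte string carries no encoding label, so concatenation just works.
mixed = part_utf8 + part_latin1

# Decoding the whole thing as UTF-8 fails on the Latin-1 byte.
try:
    mixed.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every byte to a character, so it never fails,
# but the UTF-8 part comes out as mojibake.
as_latin1 = mixed.decode("latin-1")
print(utf8_ok, repr(as_latin1))
```

So the mix is possible at the byte level, but no single codec decodes it 
correctly; each piece would have to be decoded separately with its own encoding.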


Cheers!!

Albert-Jan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~