On 10/11/2012 04:40 AM, eryksun wrote:
> On Wed, Oct 10, 2012 at 9:23 PM, boB Stepp <robertvst...@gmail.com> wrote:
> .
>> What is the intended use of byte types?
>
> bytes objects are important for low-level data processing, such as
> file and socket I/O. The fundamental addressable value in a computer
> is a byte (at least for all common, modern computers). When you write
> a string to a file or socket, it has to be encoded as a sequence of
> bytes.
>
> <SNIP>
>
> Another common encoding is UTF-8. This maps each code to 1-4 bytes,

Actually, the original UTF-8 design allowed an encoded character to
take up to 6 bytes. RFC 3629 (2003) later restricted UTF-8 to code
points up to U+10FFFF, so 4 bytes is the upper limit for valid UTF-8
today.

> without requiring a BOM (though the 3-byte BOM 0xefbbbf can be used
> when saving to a file). Since ASCII is so common, and since on many
> systems backward compatibility with ASCII is required, UTF-8 includes
> ASCII as a subset. In other words, codes below 128 are stored
> unmodified as a single byte. Non-ASCII codes are encoded as 2-4 bytes.
> See the UTF-8 Wikipedia article for the details:
>
> http://en.wikipedia.org/wiki/UTF-8#Description

That table shows the cases for up to 6 bytes from the original design.

> <snip>

Three other things worth pointing out:

1) Python didn't define all these byte formats. These are standards
which exist outside of the Python world, and Python lets you coexist
with them. If you want to create a text file that can be read properly
by an editor that only supports UTF-8, you can't output UCS-4 and
expect the editor to show anything but gibberish.

2) There are many more byte formats, most of them predating Unicode
entirely. Many of these are specific to a particular language or
national environment, and contain just those extensions to ASCII that
the particular language needs. Python provides encoders and decoders
for many of these as well.

3) There are many things read and written in byte format that have no
relationship to characters. The notion of using text formats for all
data (e.g. XML) is a fairly recent one. Binary files are quite common,
and many devices require binary transfers to work at all.

So byte strings are not necessarily strings at all.

-- 
DaveA
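
For anyone who wants to poke at this in the interpreter, here is a minimal Python 3 sketch of the points above. It is only an illustration: the sample characters and the names `samples`, `utf8_lengths`, and `record` are my own choices, not anything from the thread. It shows how many bytes UTF-8 uses for a few code points, how the same character comes out differently under a legacy encoding, and how bytes can carry data that is not text at all.

```python
import struct

# One character from each UTF-8 length class:
# ASCII "A", e-acute, the euro sign, and an emoji outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)

# ASCII is a subset of UTF-8: codes below 128 are stored unmodified.
print("hello".encode("utf-8") == b"hello")

# The same character under two different encodings gives different bytes.
e_acute = "\u00e9"
latin1 = e_acute.encode("latin-1")  # one byte in this legacy encoding
utf8 = e_acute.encode("utf-8")      # two bytes in UTF-8
print(latin1, utf8)

# The "utf-8-sig" codec prepends the 3-byte BOM when encoding.
with_bom = "hi".encode("utf-8-sig")
print(with_bom)

# Bytes that have nothing to do with characters: a little-endian
# unsigned 16-bit integer followed by a signed 32-bit integer.
record = struct.pack("<Hi", 513, -2)
print(record, struct.unpack("<Hi", record))
```

Decoding `latin1` as UTF-8 raises `UnicodeDecodeError`, which is the "gibberish or worse" case from point 1: the reader has to know which encoding the writer used.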