Matthew Garrett wrote:
Eh? A BOM (Byte Order Marker) is only needed where there's confusion about what the byte order is. It's needed for UTF-16 (which is a fairly decent demonstration of why UTF-16 is a Bad Thing), but not UTF-8. It may be desirable to mark a file as being in UTF-8, but calling it a BOM is just wrong.
Well, character U+FEFF has the name "ZERO WIDTH NO-BREAK SPACE" and the aliases "BYTE ORDER MARK (BOM)" and "ZWNBSP"; see http://www.unicode.org/charts/PDF/UFE70.pdf

Encoding this character in UTF-8 gives the byte sequence \xef\xbb\xbf, which is the byte sequence that this package looks for. I usually call this byte sequence the "UTF-8 signature", but the character that it represents is a proper Unicode character, and it happens to be called BOM.

See also the UTF and BOM FAQ published by the Unicode Consortium, at http://www.unicode.org/unicode/faq/utf_bom.html#29

It says "Yes, UTF-8 can contain a BOM." It ends with "the use of a BOM will interfere with [...] the use of "#!" at the beginning of Unix shell scripts.", which is the issue that this package addresses.

Regards,
Martin
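
For illustration, here is a minimal Python sketch of the point about the byte sequence: encoding U+FEFF as UTF-8 yields \xef\xbb\xbf, and a file that starts with those bytes no longer starts with "#!". The helper name strip_utf8_signature and the sample script are made up for this example; this is not the package's actual code.

```python
import codecs

# U+FEFF: ZERO WIDTH NO-BREAK SPACE, alias BYTE ORDER MARK (BOM).
BOM_CHAR = "\ufeff"

# Encoding that single character as UTF-8 gives the "UTF-8 signature".
UTF8_SIGNATURE = BOM_CHAR.encode("utf-8")


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(UTF8_SIGNATURE):
        return data[len(UTF8_SIGNATURE):]
    return data


# A made-up shell script that an editor saved with a UTF-8 signature.
script = UTF8_SIGNATURE + b"#!/bin/sh\necho hello\n"

# The first two bytes are no longer "#!", so the "#!" line is not recognised.
starts_with_shebang = script.startswith(b"#!")

# After removing the signature the "#!" line is back at offset 0.
cleaned = strip_utf8_signature(script)
```

The standard library exposes the same three bytes as codecs.BOM_UTF8, so a real check would more likely compare against that constant than re-encode the character each time.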