-=| Mathieu ROY, 05.06.2015 14:34:42 +0200 |=-
> Ok, so after further testing, it turns out that if I change the coding of the
> string from UTF-8 to ISO-8859..., it encode to the proper entities.
This is because, in the absence of an explicit encoding statement, the perl
interpreter considers the source text to be encoded in Latin-1. From
'perldoc encoding', "Implicit upgrading for byte strings":

    By default, if strings operating under byte semantics and strings with
    Unicode character data are concatenated, the new string will be created
    by decoding the byte strings as ISO 8859-1 (Latin-1).

    The encoding pragma changes this to use the specified encoding instead.

(Note, however, that the encoding pragma is deprecated. It is better to use
the utf8 pragma and encode your source as UTF-8.)

> I obviously can adjust the script to pre convert UTF-8 to ISO-8859
> but it should be at least documented (but I dont see any reason why
> encode_entities should actually not be able to deal with UTF-8)

encode_entities deals with whatever the perl interpreter supplies, and the
perl interpreter needs your help to determine the meaning of the byte
sequence you feed it.
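For illustration, a minimal sketch of the difference. It assumes the
HTML::Entities module (which provides encode_entities) and the core Encode
module are installed; the variable names are only illustrative. The same
UTF-8 bytes give different entities depending on whether they are decoded
into characters first:

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Raw UTF-8 bytes for "français", as perl sees a literal from a UTF-8
# source file without "use utf8" (or any undecoded input).
my $bytes = "fran\xC3\xA7ais";

# Undecoded: perl treats each byte as a Latin-1 character, so the two
# bytes of the c-cedilla become U+00C3 and U+00A7.
my $wrong = encode_entities($bytes);

# Decoded first: perl sees the single character U+00E7.
my $right = encode_entities(decode('UTF-8', $bytes));

print "$wrong\n";   # fran&Atilde;&sect;ais
print "$right\n";   # fran&ccedil;ais
```

In a source file saved as UTF-8, adding "use utf8;" gives string literals
the same treatment as the decode() call above, without decoding by hand.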