On Sun, Jan 06, 2002 at 08:06:38PM +0100, Holger Rauch wrote:
| Hi!
|
| I want to substitute element content by entity references in UTF-8 encoded
| XML files using Perl. My script currently only works with ISO 8859
| encodings. Is there a module that can be used in Perl scripts that
| correctly reads and writes files according to specified encoding? If so,
| what's the name of it and where can I obtain it from?
|
| Additional info that might be helpful:
|
| I'm not using a DOM module to retrieve an element's contents, just
| ordinary regexps.

So the regexps you're using are in an 8859-n source file, right? Can Perl
handle UTF-8 source files? Are you trying to use things like the POSIX
character class [:alpha:]? I don't think those will handle all alphabetic
characters in all Unicode-supported languages (probably just the
ASCII/English alphabet).

I don't know much about Perl, but with CPython you have to decode the
string you read in to get the Unicode string in memory, and also specify
your source code string literals as Unicode strings. (CPython doesn't yet
support non-ASCII source encodings, but Jython does.)

HTH,
-D

-- 
He who belongs to God hears what God says. The reason you do not hear
is that you do not belong to God.
        John 8:47
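
To make the decode-first idea on the Python side a bit more concrete, here is a rough sketch rather than a tested solution. It assumes a Python 3 interpreter (where source files default to UTF-8 and str is already Unicode), and the names ENTITIES, TEXT_RE, to_entities, convert and convert_file are all made up for illustration. The entity table is only a sample and would have to match whatever your DTD declares, and the regexp is deliberately naive: it will trip over comments, CDATA sections and a ">" inside attribute values.

```python
import io
import re

# Hypothetical sample mapping from characters to entity names.
# Replace it with the entities your DTD actually declares.
ENTITIES = {"ä": "auml", "ö": "ouml", "ü": "uuml", "ß": "szlig"}

# Naive match for text between a closing ">" and the next "<".
# This is not a real XML parser; it only illustrates the approach.
TEXT_RE = re.compile(r">([^<]+)<")


def to_entities(text):
    """Replace each mapped character with its entity reference."""
    return "".join("&%s;" % ENTITIES[c] if c in ENTITIES else c for c in text)


def convert(xml):
    """Apply to_entities to element content only, leaving tags untouched."""
    return TEXT_RE.sub(lambda m: ">" + to_entities(m.group(1)) + "<", xml)


def convert_file(src, dst, encoding="utf-8"):
    """Decode src with the given encoding, convert it, and re-encode to dst."""
    # Decoding happens on read, so the regexp sees characters, not bytes.
    with io.open(src, encoding=encoding) as f:
        data = f.read()
    with io.open(dst, "w", encoding=encoding) as f:
        f.write(convert(data))
```

The point is only that the bytes are decoded once at the file boundary, the matching works on characters, and the result is encoded again on the way out. Passing a different encoding name (say "iso-8859-1") to convert_file should be the only change needed for the 8859 files.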