Package: python-debian
Version: 0.1.16
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

hi,

attached is a patch that does something similar to what i suggested earlier. i don't claim that it covers every vector for this problem: multi-valued fields, for example, might technically still be affected. but i don't know of any such fields that mix latin-1 and utf-8 values, and it's a corner case to begin with, so i'm keeping the patch as non-intrusive as possible. it could be trivially extended for this later if necessary; i just don't want to get my paws dirty.

basically, i add an optional 'encoding' parameter to the __getitem__ mixin function. by default it keeps the previous behavior, but it can be overridden to supply an alternate encoding. beyond that, there are a few points where the encoding parameter from dump() has to be carried through to the place where the underlying method is called.

i thought this was a bit better than the other option that came to mind (setting the object encoding temporarily and then setting it back). that option would make for a simpler patch, but it is both aesthetically poor and technically kinda sketchy, so i opted for the former.

i've tested this on sources files from etch -> sid, and have had no problems.

	sean

- -- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.34-rc5minime-00802-g48f4092 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-debian depends on:
ii  python                        2.5.4-9    An interactive high-level object-o
ii  python-support                1.0.8      automated rebuilding support for P

Versions of packages python-debian recommends:
ii  python-apt                    0.7.95     Python interface to libapt-pkg

Versions of packages python-debian suggests:
ii  gpgv                          1.4.10-3   GNU privacy guard - signature veri

- -- no debconf information

- -- debsums errors found:
debsums: changed file /usr/share/pyshared/debian/deb822.py (from python-debian package)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iD8DBQFMHJ2UynjLPm522B0RAqEiAJ9EEO397COSdeuJn6RmYjzBJGr3gACdGaB6
W5zXhZCrogp0+6dNT4lN/Q4=
=0cT3
-----END PGP SIGNATURE-----

--- /usr/lib/pymodules/python2.5/debian/deb822.py.orig	2010-06-19 12:11:09.000000000 +0200
+++ /usr/lib/pymodules/python2.5/debian/deb822.py	2010-06-19 12:21:57.000000000 +0200
@@ -164,7 +164,7 @@
         self.__keys.add(key)
         self.__dict[key] = value
 
-    def __getitem__(self, key):
+    def __getitem__(self, key, encoding=None):
         key = _strI(key)
         try:
             value = self.__dict[key]
@@ -176,7 +176,10 @@
 
         if isinstance(value, str):
             # Always return unicode objects instead of strings
-            value = value.decode(self.encoding)
+            object_encoding = encoding
+            if not object_encoding:
+                object_encoding = self.encoding
+            value = value.decode(object_encoding)
         return value
 
     def __delitem__(self, key):
@@ -352,14 +355,17 @@
 
     # __repr__ is handled by Deb822Dict
 
-    def get_as_string(self, key):
+    def get_as_string(self, key, encoding=None):
         """Return the self[key] as a string (or unicode)
 
         The default implementation just returns unicode(self[key]); however,
         this can be overridden in subclasses (e.g. _multivalued) that can take
         special values.
         """
-        return unicode(self[key])
+        if not encoding:
+            return unicode(self[key])
+        else:
+            return unicode(self.__getitem__(key, encoding=encoding))
 
     def dump(self, fd=None, encoding=None):
         """Dump the the contents in the original format
@@ -384,7 +390,7 @@
             encoding = self.encoding
 
         for key in self.iterkeys():
-            value = self.get_as_string(key)
+            value = self.get_as_string(key, encoding=encoding)
             if not value or value[0] == '\n':
                 # Avoid trailing whitespace after "Field:" if it's on its own
                 # line or the value is empty
@@ -873,7 +879,7 @@
         for line in filter(None, contents.splitlines()):
             updater_method(Deb822Dict(zip(fields, line.split())))
 
-    def get_as_string(self, key):
+    def get_as_string(self, key, encoding=None):
         keyl = key.lower()
         if keyl in self._multivalued_fields:
             fd = StringIO.StringIO()
@@ -901,7 +907,7 @@
                 fd.write("\n")
             return fd.getvalue().rstrip("\n")
         else:
-            return Deb822.get_as_string(self, key)
+            return Deb822.get_as_string(self, key, encoding)
 
     ###
 
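
for anyone who wants to see the idea without applying the patch, here is a rough standalone sketch of the same pattern: a per-call 'encoding' override on __getitem__ that falls back to the object-wide encoding and is threaded through from dump(). it is written for python 3 so it runs on its own, and the EncodedDict class and its internals are made up for illustration. it is not the deb822 code, and it ignores case-insensitive keys and multi-valued fields.

```python
class EncodedDict:
    """Hypothetical, simplified stand-in for Deb822Dict (illustration only)."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding  # object-wide default encoding
        self._data = {}           # raw values, possibly undecoded bytes

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key, encoding=None):
        value = self._data[key]
        if isinstance(value, bytes):
            # Use the per-call override if given, else the object default.
            value = value.decode(encoding or self.encoding)
        return value

    def get_as_string(self, key, encoding=None):
        # Subscript syntax (self[key]) cannot carry an extra argument,
        # so __getitem__ is called explicitly to pass the override along.
        return str(self.__getitem__(key, encoding=encoding))

    def dump(self, encoding=None):
        # Carry the caller's encoding down to where values are decoded.
        encoding = encoding or self.encoding
        lines = []
        for key in self._data:
            lines.append("%s: %s\n" % (key, self.get_as_string(key, encoding=encoding)))
        return "".join(lines)


if __name__ == "__main__":
    d = EncodedDict()                   # default encoding is utf-8
    d["Maintainer"] = b"J\xe9r\xf4me"   # latin-1 bytes, not valid utf-8
    print(d.dump(encoding="latin-1"))   # decodes with the override
```

the point of doing it this way is that the object's own encoding attribute is never touched, so nothing has to be restored afterwards, even if decoding raises an exception partway through.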