Martin v. Löwis wrote:
> Sam Ruby wrote:
>> If we can agree on the behavior, I would be glad to write up a patch.
>>
>> It seems to me that the simplest way to proceed would be for the code
>> that attempts to resolve character references (both named and numeric)
>> in attributes to be isolated in a single method. Subclasses that desire
>> different behavior (including the existing Python 2.4 and prior
>> behaviour) could simply override this method.
>
> In SGML, this is problematic: The named things are not character
> references, they are entity references, and it isn't necessarily
> the case that they expand to a character. For example, &author;
> might expand to "Martin v. Löwis", and &logo; might refer to a
> bitmap image which is unparsed.
>
> That said, providing an overridable replacement function sounds
> like the right approach. To keep with tradition, I would still
> distinguish between character references and entity references,
> i.e. providing two overridable functions instead. Returning
> None could mean that no replacement is available.
>
> As for default implementations, I think they should do what
> currently happens: entity references are replaced according to
> entitydefs, character references are replaced to bytes if
> they are smaller than 256.
>
> Contrary to what others said, it appears that SGML *does*
> support hexadecimal character references, provided that
> the SGML declaration contains the HCRO definition (which,
> for HTML and XML, is defined as HCRO "&#x"). So it seems
> safe to process hex character references by default (although
> it isn't safe to assume Unicode, IMO).

I don't see why expanding to multiple characters is a problem.

Just so that we have a tracking number and real code to anchor this
discussion, I've opened the following and attached a patch:

http://python.org/sf/1504676

This implementation does handle multiple character expansions. It does
default to exactly what the current code does. It does *not* currently
handle hexadecimal character references.

It also does pass all the current sgmllib tests, though I did not
include any additional tests in this initial patch.

- Sam Ruby
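
For readers following the thread, here is a minimal sketch of the two-hook design discussed above. It is not the attached patch and it does not use sgmllib: the class name `ReferenceResolver`, the method `resolve`, and the hook names `convert_charref` and `convert_entityref` are assumptions chosen for illustration. It also works on Python 3 strings rather than the byte strings the current code produces. Each hook returns None when no replacement is available, in which case the reference is left in the text untouched. Hexadecimal references are deliberately not matched.

```python
import re


class ReferenceResolver:
    """Hypothetical stand-in for the attribute-resolution step of a parser."""

    # Matches decimal character references (&#65;) and named entity
    # references (&amp;). Hex references such as &#x41; are not matched.
    _ref = re.compile(r"&(?:#(\d+)|([A-Za-z][-.A-Za-z0-9]*));?")

    entitydefs = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

    def convert_charref(self, name):
        """Return the replacement for a decimal character reference, or None."""
        try:
            n = int(name)
        except ValueError:
            return None
        # Default mirrors the "smaller than 256" rule from the discussion.
        if not 0 <= n <= 255:
            return None
        return chr(n)

    def convert_entityref(self, name):
        """Return the replacement for a named entity reference, or None."""
        return self.entitydefs.get(name)

    def resolve(self, value):
        """Replace every reference in value for which a hook has an answer."""
        def replace(match):
            charref, entityref = match.groups()
            if charref is not None:
                result = self.convert_charref(charref)
            else:
                result = self.convert_entityref(entityref)
            # None means "no replacement available": keep the original text.
            return match.group(0) if result is None else result

        return self._ref.sub(replace, value)


class AuthorResolver(ReferenceResolver):
    """Subclass that changes both behaviours by overriding the hooks."""

    # An entity may expand to more than one character.
    entitydefs = dict(ReferenceResolver.entitydefs, author="Martin v. Löwis")

    def convert_charref(self, name):
        # Assumption for this sketch: accept any Unicode code point.
        try:
            return chr(int(name))
        except (ValueError, OverflowError):
            return None
```

With the defaults, `ReferenceResolver().resolve("a &lt; b &#65; &#300;")` replaces `&lt;` and `&#65;` but leaves `&#300;` alone, while `AuthorResolver` expands `&author;` to the full name and accepts the larger code point. Keeping the two hooks separate means a subclass can change how numeric references are handled without touching entity lookup, and vice versa.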