[issue7626] Entity references without semicolon in HTMLParser

2010-01-05 Thread R. David Murray
R. David Murray added the comment: w3m (a text mode browser) does not treat the &eacute without the ; as an entity ref (it puts &eacute literally into the display), while Firefox does turn it into an eacute with or without the ;. I'm sure somebody somewhere has a table listing which browsers have what b
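For comparison, here is a small sketch of how the lenient behaviour described above looks in Python 3's standard library. This is an assumption about a later Python than the one discussed in this thread: `html.unescape` is documented as following the HTML5 rules for character references, so it should decode this particular reference with or without the semicolon, much as Firefox does. The sample strings are illustrative only.

```python
from html import unescape

# Same French sample text, once with and once without the trailing semicolon.
with_semicolon = unescape("La cl&eacute; des champs")
without_semicolon = unescape("La cl&eacute des champs")

# Under the HTML5 rules both forms decode to the same text.
print(with_semicolon == without_semicolon)
```

Only a fixed list of legacy entity names may drop the semicolon under those rules, so this leniency does not extend to every named reference.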

2010-01-05 Thread Florent Xicluna
Florent Xicluna added the comment: For the record, this is valid HTML 4.01 Strict: Sample La clé La clé des champs La clé des champs Tested with http://validator.w3.org/check and Mozilla Firefox 3.5.6 Reference: http://www.is-thought.co.uk/book/sgml-6.htm#General But HTML5 should pro

2010-01-05 Thread Stefan Schweizer
Stefan Schweizer added the comment: I do not think that the semicolon can be omitted here, because it is not at a line break or immediately before a tag, it is in the middle of a paragraph. Anyway, I guess I have to live with the decision in issue500073. Also I could not find an 'unknown_enti

2010-01-05 Thread Florent Xicluna
Florent Xicluna added the comment: It is a documented behavior. http://bip.cnrs-mrs.fr/bip10/scowl.htm#semi Quoted from issue500073: "If you want to process such a document in a specific way, I recommend to subclass HTMLParser, overriding unknown_entityref." -- nosy: +flox resolution:
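The subclassing approach quoted above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the tracker's recommended code: it uses Python 3's `html.parser.HTMLParser`, where the hook for named references is `handle_entityref` rather than the `unknown_entityref` named in the quote. The class name `EntityLogger` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class EntityLogger(HTMLParser):
    """Hypothetical subclass that records each named entity reference."""

    def __init__(self):
        # With convert_charrefs=False the parser reports references through
        # handle_entityref instead of decoding them into the text itself.
        super().__init__(convert_charrefs=False)
        self.entities = []

    def handle_entityref(self, name):
        # 'name' is the reference without the leading '&' or trailing ';'.
        self.entities.append(name)


parser = EntityLogger()
# One reference with a semicolon, one without, both in mid-paragraph.
parser.feed("<p>La cl&eacute; des champs</p><p>La cl&eacute des champs</p>")
parser.close()
print(parser.entities)
```

The hook receives only the entity name, so a subclass that wants to treat the two spellings differently would need some extra way to check whether the semicolon was present.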

2010-01-03 Thread Ezio Melotti
Changes by Ezio Melotti: -- nosy: +ezio.melotti

2010-01-03 Thread Stefan Schweizer
New submission from Stefan Schweizer: HTMLParser should only handle entity references that are terminated with a semicolon. I know that the semicolon can be omitted in some cases (http://www.w3.org/TR/html4/charset.html#h-5.3) and that some browsers are more tolerant, but the following exampl