As I'm thinking about this, I have other questions:

Can a Markdown parser/processor fail? Is there a concept of Markdown validity--i.e., can Markdown content be invalid (from the perspective of Markdown, not (X)HTML)?

As I understand it:
A Markdown processor identifies Markdown control sequences (aka markdown, in lowercase) in a stream of text and converts these sequences to the target markup--namely (X)HTML. A Markdown processor also identifies (X)HTML embedded in the markdown and passes this content through to the target markup.

<-- Do Markdown processors (i.e., existing implementations) attempt to fix or normalize that markup (by deserializing and then reserializing it), or is it a straight pass-through? It sounds like this is implementation-dependent; Gruber's syntax rules do not say. However, if the HTML content contains Markdown, as with markdown="1" in PHP Markdown Extra, the HTML has to be parsed with something other than a straight HTML parser, since a straight HTML parser will misinterpret the Markdown (e.g., a bare & will be a validation error).


Therefore:
Markdown has no concept of markdown validity. A Markdown processor never fails due to invalid markdown input. If a sequence of text is not recognized as markdown (i.e., as control sequences), it is treated as plain text and passed as such to the target markup. (This property is directly related to the "degradation" feature of Markdown: if your processor cannot understand the markdown, the output is "worse" than the author intended, but there is no utter failure--the non-understood markdown is simply visible in the output. This is in contrast to HTML, where tags or attributes that are not understood have no effect on the presentation of the HTML.)
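
To make the "never fails" point concrete, here is a minimal sketch in Python. It is not any real Markdown implementation: the single emphasis rule and the name render_inline are made up for illustration, and the toy escapes all HTML instead of passing inline HTML through. The only thing it is meant to show is that unrecognized control sequences fall through as text rather than raising an error.

```python
import html
import re

# Hypothetical toy rule: *text* becomes <em>text</em>.
# A real processor has many more rules; this one exists only for illustration.
EMPHASIS = re.compile(r"\*([^*\s][^*]*?)\*")

def render_inline(text: str) -> str:
    """Convert recognized control sequences; pass everything else through as text."""
    # Escape &, < and > first so unrecognized input is emitted as literal text.
    escaped = html.escape(text, quote=False)
    # Anything the pattern does not match is left untouched, so there is no error path.
    return EMPHASIS.sub(r"<em>\1</em>", escaped)

print(render_inline("*hello* world"))
# -> <em>hello</em> world
print(render_inline("an unmatched * star & an ampersand"))
# -> an unmatched * star &amp; an ampersand
```

The second call is the degradation case: the stray asterisk is not understood as markdown, so it just shows up in the output.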

Markdown may have a concept of HTML validity. A Markdown processor that identifies HTML in Markdown content may determine that the HTML is valid or invalid. For example, it may identify <div> ... [end of document] as HTML that is invalid because it lacks a closing </div> tag. Then, it has five choices:
1. treat the invalid HTML as text--pass the text-as-text to the markup (i.e., turn & into &amp;, < into &lt;, etc.)
2. treat the invalid HTML as Markdown--keep on processing the input and look for markdown inside of it (thus *hello* inside the invalid HTML will get marked up...and <div><a href="http://www.example.com/">hello</a>[end of document] will become a real link with the literal text '<div>' preceding it) <-- this is the same behavior as "not identifying the text as HTML in the first place"
3. pass the invalid HTML as HTML
4. attempt to fix the HTML...thus <div><a href="http://www.example.com/">hello</a>[end of document] might become <div><a href="http://www.example.com/">hello</a></div>
5. fail due to HTML invalidity
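
Some of these choices can be sketched in a few lines of Python using the standard library's html.parser. This is only an illustration under my own assumptions: the names unclosed_tags and handle_fragment and the policy strings are invented here, "invalid" is narrowed to mean "has unclosed tags", and choice 2 is left out because it needs a full Markdown processor.

```python
import html
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, enough for the sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}

class OpenTagTracker(HTMLParser):
    """Keeps a stack of tags that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

def unclosed_tags(fragment: str) -> list:
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return tracker.stack

def handle_fragment(fragment: str, policy: str) -> str:
    """Apply one of the (hypothetical) policies to an HTML fragment."""
    open_tags = unclosed_tags(fragment)
    if not open_tags:
        return fragment                      # nothing unclosed: pass through
    if policy == "escape":                   # choice 1: treat as text
        return html.escape(fragment, quote=False)
    if policy == "pass":                     # choice 3: pass as HTML, unchanged
        return fragment
    if policy == "fix":                      # choice 4: append the missing close tags
        return fragment + "".join(f"</{t}>" for t in reversed(open_tags))
    if policy == "fail":                     # choice 5: refuse the input
        raise ValueError(f"unclosed tags: {open_tags}")
    raise ValueError(f"unknown policy: {policy}")

broken = '<div><a href="http://www.example.com/">hello</a>'
print(handle_fragment(broken, "fix"))
# -> <div><a href="http://www.example.com/">hello</a></div>
```

Note that the "fix" policy here only appends close tags at the end of the fragment; a processor that really deserializes and reserializes the HTML could repair it quite differently.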

?

Sean

_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
