On 7/12/2014 12:31 PM, Waylan Limberg wrote:
On Jul 12, 2014, at 2:52 PM, Michel Fortin <[email protected]> wrote:
[snip]
When you have a question like this, just try it on Babelmark 2:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=%3Cdiv%3E
Yes, that's what we all do. And to answer your other question, notice that only
two of the implementations on Babelmark2 failed. Remember, most of these
implementations were written to be run on web servers. We can't have our web
servers crashing just because a user submitted invalid markdown. What a parser
doesn't understand just passes through. What it misunderstands gets garbled,
but it is specifically designed to never choke.
As Michel alluded to, most parsers are simply a series of regular expression
substitutions which are run in a predetermined order. If a regex never matches
a part of the text, then that part passes through untouched. Yes, that means
the HTML is parsed by regex -- which we all know is a bad idea -- but it is not
really parsed in the way that browsers parse HTML. The regex just finds
anything surrounded by angle brackets and ignores it. With the exception of the
limited block level stuff, we don't even care if there are opening and/or
closing tags. Yes, that can result in improperly nested stuff, but that is the
author's fault and the parser should not bring the whole server down for that.
The author can (should?) preview in a browser and fix it before publishing.
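
To make the description above concrete, here is a minimal sketch of that
approach in Python. It is not taken from markdown.pl or any real port: the two
rules, the TAG pattern, and the function names are all hypothetical and heavily
simplified. It only illustrates the idea that substitutions run in a fixed
order, that anything in angle brackets is copied through without being checked,
and that text no rule matches comes out untouched.

```python
import re

# Hypothetical, simplified rules; real markdown.pl ports have many more.
TAG = re.compile(r"<[^>]+>")  # anything surrounded by angle brackets
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),  # must run first
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
]


def convert_span(text):
    """Run each substitution in a fixed order; unmatched text is untouched."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def convert(text):
    """Apply the rules only between tags; the tags themselves are copied as-is."""
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(convert_span(text[pos:m.start()]))
        out.append(m.group(0))  # no check that tags are balanced or nested
        pos = m.end()
    out.append(convert_span(text[pos:]))
    return "".join(out)


# The <div> is never closed, but nothing fails; it simply passes through.
print(convert('Hello I am some *text*. <div>Hello <a href="x">that *is* nice</a>'))
```

Nothing in this sketch can raise on malformed input: the worst case is that a
pattern matches something the author did not intend and the output is garbled.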
However, I should point out that while the above describes most parsers (as
most are more or less direct ports of markdown.pl - which works this way),
there are a few that use other methods under the hood. For example, a few
generate a parse tree which is then fed into a renderer (I believe Pandoc works
like that, which allows it to output many more formats than just HTML), but
they are the rare exception.
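
For contrast, a rough sketch of the parse-then-render approach in Python. This
is not Pandoc's actual data model or API; the Text and Emph node types and the
function names are made up for illustration, and a flat list of nodes stands in
for a real tree. The point is only that once the input has been parsed into
nodes, adding another output format means adding another renderer.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Emph:
    value: str


def parse(line):
    """Split a line into Text and Emph nodes (only *emphasis* is recognised)."""
    nodes = []
    pos = 0
    for m in re.finditer(r"\*(.+?)\*", line):
        if m.start() > pos:
            nodes.append(Text(line[pos:m.start()]))
        nodes.append(Emph(m.group(1)))
        pos = m.end()
    if pos < len(line):
        nodes.append(Text(line[pos:]))
    return nodes


def render_html(nodes):
    """Render the node list as HTML."""
    return "".join(
        f"<em>{n.value}</em>" if isinstance(n, Emph) else n.value for n in nodes
    )


def render_latex(nodes):
    """Render the same node list as LaTeX."""
    return "".join(
        f"\\emph{{{n.value}}}" if isinstance(n, Emph) else n.value for n in nodes
    )


tree = parse("Hello I am some *text*.")
print(render_html(tree))
print(render_latex(tree))
```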
I see.
Here is a real-world example of what I was citing:
http://johnmacfarlane.net/babelmark2/?text=Hello+I+am+some+*text*.%0A%3Cdiv%3EHello+%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%2F%22%3Ethat+is+nice%3C%2Fa%3E+chance+%26+circumstance%26hellip%3B%0A%0AThe+end.
Truly, it looks like there is great diversity in Markdown-land.
Ok, so any standard mentioning Historical Markdown cannot say that any
particular behavior is normative when it comes to HTML validity. Some
implementations check for HTML (island) validity and behave differently;
others don't.
The end...I guess.
Sean
_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss