On 7/9/2014 12:06 PM, Michel Fortin wrote:
On 9 Jul 2014, at 11:49, Sean Leonard <[email protected]> wrote:
The "flavor" parameter is a good idea in theory. [...] Nobody is going to 
annotate their file with the right flavor unless there's a tangible benefit[...]

[...] HTML never got anything like a "flavor" parameter in its MIME type, and 
even if it did it'd not have helped clear the mess in any way.

About this "flavors" thing. I know there are several lists floating out there of different Markdown implementations and variants (or if you don't like them being called Markdown, you can call them Illegitimate Sons of Markdown™). Which list is the most complete? Can someone show me (or make for the community) a really comprehensive list, and agree to update it?

When I wrote the -00 draft, I tried to follow the Media Type Registration Procedures. One requirement is to list required and optional parameters. Parameters are defined in RFC 6838 as "companion data". See RFC 6838 and in particular, Sections 1, 4.2.1, and 4.3.

All text/ types have at least one parameter: the charset. That is because all text data has to be interpreted according to a code (i.e., character set) that converts the bits of data into useful information. Nowadays we take Unicode (specifically UTF-8) for granted, but it's just not the case in reality. You can't just open a text file and hope for the best--you have to have /metadata/, express or implied, that tells you how to handle the blob of bits. The very fact that it is textual data has to be inferred from other things, such as the filename extension (when the data is in a file). A filename is just another piece of metadata.

When dealing with HTML, the charset could be determined in at least six ways:
1. as express external metadata, when the Content-Type has a charset parameter in the HTTP header.
2. as implied external metadata, when the HTTP header is absent but the client infers it from "other things" (e.g., the server, the IP address, or by looking at the ccTLD).
3. as express internal "metadata", with <meta charset="iso-2022-jp"> or <meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp">; or in the case of XHTML, <?xml version="1.0" encoding="iso-2022-jp"?>.
4. as express internal *data*, that is, the first bytes are 0xFF 0xFE (likely UTF-16LE), 0xFE 0xFF (likely UTF-16BE), or 0xEF 0xBB 0xBF (likely UTF-8).
5. as implied internal *data*, that is, "take the first 256 bytes and try to see if it decodes to something approximating HTML soup using some common character sets; if it fits, you quit".
6. as express user preference, that is, "I'm Japanese in Japan on a Windows machine, therefore on my browser, just assume everything is Shift-JIS".
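
To make the precedence concrete, here is a rough Python sketch of ways 1, 4, and 6 only. It is an illustration, not how any particular browser works: the function names and the order of the checks are my own assumptions, and the BOM table covers just the three marks listed above.

```python
import codecs
from email.message import Message

# The three byte order marks named above, mapped to the encoding they suggest.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # 0xEF 0xBB 0xBF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # 0xFF 0xFE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # 0xFE 0xFF
]


def charset_from_header(content_type):
    """Way 1: express external metadata (charset parameter of Content-Type)."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased name, or None if absent


def charset_from_bom(data):
    """Way 4: express internal data (a leading byte order mark)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def guess_charset(content_type, data, user_default="utf-8"):
    """Try way 1, then way 4, then fall back to way 6 (user preference).

    The ordering here is an assumption made for the sake of the example.
    """
    return (
        charset_from_header(content_type)
        or charset_from_bom(data)
        or user_default
    )
```

For instance, guess_charset("text/html; charset=ISO-2022-JP", b"...") takes the header at its word, while guess_charset(None, b"\xff\xfe...") has nothing to go on but the first two bytes.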


See...there are all these crazy options...because nobody standardized on the character set when HTTP/HTML was developed; people assumed it was US-ASCII and then shoehorned in lots of zany ways to make it something else.

At least with Markdown, we can probably safely eliminate #3 since Markdown is not intended to generate the <head> part of (X)HTML.

The operating question is: What metadata (companion data) is /necessary/ to reflect the creator's intent with respect to the data?

With Markdown, I think the answer is: you need the character set, and you need to know how to turn the text into HTML (or XHTML, PDF, RTF, MS Word/Office Open XML, or whatever).

Markdown has no way to communicate the character set in the document (other than the Unicode byte order mark, which is a general property of text streams, not specific to Markdown)--and it would be counterproductive to invent one. So that is a perfect example of relevant metadata. The second piece of metadata is how to turn the text into whatever output the author wants. If it's not communicated, it's going to be implied. Implied means "guessing", and likely "guessing wrong".
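
As a rough sketch of what a receiver could do with those two pieces of companion data: the Python below pulls the charset and a flavor parameter out of a Content-Type value. The "text/markdown" type name, the "flavor" parameter, and the value "example" are assumptions for illustration only; nothing here is registered or agreed.

```python
from email.message import Message


def companion_data(content_type):
    """Extract the media type plus the two parameters discussed above.

    A missing parameter comes back as None, which is exactly the case
    where the receiver is left to guess.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return {
        "type": msg.get_content_type(),
        "charset": msg.get_content_charset(),
        "flavor": msg.get_param("flavor"),  # hypothetical parameter name
    }
```

Given "text/markdown; charset=UTF-8; flavor=example", both values are express; given a bare "text/markdown", both come back as None and everything downstream is implied.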

Hopefully this makes sense. I want to be more educated about this. Thanks!

Sean

_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
