[ 
https://issues.apache.org/jira/browse/DOXIA-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855066#comment-15855066
 ] 

Wolfgang Illmeyer commented on DOXIA-542:
-----------------------------------------

Yes, there is. Unicode is about *characters*, not about glyphs. See section 
2.2 of the Unicode Standard, »Characters, not glyphs«. The first of your 
linked articles completely misses this distinction. An apostrophe character 
conveys the semantics of an apostrophe as used in many languages (see e.g. 
https://en.wikipedia.org/wiki/Apostrophe: »The apostrophe looks the same as a 
closing single quotation mark, although they have different meanings.«). 
Notice how the character is called »APOSTROPHE«, not »Prime« or »Single 
quotation mark«, and how the »RIGHT SINGLE QUOTATION MARK« is not called 
»Apostrophe«. The second-hand quote from the Unicode Standard in the other 
article is of course valid, but it is about *typesetting*, not storage (»When 
text is set, […]«). The recommendation to use a single quote is only relevant 
when you are a browser or a DTP package, not for semantics-bearing Unicode 
text (even if it is in HTML).
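
To make the naming argument concrete, here is a minimal Java sketch that asks 
the JDK for the official Unicode names of the two code points. The class name 
`ApostropheVsQuote` and the helper `nameOf` are made up for illustration; 
`Character.getName` is the standard JDK call.

```java
public class ApostropheVsQuote {

    /** Returns the official Unicode character name for a code point. */
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    public static void main(String[] args) {
        String typed = "don't";          // U+0027, as written in the source file
        String converted = "don\u2019t"; // U+2019, as produced by the conversion

        // Index 3 is the character between "don" and "t" in both strings.
        System.out.println(nameOf(typed.codePointAt(3)));     // APOSTROPHE
        System.out.println(nameOf(converted.codePointAt(3))); // RIGHT SINGLE QUOTATION MARK

        // The two strings look alike when rendered but are different text.
        System.out.println(typed.equals(converted));          // false
    }
}
```

Anything that compares or searches the stored text (search indexes, copy and 
paste into a shell, diffs) sees two different strings here.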

At the very least, this »feature« has to be optional, preferably off by 
default. And I have high hopes that the upcoming pegdown replacement will not 
do this kind of damage to my documentation at all.

> Markdown module converts all apostrophes to quotation marks
> -----------------------------------------------------------
>
>                 Key: DOXIA-542
>                 URL: https://issues.apache.org/jira/browse/DOXIA-542
>             Project: Maven Doxia
>          Issue Type: Bug
>          Components: Module - Markdown
>    Affects Versions: 1.4, 1.7
>            Reporter: Wolfgang Illmeyer
>              Labels: close-pending
>
> Whenever there is some text in a markdown file containing an apostrophe 
> (U+0027, e.g. »don't«), it is seemingly unconditionally replaced by a »right 
> single quotation mark« (U+2019).
> The problem seems to be an out-of-whack »smart« feature of the underlying 
> pegdown library, which is supposed to perform all kinds of typographic black 
> magic. I'd suggest disabling that (or at least making it configurable), 
> because apostrophes are not quotation marks and modern keyboard layouts 
> already make all the fancy typographic characters easily available, such as 
> dashes of different lengths, ellipses, and all sorts of quotation marks.
> The fix is relatively trivial:
> {code}
> --- a/doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
> +++ b/doxia-modules/doxia-module-markdown/src/main/java/org/apache/maven/doxia/module/markdown/MarkdownParser.java
> @@ -70,7 +70,7 @@ public class MarkdownParser
>       * The {@link PegDownProcessor} used to convert Pegdown documents to HTML.
>       */
>      protected static final PegDownProcessor PEGDOWN_PROCESSOR =
> -        new PegDownProcessor( Extensions.ALL & ~Extensions.HARDWRAPS, Long.MAX_VALUE );
> +        new PegDownProcessor( Extensions.ALL & ~Extensions.HARDWRAPS & ~Extensions.SMARTYPANTS, Long.MAX_VALUE );
>  
>      /**
>       * Regex that identifies a multimarkdown-style metadata section at the start of the document
> {code}
> But this makes some tests fail and I didn't have the time to figure out how 
> to fix them.
> Also, the resulting apostrophes probably need to be escaped in the HTML.
> I tested the patch with 1.7.
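
The patch relies on ordinary bit-mask arithmetic: `x & ~FLAG` clears the bits 
of FLAG and leaves all others alone. The sketch below shows that mechanism 
without depending on pegdown. The class `ExtensionMask` and its constant 
values are illustrative stand-ins chosen for this example, not values taken 
from the pegdown sources.

```java
public class ExtensionMask {

    // Hypothetical flag values, for illustration only.
    static final int SMARTS      = 0x01;            // dashes, ellipses
    static final int QUOTES      = 0x02;            // "smart" quotes and apostrophes
    static final int SMARTYPANTS = SMARTS | QUOTES; // both typographic features
    static final int HARDWRAPS   = 0x08;
    static final int TABLES      = 0x20;
    static final int ALL         = 0xFFFF;

    /** True if every bit of {@code flag} is set in {@code options}. */
    static boolean isEnabled(int options, int flag) {
        return (options & flag) == flag;
    }

    public static void main(String[] args) {
        int before = ALL & ~HARDWRAPS;                // mask before the patch
        int after  = ALL & ~HARDWRAPS & ~SMARTYPANTS; // mask after the patch

        System.out.println(isEnabled(before, SMARTYPANTS)); // true
        System.out.println(isEnabled(after, SMARTYPANTS));  // false
        System.out.println(isEnabled(after, TABLES));       // true
    }
}
```

Only the two typographic bits change between `before` and `after`; every 
other extension stays enabled.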



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
