Package: asciidoc
Version: 8.6.7-1

1. Diagnostics

Asciidoc converts inline quoted text that carries attributes. For example, the text

  [line-through]*some erased words*

is rendered as struck-through text in the html/xhtml output (with the default asciidoc.css). However, this fails for Japanese UTF-8 characters whose third byte is 0xA0. For example, the character encoded as E3 83 A0 (the Japanese katakana character "mu") is not converted. That is, the text

  [line-through]*<utf-8 char of the code E383A0>*

is left unchanged in the output. This is not the only example: it seems to happen for every UTF-8 character whose last byte is 0xA0, so Chinese and Korean characters can cause the same problem.

The reason is that asciidoc does not convert text of the form

  [line-through]*This is some text.\s*

where "\s" means a whitespace character, which includes every character classified as space in the Unicode character properties database. The character with code A0 is one of them (the non-breaking space), so a trailing 0xA0 byte is treated as whitespace just before the closing quote.

2. A Patch

The following patch seems to fix the problem.

----
*** /usr/bin/asciidoc	2012-03-31 16:45:59.000000000 +0900
--- /home/myhome/bin/asciidoc	2015-02-19 14:37:39.150689826 +0900
***************
*** 594,600 ****
  # enveloping quotes and punctuation e.g. a='x', ('x'), 'x', ['x'].
  reo = re.compile(r'(?msu)(^|[^\w;:}])(\[(?P<attrlist>[^[\]]+?)\])?' \
      + r'(?:' + re.escape(lq) + r')' \
!     + r'(?P<content>\S|\S.*?\S)(?:'+re.escape(rq)+r')(?=\W|$)')
  pos = 0
  while True:
      mo = reo.search(text,pos)
--- 594,600 ----
  # enveloping quotes and punctuation e.g. a='x', ('x'), 'x', ['x'].
  reo = re.compile(r'(?msu)(^|[^\w;:}])(\[(?P<attrlist>[^[\]]+?)\])?' \
      + r'(?:' + re.escape(lq) + r')' \
!     + r'(?P<content>\S|\S.*?\S)(?:'+re.escape(rq)+r')(?=\W|$)', re.LOCALE)
  pos = 0
  while True:
      mo = reo.search(text,pos)
----

But I do not know whether this is the best fix. The problem is closely related to how whitespace characters are treated in the regular expressions.

3. Environments

----
# uname -a
Linux yaya 3.2.0-4-686-pae #1 SMP Debian 3.2.65-1+deb7u1 i686 GNU/Linux
# python --version
Python 2.7.3
# asciidoc --version
asciidoc 8.6.7
----

koya
[line-through]*ム* [line-through]*加*
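
The two sample lines above can be checked outside asciidoc with a small script. This is only a sketch, written for Python 3 rather than the Python 2.7 that asciidoc 8.6.7 runs on. The helper names `quote_pattern` and `as_py2_bytes` are hypothetical, and decoding the UTF-8 bytes as Latin-1 is an assumption used to imitate how a Python 2 byte string looks to a regular expression compiled with the `u` flag. The regular expression itself is copied from the patch context, with `*` as both quote characters.

```python
import re


def quote_pattern(lq, rq):
    # Regular expression copied from the patch context (unpatched form).
    return re.compile(
        r'(?msu)(^|[^\w;:}])(\[(?P<attrlist>[^[\]]+?)\])?'
        r'(?:' + re.escape(lq) + r')'
        r'(?P<content>\S|\S.*?\S)(?:' + re.escape(rq) + r')(?=\W|$)')


def as_py2_bytes(text):
    # Assumption: imitate a Python 2 byte string by mapping every UTF-8
    # byte to the code point with the same value (Latin-1 decoding).
    return text.encode('utf-8').decode('latin-1')


reo = quote_pattern('*', '*')

# Both sample characters end with the byte 0xA0 in UTF-8.
mu_hex = '\u30e0'.encode('utf-8').hex()   # katakana "mu"
ka_hex = '\u52a0'.encode('utf-8').hex()   # the second sample character

# Byte-wise view: the trailing 0xA0 is seen as U+00A0 (non-breaking space),
# so the content no longer ends with a non-space character.
ok = reo.search(as_py2_bytes('[line-through]*some erased words*'))
mu = reo.search(as_py2_bytes('[line-through]*\u30e0*'))
ka = reo.search(as_py2_bytes('[line-through]*\u52a0*'))

# Contrast: the same pattern applied to properly decoded text.
decoded = reo.search('[line-through]*\u30e0*')

print(mu_hex, ka_hex)
print(ok is not None, mu is not None, ka is not None, decoded is not None)
```

In this sketch the ASCII example matches while the two byte-wise samples do not, which is consistent with the diagnosis in section 1. The match on decoded text suggests that the failure comes from treating raw bytes as characters, but that is not necessarily the right change for asciidoc itself.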