On Thu, May 18, 2006 at 01:37:06AM +0300, Eugeniy Meshcheryakov wrote:
> On 17 May 2006 at 23:54 +0200, Jens Seidel wrote:
> > > > The only problem I could imagine is that SGML will not or wrongly
> > > > complain about
> > > > invalid characters. I have to check this.
> > > >
> > > > > -DESCSET 128 32 UNUSED
> > > > > +DESCSET 128 32 32
> > > > > @@ -23,10 +23,7 @@
> > > > >  SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9
> > > > >      10 11 12 13 14 15 16 17 18 19
> > > > >      20 21 22 23 24 25 26 27 28 29
> > > > > -    30 31 127 128 129
> > > > > -    130 131 132 133 134 135 136 137 138 139
> > > > > -    140 141 142 143 144 145 146 147 148 149
> > > > > -    150 151 152 153 154 155 156 157 158 159
> > > > > +    30 31 127
> >
> > > Second part tells sgml processor to not ignore characters in range
> > > 128-159.
> > >
> > > So effect of those two parts is - sgml processor handles characters with
> > > codes 128-159 as usual (allowed) characters.
> >
> > OK. But 0-31 and 127 are still rejected, right?
> > I assume these numbers to not refer to UTF-8 characters but to single
> > bytes. This makes UTF-8 characters consisting of two bytes with a second
> > byte of this range invalid!? Can you confirm this?
> >
> Characters with codes 0x0..0x7f (0..127) are the same as in ASCII, they

I know.

> cannot be found in sequences that correspond to other characters. So if
> they are not currently needed, they are not needed for UTF-8 support too.

You are right (but I referred to bytes in a UTF-8 multibyte character,
*not* to characters).

I assumed in the past that a UTF-8 character is represented by
 * 0xxxxxxx (ASCII, 1 byte only) or
 * 1xxxxxxx xxxxxxxx (not ASCII, two bytes)
and worried about ASCII characters (such as < and >, which have a special
meaning in SGML) appearing in the second byte. But according to the table in
http://de.wikipedia.org/wiki/UTF-8
that's wrong: non-ASCII characters in UTF-8 are never represented with
ASCII bytes. The same as you explained ...

Great! I need to test the patch in more detail, but will probably commit it
soon.

PS: I wonder why you do not capitalise subsection, paragraph, ...
(розділ, параграф) as you do for chapter, appendix, ... But I'm sure you
have good reasons.

Jens
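
The byte-level point discussed above can be checked with a short Python
sketch. It is only an illustration, not part of the patch: the helper names
(utf8_bits, has_ascii_byte, bytes_in_c1_range) are hypothetical, and it
assumes the SGML parser sees the document as a stream of single bytes.

```python
# Illustrative sketch only: helper names are hypothetical, not from the patch.

def utf8_bits(ch: str) -> list:
    """Return each byte of the UTF-8 encoding of `ch` as an 8-bit string."""
    return [f"{b:08b}" for b in ch.encode("utf-8")]


def has_ascii_byte(ch: str) -> bool:
    """True if any byte of the UTF-8 encoding of `ch` is below 0x80."""
    return any(b < 0x80 for b in ch.encode("utf-8"))


def bytes_in_c1_range(text: str) -> list:
    """Sorted distinct bytes of the UTF-8 encoding that fall in 128..159."""
    return sorted({b for b in text.encode("utf-8") if 128 <= b <= 159})


if __name__ == "__main__":
    # "<" is plain ASCII; the other two are multibyte in UTF-8.
    for ch in ["<", "р", "€"]:
        print(repr(ch), utf8_bits(ch), "ASCII byte present:", has_ascii_byte(ch))
    # Bytes of a Ukrainian word that land in the range removed from SHUNCHAR.
    print([hex(b) for b in bytes_in_c1_range("розділ")])
```

Every byte of a multibyte sequence has its high bit set, so < and > cannot
show up inside one. On the other hand, the Cyrillic letter "р" encodes as
0xD1 0x80, and its second byte (128) lies in the 128-159 range, which
appears to be why the patch has to drop that range from SHUNCHAR.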