Hi Stephan,

Thanks a lot for your reply.

On Mon, 23 Jan 2017 10:26:09 +0100,
Stephan Bergmann <[email protected]> wrote:
> On 01/20/2017 03:25 AM, Takeshi Abe wrote:
>> Preparing a patch for tdf#105382 [1], I came across a question about
>> character encoding for the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original (before percent-encoding) path of such a URL can
>> be in an encoding other than UTF-8, or even in a different charset due
>> to, e.g., the code page of some legacy filesystem.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
>
> A conforming URL itself, by definition, is written with a subset of
> ASCII-only characters.
>
> For file URLs, there never was a definition how to interpret the octets
> encoded in the URL's path component, so OOo/LO came up with the convention
> of always interpreting those as UTF-8.  (So any code that converts between
> file URLs and native pathnames needs to do that mapping between UTF-8 and
> the relevant native pathname encoding, which LO assumes to be as reported
> by osl_getThreadTextEncoding.)

Got it. What should be done for tdf#105382 is clear now.

IIUC, the basic strategy for encoding a file URL for UNO is the same as the
one the current standard [1] describes in section "2.5. Identifying Data":

> (...) A
> system that internally provides identifiers in the form of a
> different character encoding, such as EBCDIC, will generally perform
> character translation of textual identifiers to UTF-8 [STD63] (or
> some other superset of the US-ASCII character encoding) at an
> internal interface, thereby providing more meaningful identifiers
> than those resulting from simply percent-encoding the original
> octets.

[1] https://tools.ietf.org/html/rfc3986

Cheers,
--
Takeshi Abe
_______________________________________________
LibreOffice mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
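To make the UTF-8 convention for file URLs concrete, here is a minimal Python sketch of the round trip between a native pathname and a file URL. It is illustrative only and is not LibreOffice code. The function names are hypothetical, and the `native_encoding` parameter is an assumed stand-in for whatever encoding LO would obtain from osl_getThreadTextEncoding. The native octets are first translated to text, then to UTF-8, and only then percent-encoded, as the RFC 3986 passage suggests, instead of percent-encoding the raw native octets.

```python
from urllib.parse import quote, unquote_to_bytes

FILE_PREFIX = "file://"


def native_path_to_file_url(native: bytes, native_encoding: str) -> str:
    """Hypothetical helper: native pathname octets -> file URL.

    native_encoding is an assumed stand-in for the encoding LO gets
    from osl_getThreadTextEncoding.
    """
    # Translate the native octets (e.g. a legacy code page) to text ...
    text = native.decode(native_encoding)
    # ... then encode as UTF-8 and percent-encode those octets,
    # leaving '/' unescaped as the path separator.
    return FILE_PREFIX + quote(text.encode("utf-8"), safe="/")


def file_url_to_native_path(url: str, native_encoding: str) -> bytes:
    """Hypothetical helper: file URL -> native pathname octets."""
    if not url.startswith(FILE_PREFIX):
        raise ValueError("not a file URL: " + url)
    # Undo percent-encoding; by convention the octets are UTF-8.
    utf8_octets = unquote_to_bytes(url[len(FILE_PREFIX):])
    # Map from UTF-8 to the (assumed) native pathname encoding.
    return utf8_octets.decode("utf-8").encode(native_encoding)


if __name__ == "__main__":
    # A Japanese file name stored in a cp932 (Shift_JIS) native encoding.
    native = "/tmp/\u65e5\u672c\u8a9e.odt".encode("cp932")
    url = native_path_to_file_url(native, "cp932")
    print(url)  # file:///tmp/%E6%97%A5%E6%9C%AC%E8%AA%9E.odt
    assert file_url_to_native_path(url, "cp932") == native
```

Because the URL always carries UTF-8 octets, the same URL can be mapped back to whichever native encoding is in effect, which is not possible if the raw native octets are percent-encoded directly.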
