Package: clojure
Version: 1.10.2-1
Severity: normal

The clojure(1) command-line interface is documented to take the filename of a script to execute. Here's how well it understands filenames:

$ echo '(println "sadness")' > 'L?on.clj'
$ echo '(println "ennui")' > $'L\xef\xbf\xbdon.clj'
$ echo '(println "joy")' > $'L\xe9on.clj'
$ LC_ALL=C clojure $'L\xe9on.clj'
sadness
$ LC_ALL=de_DE.utf8 clojure $'L\xe9on.clj'
ennui
$ LC_ALL=de_DE.iso88591 clojure $'L\xe9on.clj'
joy
$

In this example I've created three different script files, with different but related names, I've told clojure(1) three times to run one of them, and it's run a different file on each invocation. In the invocations that ran the wrong files, one can see via strace(1) that clojure(1) doesn't use the correct filename for any file operations at all; it completely substitutes the erroneous filename. The use of the wrong filename doesn't depend on the file of that wrong name existing: if there is no file of that name then clojure(1) will fail to find any script to execute, and will generate an error message that shows the erroneous filename.

How clojure(1) mangles the script filename on each invocation depends on the locale implied by the environment, and more specifically on the character encoding nominated by the LC_CTYPE component of the locale. Nominating a locale that's not installed behaves the same as nominating the C locale. Part of my example above depends on having the de_DE.utf8 and de_DE.iso88591 locales installed. If you don't have the specific locales that I used then you can get the same results as me by substituting an installed locale that nominates the same encoding.

This bug occurs whenever the supplied filename doesn't have the syntax of locale-nominated encoding of text, containing only Unicode codepoints of which clojure approves. The nature of the manglement is that each subsequence of octets that doesn't look like valid encoded text gets replaced with one or more instances of a substitute sequence that is valid encoded text. Where the locale-nominated encoding is ASCII, the substitute is '?'.
Where the locale-nominated encoding is UTF-8, the substitute is the three octets $'\xef\xbf\xbd', which is the UTF-8 encoding of U+fffd "replacement character".

It appears that the supplied filename is being decoded, according to the locale-nominated encoding, with decoding errors muffled and a replacement character (U+fffd or "?") silently substituted in, and then the lossily-decoded filename is re-encoded according to the locale-nominated encoding, and the result of that process is the filename that gets actually used. As far as I can see the manglement comes only from character decoding: there isn't also any Unicode normalisation.

This could cause a security problem in some circumstances that are only slightly strange. Suppose a privileged program is using clojure(1) to run scripts that partly derive from untrusted user input. Suppose it runs with environment settings in which the implied locale nominates an encoding that isn't an 8-bit single-byte encoding. Suppose the program has created an innocuous script to run, has permitted an untrusted user to determine part of the filename for that script, and has ensured that the supplied filename is innocuous from a Unix point of view but isn't preventing the use of filenames with high-half octets. Suppose further that an untrusted user can cause the same program to create another file, of content that the program doesn't intend to execute, under the mangled name, which is equally innocuous from a Unix point of view. Then it could execute code determined by a malicious user, due entirely to clojure(1) misinterpreting a filename. I'm not aware of any specific program that can be exploited in this way, and I haven't based the declared severity of this bug report on this security issue.

Preferably, clojure(1) should use the script file of the name that was supplied on the command line.
It must pass to the file syscalls the same octet string that was supplied as a command line argument, without assuming anything about its syntax.

If it cannot be made to handle arbitrary filenames correctly, then clojure(1) must at least detect that it can't handle the specified filename. It must signal an error on any filename it can't handle, and not use any mangled form of the filename for any purpose. Furthermore, this limitation must be documented.

-zefram
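
The decode-and-re-encode behaviour hypothesised above can be reproduced outside clojure(1). The following Python sketch is not clojure's code. It assumes that the manglement is equivalent to decoding the octets with replacement and then re-encoding them with replacement, which is only the hypothesis stated in this report. The function names round_trip and survives are made up for illustration, and the Python codec names stand in for the encodings nominated by the three locales.

```python
def round_trip(name: bytes, encoding: str) -> bytes:
    """Lossily decode a filename, then re-encode it in the same encoding.

    Assumption: this mimics the observed manglement; it is not taken
    from clojure's source.
    """
    text = name.decode(encoding, errors="replace")   # bad octets -> U+FFFD
    return text.encode(encoding, errors="replace")   # unencodable -> '?'


def survives(name: bytes, encoding: str) -> bool:
    """True if the filename comes back unchanged, i.e. would not be mangled."""
    return round_trip(name, encoding) == name


if __name__ == "__main__":
    supplied = b"L\xe9on.clj"
    # ascii ~ C locale, utf-8 ~ de_DE.utf8, iso-8859-1 ~ de_DE.iso88591
    for encoding in ("ascii", "utf-8", "iso-8859-1"):
        print(encoding, round_trip(supplied, encoding), survives(supplied, encoding))
```

Under that assumption the three results are the three filenames created in the transcript. A check along the lines of survives is one conceivable way to detect a filename that can't be handled and signal an error instead of using a mangled form.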