Package: clojure
Version: 1.10.2-1
Severity: normal

The clojure(1) command-line interface is documented to take the filename
of a script to execute.  Here's how well it understands filenames:

$ echo '(println "sadness")' > 'L?on.clj'
$ echo '(println "ennui")' > $'L\xef\xbf\xbdon.clj'
$ echo '(println "joy")' > $'L\xe9on.clj'
$ LC_ALL=C clojure $'L\xe9on.clj'
sadness
$ LC_ALL=de_DE.utf8 clojure $'L\xe9on.clj'
ennui
$ LC_ALL=de_DE.iso88591 clojure $'L\xe9on.clj'
joy
$

In this example I've created three different script files, with different
but related names.  Three times I've told clojure(1) to run the same one,
and it's run a different file on each invocation.  In the invocations that
ran the wrong files, one can see via strace(1) that clojure(1) doesn't
use the correct filename for any file operations at all; it completely
substitutes the erroneous filename.  The use of the wrong filename
doesn't depend on the file of that wrong name existing: if there is no
file of that name then clojure(1) will fail to find any script to execute,
and will generate an error message that shows the erroneous filename.

How clojure(1) mangles the script filename on each invocation depends
on the locale implied by the environment, and more specifically on the
character encoding nominated by the LC_CTYPE component of the locale.
Nominating a locale that's not installed behaves the same as nominating
the C locale.  Part of my example above depends on having the de_DE.utf8
and de_DE.iso88591 locales installed.  If you don't have the specific
locales that I used then you can get the same results as me by
substituting an installed locale that nominates the same encoding.

This bug occurs whenever the supplied filename isn't syntactically valid
text in the locale-nominated encoding, containing only Unicode codepoints
of which clojure approves.  The nature of the manglement is that each
subsequence of octets that doesn't look like valid encoded text gets
replaced with one or more instances of a substitute sequence that is
valid encoded text.  Where the locale-nominated encoding is ASCII,
the substitute is '?'.  Where the locale-nominated encoding is UTF-8,
the substitute is the three octets $'\xef\xbf\xbd', which is the UTF-8
encoding of U+fffd "replacement character".

It appears that the supplied filename is being decoded, according
to the locale-nominated encoding, with decoding errors muffled and
a replacement character (U+fffd or "?") silently substituted in,
and then the lossily-decoded filename is re-encoded according to the
locale-nominated encoding, and the result of that process is the filename
that gets actually used.  As far as I can see the manglement comes only
from character decoding: there isn't also any Unicode normalisation.
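
The round trip hypothesised above can be modelled in a few lines of
Python.  This is only a sketch of the apparent behaviour, not code taken
from clojure(1): the function name mangle() is hypothetical, and Python's
"replace" error handler stands in for whatever the real implementation
does.  It reproduces the three filenames seen in the example session.

```python
def mangle(name: bytes, encoding: str) -> bytes:
    """Hypothetical model: lossy decode, then re-encode in the same encoding."""
    # Undecodable octets become U+FFFD during decoding.
    text = name.decode(encoding, errors="replace")
    # Characters the encoding can't represent (U+FFFD in ASCII) become "?".
    return text.encode(encoding, errors="replace")


original = b"L\xe9on.clj"
for encoding in ("ascii", "utf-8", "iso-8859-1"):
    print(encoding, mangle(original, encoding))
# ascii b'L?on.clj'
# utf-8 b'L\xef\xbf\xbdon.clj'
# iso-8859-1 b'L\xe9on.clj'
```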

This could cause a security problem in some circumstances that are only
slightly strange.  Suppose a privileged program is using clojure(1) to
run scripts that partly derive from untrusted user input.  Suppose it
runs with environment settings in which the implied locale nominates an
encoding that isn't an 8-bit single-byte encoding.  Suppose the program
has created an innocuous script to run, has permitted an untrusted user
to determine part of the filename for that script, and has ensured that
the supplied filename is innocuous from a Unix point of view but doesn't
prevent the use of filenames with high-half octets.  Suppose further
that an untrusted user can cause the same program to create another file,
of content that the program doesn't intend to execute, under the mangled
name, which is equally innocuous from a Unix point of view.  Then the
program could execute code determined by a malicious user, due entirely
to clojure(1) misinterpreting a filename.  I'm not aware of any specific
program that can be exploited in this way, and I haven't based the
declared severity of this bug report on this security issue.

Preferably, clojure(1) should use the script file of the name that was
supplied on the command line.  It must pass to the file syscalls the
same octet string that was supplied as a command line argument, without
assuming anything about its syntax.
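
To illustrate that a filename can pass through a text representation
without loss, here is a sketch using Python's "surrogateescape" error
handler, which is the mechanism Python itself uses for command-line
arguments and filenames.  The helper names to_text() and to_bytes() are
hypothetical, and I'm not claiming that an equivalent facility is
available to clojure(1); the point is only that the octets survive
unchanged whatever encoding is nominated.

```python
def to_text(name: bytes, encoding: str) -> str:
    # Undecodable octets are smuggled through as lone surrogates
    # (U+DC80..U+DCFF) rather than being replaced.
    return name.decode(encoding, errors="surrogateescape")


def to_bytes(text: str, encoding: str) -> bytes:
    # The smuggled surrogates are turned back into the original octets.
    return text.encode(encoding, errors="surrogateescape")


original = b"L\xe9on.clj"
for encoding in ("ascii", "utf-8", "iso-8859-1"):
    assert to_bytes(to_text(original, encoding), encoding) == original
```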

If it cannot be made to handle arbitrary filenames correctly, then
clojure(1) must at least detect that it can't handle the specified
filename.  It must signal an error on any filename it can't handle, and
not use any mangled form of the filename for any purpose.  Furthermore,
this limitation must be documented.
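
The fallback behaviour could look something like the following Python
sketch, which decodes strictly and refuses the filename outright rather
than substituting anything.  The function name check_filename() and the
wording of the error are hypothetical; this shows the shape of the
check, not a patch for clojure(1).

```python
def check_filename(name: bytes, encoding: str) -> str:
    """Return the decoded filename, or raise if it can't be decoded exactly."""
    try:
        # Strict decoding: any invalid octet sequence is an error,
        # never a replacement character.
        return name.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"filename {name!r} is not valid {encoding}; refusing to run it"
        ) from exc


print(check_filename(b"plain.clj", "utf-8"))
try:
    check_filename(b"L\xe9on.clj", "utf-8")
except ValueError as err:
    print("error:", err)
```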

-zefram
