[Python-Dev] Need help to fix HTTP Header Injection vulnerability
Hi, In May 2017, user "Orange" found a vulnerability in the urllib fix for CVE-2016-5699 (HTTP Header Injection vulnerability): https://bugs.python.org/issue30458 It allows to inject arbitrary HTTP headers. Copy of their message: """ Hi, the patch in CVE-2016-5699 can be broke by an addition space. http://www.cvedetails.com/cve/CVE-2016-5699/ https://hg.python.org/cpython/rev/bf3e1c9b80e9 https://hg.python.org/cpython/rev/1c45047c5102 import urllib, urllib2 urllib.urlopen('http://127.0.0.1\r\n\x20hihi\r\n :11211') urllib2.urlopen('http://127.0.0.1\r\n\x20hihi\r\n :11211') """ Last month, the same bug has been rediscovered by user "ragdoll.guo": https://bugs.python.org/issue36276 Almost one year after the bug has been reported, no one came with a solution. I'm not comfortable with having known security issues impacting HTTP. Can someone please have a look at the issue and try to write a change to fix the issue? According to Karthikeyan Singaravelan, the Go language fixed a similar issue in Go 1.12: throw an error if the URL contains any control character. If we decide that the issue is not a security issue, we should document the behavior properly and close the issue. See also this related issue: "urlopen URL with unescaped space" https://bugs.python.org/issue14826 Victor -- Night gathers, and now my watch begins. It shall not end until my death. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] No longer enable Py_TRACE_REFS by default in debug build
Hi, When Python is built in debug mode, PyObject gets 2 new fields: _ob_prev and _ob_next. These fields change the offset of following fields in the PyObject structure and so breaks the ABI. I propose to modify the debug build (Py_DEBUG) to no longer imply Py_TRACE_REFS. Antoine Pitrou proposed this idea when the C API was discussed to get a stable ABI. https://bugs.python.org/issue36465 https://github.com/python/cpython/pull/12615 This change makes the debug build ABI closer to the release build ABI, but I am not sure how to compare these two ABI. Technically, C extensions still need to be recompiled. What do you think? -- I also wrote a PR to remove all code related to Py_TRACE_REFS: https://github.com/python/cpython/pull/12614 I don't think that it's right approach. I prefer to keep this special build around to see if anyone needs it, and wait one or two Python releases to decide what to do with it. Victor -- Night gathers, and now my watch begins. It shall not end until my death. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] No longer enable Py_TRACE_REFS by default in debug build
On 09Apr2019 0925, Victor Stinner wrote: This change makes the debug build ABI closer to the release build ABI, but I am not sure how to compare these two ABI. Technically, C extensions still need to be recompiled. What do you think? What are the other changes that would be required? And is there another way to get the same functionality without ABI modifications? I think it's worthwhile if we can really get to debug and non-debug builds being ABI compatible. Getting partway there in this case doesn't seem to offer any benefits. Cheers, Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Python Initialization API
Thanks for the replies. Anything I don't comment on means that I agree with you :) On 05Apr2019 0900, Victor Stinner wrote: Honestly, I'm not sure that we really have to distinguish "user error" and "internal error". It's an old debate about calling abort()/DebugBreak() or not. It seems like most users are annoyed by getting a coredump on Unix when abort() is called. I'm also annoyed by the crash reports on Windows when "encodings" cannot be found (because occasionally there are enough of them that the Windows team starts reviewing the issue, and I get pulled in to review and resolve their bugs). Maybe we should just remove Py_INIT_USER_ERR(), always use Py_INIT_ERR(), and never call abort()/DebugBreak() in Py_ExitInitError(). Not calling abort() sounds fine to me. Embedders would likely prefer an error code rather than a crash, but IIRC they'd have to call Py_ExitInitError() to get the crash anyway, right? Or does anyone see a good reason to trigger a debugger on an initialization error? Only before returning from the point where the error occurs. By the time you've returned the error value all the useful context is gone. Note: I'm not talking about Py_FatalError() here, this one will not change. Does this get called as part of initialization? If not, I'm fine with it not changing. Cheers, Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Python Initialization API
On 05Apr2019 0912, Victor Stinner wrote: About PyPreConfig and encodings. [...] * ``PyInitError Py_PreInitialize(const PyPreConfig *config)`` * ``PyInitError Py_PreInitializeFromArgs( const PyPreConfig *config, int argc, char **argv)`` * ``PyInitError Py_PreInitializeFromWideArgs( const PyPreConfig *config, int argc, wchar_t **argv)`` I hope to one day be able to support multiple runtimes per process - can we have an opaque PyRuntime object exposed publicly now and passed into these functions? I hesitated to include a "_PyRuntimeState*" parameter somewhere, but I chose to not do so. Currently, there is a single global variable _PyRuntime which has the type _PyRuntimeState. The _PyRuntime_Initialize() API is designed around this global variable. For example, _PyRuntimeState contains the registry of interpreters: you don't want to have multiple registries :-) I understood that we should only have a single instance of _PyRuntimeState. So IMHO it's fine to keep it private at this point. There is no need to expose it in the API. So I didn't want to expose that particular object right now, but just some sort of "void*" parameter in the new APIs (and require either NULL or a known value be passed). That gives us the freedom to enable multiple runtimes in the future without having to change the API shape. FYI I tried to design an internal API with a "context" to pass _PyRuntimeState, PyPreConfig, _PyConfig, the current interpreter, etc. [...] There are 2 possible implementations: * Modify *all* functions to add a new "context" parameter and modify *all* functions to pass this parameter to sub-functions. * Store the current "context" as a thread local variable or something like that. [...] For the second option: well, there is no API change needed! It can be done later. Moreover, we already have such API! PyThreadState_Get() gets the Python thread state of the current thread: the current interpreter can be accessed from there. Yes, this is what I had in mind as a transition. I think eventually it would be best to have the context parameter, as thread-local variables have overhead and add significant complexity (particularly when debugging crashes), but making that change is huge. ``PyPreConfig`` fields: * ``coerce_c_locale_warn``: if non-zero, emit a warning if the C locale is coerced. * ``coerce_c_locale``: if equals to 2, coerce the C locale; if equals to 1, read the LC_CTYPE to decide if it should be coerced. Can we use another value for coerce_c_locale to determine whether to warn or not? Save a field. coerce_c_locale is already complex, it can have 4 values: -1, 0, 1 and 2. I prefer keep a separated field. Moreover, I understood that you might want to coerce the C locale *and* get the warning, or get the warning but *not* coerce the locale. If we define meaningful constants, then it doesn't matter how many values it has. We could have PY_COERCE_LOCALE_AND_WARN, PY_COERCE_LOCALE_SILENTLY, PY_WARN_WITHOUT_COERCE etc. to represent the states. These actually make things simpler than trying to reason about how two similar parameters interact. * ``legacy_windows_fs_encoding`` (Windows only): if non-zero, set the Python filesystem encoding to ``"mbcs"``. * ``utf8_mode``: if non-zero, enable the UTF-8 mode Why not just set the encodings here? For different technical reasons, you simply cannot specify an encoding name. You can also pass options to tell Python that you have some preferences (PyPreConfig and PyConfig fields). Python doesn't support any encoding and encoding errors combinations. In practice, it only supports a narrow set of choices. The main implementation are Py_EncodeLocale() and Py_DecodeLocale() functions which uses the C codec of the current locale encoding to implement the filesystem encoding, before the codec implemented in Python can be used. Basically, only the current locale encoding or UTF-8 are supported. If you want UTF-8, enable the UTF-8 Mode. If we already had a trivial way to specify the default encodings as a string before any initialization has occurred, I think we would have made UTF-8 mode enabled by setting them to "utf-8" rather than a brand new flag. Again, we either have a huge set of flags to infer certain values at certain times, or we can just make them directly settable. If we make them settable, it's much easier for users to reason about what is going to happen. To load the Python codec, you need importlib. importlib needs to access the filesystem which requires a codec to encode/decode file names (PyConfig.module_search_paths uses Unicode wchar_t* strings, but the C API only supports bytes char* strings). Right, and the few places where we need an encoding *before* we can load any arbitrary ones we can easily compare the strings and fail if someone's trying to do something unusual (or if the platform can do the lookup itself, it could succeed). If we say "passing NULL means use the default" th
Re: [Python-Dev] New Python Initialization API
On 05Apr2019 0922, Victor Stinner wrote: While there are supporters of an "isolated Python" (sometimes called "system python"), the fact that it doesn't exist in any Linux distribution nor on any other operating system (Windows, macOS, FreeBSD), whereas it's already doable in Python 3.6 with Py_IsolatedFlag=1 makes me think that users like the ability to control Python with environment variables and configuration files. I would prefer to leave Python as not isolated by default. It's just a matter of command line arguments. Not for embedders it isn't. When embedding you need to do a whole lot of special things to make sure that your private version of Python doesn't pick up settings relating to a regular Python install. We should make the Python runtime isolated by default, and only (automatically) pick up settings from the environment in the Python binary. * The PEP 432 stores ``PYTHONCASEOK`` into the config. Do we need to add something for that into ``PyConfig``? How would it be exposed at the Python level for ``importlib``? Passed as an argument to ``importlib._bootstrap._setup()`` maybe? It can be added later if needed. Could we convert it into an xoption? It's very rarely used, to my knowledge. The first question is if there is any need for an embedder to change this option. Currently, importlib._bootstrap_external._install() reads the environment variable and it's the only way to control the option. I think the first question should be "is there any reason to prevent an embedder from changing this option". In general, the answer is going to be no. We should expose all the options we rely on to embedders, or else they're going to have to find workarounds. ... By the way, importlib reads PYTHONCASEOK environment varaible even if isolated mode is enabled (sys.flags.isolated is equal to 1). Is it a bug? :-) Yes, I think it's a bug. Perhaps this should become a proper configuration option, rather than a directly-read environment variable? ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New Python Initialization API
On 05Apr2019 0936, Victor Stinner wrote: For the PyMainConfig structure idea, I cannot comment at this point. I need more time to think about it. About the "path configuration" fields, maybe a first step to enhance the API would be to add the the following function: PyInitError PyConfig_ComputePath(PyConfig *config, const wchar *home); where home can be NULL (and PyConfig.module_search_paths_env field goes away: the function reads PYTHONPATH env var internally). Yes, I like this. Maybe pass PYTHONPATH value in as an "additional paths" parameter? Basically, this function would be the replacement for "Py_GetPath()" (which initializes paths to the defaults the first time it is called), and setting the path fields in PyConfig manually is the replacement for Py_SetPath() (or calling the various Py_Set*() functions to make the default logic infer the paths you want). Similarly, PyConfig_ComputeFromArgv() and/or PyConfig_ComputeFromEnviron() functions would also directly replace the magic we have scattered all over the place right now. It would also make it more obvious to the callers which values take precedence, and easier to see that there should be no side effects. I think it's easier to document as well. Cheers, Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability
I would recommend fixing it since it's potentially remote code execution on systems like Redis (latest versions of Redis have this mitigated) though I must admit I don't fully understand the complexity since there are multiple issues linked. Go was also assigned a CVE for linked issue and it seemed to be the same reporter by username : CVE-2019-9741 . I tried using go's approach in the commit but urlopen accepts more URLs like data URLs [0] that seemed to accept \n as a valid case and the patch broke some tests. Looking at the issue discussion complexity also involves backwards compatibility. golang also pushed an initial fix that seemed to broke their internal tests [0] to arrive at a more simpler fix. [0] https://github.com/python/cpython/blob/a40681dd5db8deaf05a635eecb91498dac882aa4/Lib/test/test_urllib.py#L482 [1] https://go-review.googlesource.com/c/go/+/159157/2#message-39c6be13a192bf760f6318ac641b432a6ab8fdc8 -- Regards, Karthikeyan S ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability
On Tue, Apr 9, 2019 at 4:45 PM Karthikeyan wrote: > I would recommend fixing it since it's potentially remote code execution > on systems like Redis (latest versions of Redis have this mitigated) though > I must admit I don't fully understand the complexity since there are > multiple issues linked. Go was also assigned a CVE for linked issue and it > seemed to be the same reporter by username : CVE-2019-9741 . I tried using > go's approach in the commit but urlopen accepts more URLs like data URLs > [0] that seemed to accept \n as a valid case and the patch broke some > tests. Looking at the issue discussion complexity also involves backwards > compatibility. golang also pushed an initial fix that seemed to broke their > internal tests [0] to arrive at a more simpler fix. > > [0] > https://github.com/python/cpython/blob/a40681dd5db8deaf05a635eecb91498dac882aa4/Lib/test/test_urllib.py#L482 > [1] > https://go-review.googlesource.com/c/go/+/159157/2#message-39c6be13a192bf760f6318ac641b432a6ab8fdc8 > > -- > Regards, > Karthikeyan S > useful references, thanks! limiting the checks to only http and https as those are the text based protocols with urls transmitted in text form makes sense and avoids the data: test failures. proposed simple fix in https://github.com/python/cpython/pull/12755 but tests are needed as is an audit of the code to see where else we may potentially need to do such things. -gps ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Need help to fix HTTP Header Injection vulnerability
Thanks Gregory. I think it's a good tradeoff to ensure this validation only for URLs of http scheme. I also agree handling newline is little problematic over the years and the discussion over the level at which validation should occur also prolongs some of the patches. https://bugs.python.org/issue35906 is another similar case where splitlines is used but it's better to raise an error and the proposed fix could be used there too. Victor seemed to wrote a similar PR like linked one for other urllib functions only to fix similar attack in ftplib to reject newlines that was eventually fixed only in ftplib * https://bugs.python.org/issue30713 * https://bugs.python.org/issue29606 Search also brings multiple issues with one duplicate over another that makes these attacks scattered over the tracker and some edge case missing. Slightly off topic, the last time I reported a cookie related issue where the policy can be overriden by third party library I was asked to fix it in stdlib itself since adding fixes to libraries causes maintenance burden to downstream libraries to keep up upstream. With urllib being a heavily used module across ecosystem it's good to have a fix landing in stdlib that secures downstream libraries encouraging users to upgrade Python too. Regards, Karthikeyan S > ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com