Creating a reliable sandboxed Python environment

2015-05-25 Thread davidfstr
I am writing a web service that accepts Python programs as input, runs the 
provided program with some profiling hooks, and returns various information 
about the program's runtime behavior. To do this in a safe manner, I need to be 
able to create a sandbox that restricts what the submitted Python program can 
do on the web server.

Almost all discussion about Python sandboxes I have seen on the internet 
involves selectively blacklisting functionality that gives access to system 
resources, such as trying to hide the "open" builtin to restrict access to file 
I/O. All such approaches are doomed to fail because you can always find a way 
around a blacklist.

For my particular sandbox, I wish to allow *only* the following kinds of 
actions (in a whitelist):
* reading from stdin & writing to stdout;
* reading from files, within a set of whitelisted directories;
* pure Python computation.

In particular, all other operations available through system calls are banned. 
This includes, but is not limited to:
* writing to files;
* manipulating network sockets;
* communicating with other processes.

I believe it is not possible to limit such operations at the Python level. The 
best you could do is try replacing all the standard library modules, but that 
is again just a blacklist - it won't prevent a determined attacker from doing 
things like constructing their own 'code' object and executing it.
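To illustrate why hiding builtins is only a blacklist, here is a minimal sketch (Python 3 assumed; the names `restricted_globals` and `payload` are made up for this example). It runs a snippet with an empty `__builtins__` and shows that the snippet can still walk from an ordinary tuple literal up to `object` and enumerate every class loaded in the interpreter. It only lists class names; it does not attempt any actual escape.

```python
# Run untrusted-looking code with every builtin removed.
restricted_globals = {"__builtins__": {}}

# No builtins are needed for attribute access: start from a tuple literal,
# climb to `object`, and list all of its currently loaded subclasses.
payload = "result = [c.__name__ for c in ().__class__.__base__.__subclasses__()]"

exec(payload, restricted_globals)
names = restricted_globals["result"]

# `open`, `__import__`, etc. were never available to the payload,
# yet it can still see a large part of the interpreter's type graph.
print(len(names), "classes reachable without any builtins")
```

Any of those reachable classes is a potential foothold, which is why removing names one at a time does not add up to a sandbox.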

It might be necessary to isolate the Python process at the operating system 
level.
* A chroot jail on Linux & OS X can limit access to the filesystem. Again this 
is just a blacklist.
* No obvious way to block socket creation. Again this would be just a blacklist.
* No obvious way to detect unapproved system calls and block them.
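One partial OS-level measure that needs no extra tooling is POSIX resource limits. The sketch below (Python 3.7+ on Linux assumed; `run_limited` and its default values are hypothetical) starts the submitted program in a child interpreter with a CPU-time cap, an address-space cap, and a maximum file size of zero, so any attempt to grow a regular file fails. It does not block sockets, file reads, or other system calls, so treat it as one layer rather than a sandbox.

```python
import resource
import subprocess
import sys


def run_limited(source, cpu_seconds=2, memory_bytes=1024 ** 3, timeout=10):
    """Run `source` in a child Python process under rlimits (POSIX only)."""

    def apply_limits():
        # Executed in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # A file size limit of 0 makes writes to regular files fail;
        # pipes such as the captured stdout are not affected.
        resource.setrlimit(resource.RLIMIT_FSIZE, (0, 0))

    return subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
        preexec_fn=apply_limits,
    )


if __name__ == "__main__":
    result = run_limited("print('hello from the child')")
    print(result.returncode, result.stdout.strip())
```

Because the limits are set as hard limits in the child, the submitted program cannot raise them again unless it runs with elevated privileges.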

In the limit, I could dynamically spin up a virtual machine and execute the 
Python program inside it. However, that's extremely expensive in 
computational time.

Has anyone on this list attempted to sandbox Python programs in a serious 
fashion? I'd be interested to hear your approach.

- David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Creating a reliable sandboxed Python environment

2015-05-28 Thread davidfstr
Thanks for the responses, folks. I will briefly summarize them:

> As you say, it is fundamentally not possible to make this work at 
> the Python level.

This is pretty effectively demonstrated by "Tav's admirable but failed attempt 
to sandbox file IO":
* http://tav.espians.com/a-challenge-to-break-python-security.html

Wow, there are some impressive ways to confuse the system. I particularly like 
overriding str's equality method to defeat the mode-checking code when opening 
files.
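The str-equality trick can be sketched as follows. This is an illustration of the general idea only, not the code from Tav's challenge; `AlwaysEqual` and `naive_mode_check` are hypothetical names (Python 3 assumed).

```python
class AlwaysEqual(str):
    """A str subclass that claims to be equal to anything."""

    def __eq__(self, other):
        return True

    # Defining __eq__ disables hashing unless __hash__ is restored.
    __hash__ = str.__hash__


def naive_mode_check(mode):
    # A guard that trusts `==` on a value supplied by untrusted code.
    return mode == "r"


mode = AlwaysEqual("w")
passes_check = naive_mode_check(mode)    # True: the guard is fooled
real_mode_is_r = str.__eq__(mode, "r")   # False: the characters are still "w"

print(passes_check, real_mode_is_r)
```

A guard written in Python can tighten this particular hole by checking `type(mode) is str` first, but that is again patching one entry in a blacklist.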

> When we needed this at edX, we wrote CodeJail 
> (https://github.com/edx/codejail). 
> It's a wrapper around AppArmor to provide OS-level protection of code 
> execution in subprocesses.  It has Python-specific features, but because it 
> is based on AppArmor, can sandbox any process, so long as it's configured 
> properly. 

This looks promising. I will take a closer look.

> What about launching the Python process in a Docker container?

This may work in combination with other techniques. It would certainly be 
faster than spinning up a new VM or repeatedly restoring a fixed VM from a 
snapshot. I would need to see whether CPU, memory, and disk usage can be 
constrained at the level of a container.
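As a rough sketch of what container-level constraints could look like, the helper below only builds a `docker run` command line; it does not start anything. The function name `docker_argv`, the `python:3-slim` image, the `/sandbox/main.py` path, and the default limits are assumptions for illustration, and the flags are those of a current Docker CLI, so check them against the installed version.

```python
import os


def docker_argv(script_path, readable_dirs=(), image="python:3-slim",
                memory="256m", cpus="0.5"):
    """Build an argv list for running one script in a locked-down container."""
    argv = [
        "docker", "run", "--rm", "-i",   # -i keeps stdin open for the program
        "--network", "none",             # no network interfaces except loopback
        "--read-only",                   # root filesystem mounted read-only
        "--cap-drop", "ALL",             # drop all Linux capabilities
        "--memory", memory,              # memory cap
        "--cpus", cpus,                  # CPU share cap
        "--pids-limit", "64",            # bound the number of processes
    ]
    # Whitelisted directories are bind-mounted read-only at the same path.
    for directory in readable_dirs:
        path = os.path.abspath(directory)
        argv += ["--volume", f"{path}:{path}:ro"]
    script = os.path.abspath(script_path)
    argv += ["--volume", f"{script}:/sandbox/main.py:ro"]
    argv += [image, "python", "/sandbox/main.py"]
    return argv


if __name__ == "__main__":
    print(" ".join(docker_argv("/tmp/prog.py", readable_dirs=["/data"])))
```

The resulting list can be passed to `subprocess.run` with a timeout. Container isolation shares the host kernel, so it would still be worth layering it with something like AppArmor.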

- David

