Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Stephen J. Turnbull
INADA Naoki writes:

 > > Why not print(obj)?

print(obj) will give mojibake by default if
sys.getfilesystemencoding() != sys.getdefaultencoding().

 > > str() is normal high-level API, and __fspath__ and os.fspath() should be
 > > low level API.
 > > Normal users shouldn't use __fspath__ and os.fspath().  Only library
 > > developers should use it.

This is the price we pay for the stubbornness of the
bytes-are-text-too meme.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Paul Moore
On 11 April 2016 at 17:53, Jon Ribbens wrote:
>> You're limiting the subset of Python that people can use,
>> understood. And you're trying to ensure that people can't do "bad
>> things". Again, understood. But what subset are you actually allowing,
>> and what things are you trying to protect against? (For example, I
>> can't calculate sin(1.2) using the math module - why is that not
>> allowed?)
>
> It wasn't allowed in the earlier version because I wasn't allowing
> import at all, because this is just an experiment. As it happens,
> I added 'import' yesterday so yes you can use math.sin.

Well, I'll ask the obvious question, then. In allowing "import" did
you allow "import ctypes"? If so, then I win :-) Or did you explicitly
whitelist certain modules? And if so, which ones are they, and did I
succeed if I manage to import a module you hadn't whitelisted?

>> It feels at the moment as if I'm playing a game where I don't know the
>> rules, and every time I think I scored a point, the rules are changed
>> to retroactively disallow it.
>
> The challenge is to show some code that will escape from the sandbox,
> in a way that is not trivially fixable with a tiny patch, or in a way
> that demonstrates that such a large number of tiny patches would be
> required as to be unworkable.

But I'm still not clear when I count as "outside the sandbox", given
that I don't know what the rules of what is allowed *in* the sandbox
are...

Paul


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens wrote:
> Anyway the code is at https://github.com/jribbens/unsafe
> It requires Python 3.4 or later (it could probably be made to work on
> Python 2.7 as well, but it would need some changes).

Rather annoying point: Your interactive mode allows no editing keys
(readline etc), and also doesn't have underscore for "last result", as
that's a forbidden name. :( Makes tinkering fiddly.

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Paul Moore
On 12 April 2016 at 06:28, Stephen J. Turnbull wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.
>
> In text applications, "bytes as carcinogen" is an apt metaphor.
>
> Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
> text-processing applications is more important than the inconvenience
> to byte-shoveling applications.  But there is a need to be
> parsimonious with polymorphism.

As someone who has done a lot of work helping projects to port from
the 2.x bytes/text model to the 3.x model, I have similar concerns
that rooting out the source of bytes objects appearing in a program
could be an issue with the proposed "return either" approach. The most
effective tool I have found in fixing programs with text/bytes issues
is carefully and thoroughly annotating precisely which functions
accept and return bytes, and which accept and return text. The sort of
mixed-mode processing we're talking about here makes that
substantially harder. And note that os.fspath can return bytes or
text *independent* of the type of the argument - it's not a "bytes in,
bytes out" function like the usual pattern of "polymorphic support for
bytes".

But just like Stephen, I have no feel for how significant the risk
will be in real life. I've never worked on code that actually has a
need for bytestring paths (particularly now that surrogateescape
ensures that most cases "just work").
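
As a small illustration of why surrogateescape lets most cases "just
work": the handler maps each undecodable byte to a lone surrogate code
point, so a byte path can travel through str and come back unchanged.
This is only a sketch; the file name is made up, and it calls
bytes.decode directly rather than os.fsdecode, which applies the same
handler on POSIX systems.

```python
# A byte path that is valid Latin-1 but not valid UTF-8 (hypothetical name).
raw = b"caf\xe9.txt"

# surrogateescape turns the undecodable byte 0xE9 into the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))
print(restored == raw)
```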

Paul


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Nick Coghlan
On 12 April 2016 at 15:28, Stephen J. Turnbull wrote:
> Donald Stufft writes:
>
>  > I think yes and yes [__fspath__ and fspath should be allowed to
>  > handle bytes, otherwise] it seems like making it needlessly harder
>  > to deal with a bytes path
>
> It's not needless.  This kind of polymorphism makes it hard to review
> code locally.  Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin.  Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)".  I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text.  For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand.  Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.

One possible way to address this concern would be to have the
underlying protocol be bytes/str (since boundary code frequently needs
to handle the paths-are-bytes assumption in POSIX), but offer an
"os.fspathname" API that rejected bytes output from os.fspath. That
is, it would be equivalent to:

    def fspathname(path):
        name = os.fspath(path)
        if not isinstance(name, str):
            raise TypeError("Expected str for pathname, not {}".format(type(name)))
        return name

That way folks that wanted the clean "must be str" signature could use
os.fspathname, while those that wanted to accept either could use the
lower level os.fspath.
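
To make the difference concrete, here is a hedged sketch of how the two
functions would behave. It assumes an os.fspath that passes through
whatever str or bytes value __fspath__ returns (the behaviour under
discussion here); BytesPath and TextPath are hypothetical classes
invented for the example, and fspathname is a local copy of the
function above rather than a real os attribute.

```python
import os


class TextPath:
    """Hypothetical path-like object that represents itself as str."""

    def __fspath__(self):
        return "/tmp/data.txt"


class BytesPath:
    """Hypothetical path-like object that represents itself as bytes."""

    def __fspath__(self):
        return b"/tmp/data.bin"


def fspathname(path):
    # Strict variant: accept only text results from the protocol.
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError("Expected str for pathname, not {}".format(type(name)))
    return name


print(os.fspath(BytesPath()))   # the low-level call passes bytes through
print(fspathname(TextPath()))   # the strict call returns str

try:
    fspathname(BytesPath())
except TypeError as exc:
    print("rejected:", exc)
```

Note that the result type is decided by the object, not by the caller,
which is exactly why a str-only wrapper is easier to annotate.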

The ambiguity in question here is inherent in the differences between
the way POSIX and Windows work, so there are limits to how far we can
go in hiding it without making things worse rather than better.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 6:17 PM, Paul Moore wrote:
> Well, I'll ask the obvious question, then. In allowing "import" did
> you allow "import ctypes"? If so, then I win :-) Or did you explicitly
> whitelist certain modules? And if so, which ones are they, and did I
> succeed if I manage to import a module you hadn't whitelisted?

The module whitelist is given at the top of the source code:

_SAFE_MODULES = frozenset((
    "base64", "binascii", "bisect", "calendar", "cmath", "crypt", "datetime",
    "decimal", "enum", "errno", "fractions", "functools", "hashlib", "hmac",
    "ipaddress", "itertools", "math", "numbers", "queue", "re", "statistics",
    "textwrap", "unicodedata", "urllib.parse",
))

And yes, you win if you get another module. Interestingly, you're
allowed to import urllib.parse, but not urllib itself; but "import
urllib.parse" makes urllib available - and, since modules inside
modules are blacklisted, "urllib.parse" doesn't exist
(AttributeError).

You can access the decimal module, and call decimal.getcontext(). This
returns the same default context object that the "outer" Python uses;
consequently, this sandboxing technique MUST NOT be used in any
program that, now or ever in the future, uses the decimal module (or
at least its default context; but I'm not sure how you'd be absolutely
sure you never EVER use the default context).
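
The shared-state problem can be shown in plain Python, without the
sandbox at all. In this sketch untrusted_snippet is a hypothetical
stand-in for sandboxed code; it uses nothing but the public decimal
API, yet it changes the arithmetic done by the code that called it
(within the same thread).

```python
import decimal


def untrusted_snippet():
    # Stands in for sandboxed code: only public, underscore-free names are used.
    decimal.getcontext().prec = 5


saved_prec = decimal.getcontext().prec

untrusted_snippet()

# "Outer" code now silently computes with the precision chosen by the snippet.
leaked = decimal.Decimal(1) / decimal.Decimal(3)

# Restore the context so the demonstration has no lasting effect.
decimal.getcontext().prec = saved_prec

print(leaked)
```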

Even more curiously, you can "import fractions", but you don't get
fractions.Fraction - though you *do* get fractions.Decimal. And
importing enum gives you EnumMeta, but metaclasses seem to be broken,
and you can't get enum.Enum.

The sandbox code assumes that an attacker cannot create files in the
current directory.

rosuav@sikorsky:~/tmp/unsafe$ echo 'import sys; real_module = lambda mod: sys.modules[mod]' >hashlib.py
rosuav@sikorsky:~/tmp/unsafe$ ./unsafe.py -i
Python 3.6.0a0 (default:78b84ae0b745+, Apr  6 2016, 03:43:18)
[GCC 5.3.1 20160323] on linux
Type "help", "copyright", "credits" or "license" for more information.
(SafeInteractiveConsole)
>>> import hashlib
>>> hashlib.real_module("sys")
<module 'sys' (built-in)>

Setting LC_ALL and then working with calendar.LocaleTextCalendar()
causes locale files to be read. I'm not sure if you can turn that into
an exploit, but the attack surface depends on the installed locales on
the system.

This is still a massive game of whack-a-mole.

ChrisA


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 06:28:34PM +1000, Chris Angelico wrote:
> On Sat, Apr 9, 2016 at 12:18 AM, Jon Ribbens wrote:
> > Anyway the code is at https://github.com/jribbens/unsafe
> > It requires Python 3.4 or later (it could probably be made to work on
> > Python 2.7 as well, but it would need some changes).
> 
> Rather annoying point: Your interactive mode allows no editing keys
> (readline etc), and also doesn't have underscore for "last result", as
> that's a forbidden name. :( Makes tinkering fiddly.

It's just a subclass of the stdlib class code.InteractiveConsole,
which seems not to offer those features unfortunately.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote:
> And yes, you win if you get another module. Interestingly, you're
> allowed to import urllib.parse, but not urllib itself; but "import
> urllib.parse" makes urllib available - and, since modules inside
> modules are blacklisted, "urllib.parse" doesn't exist
> (AttributeError).

Yes, this is issue #3 on github. I'd need to spend a few minutes
thinking about how to make importing of submodules work out properly.

> You can access the decimal module, and call decimal.getcontext(). This
> returns the same default context object that the "outer" Python uses;

OK, decimal goes ;-)

> Even more curiously, you can "import fractions", but you don't get
> fractions.Fraction - though you *do* get fractions.Decimal.

That seems to be because Fraction inherits from numbers.Number,
which has a metaclass, so type(Fraction) is abc.ABCMeta not 'type'.
That's obviously not a security hole and may well be fixable.
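
The metaclass difference is easy to check in an ordinary interpreter.
The sketch below only inspects the two classes; how unsafe.py decides
which module attributes to expose is not shown here, so treat the link
to the sandbox's filtering as an assumption.

```python
import abc
import decimal
import fractions

# Fraction derives from numbers.Rational, an ABC, so its metaclass is ABCMeta.
fraction_meta = type(fractions.Fraction)

# Decimal is only *registered* with numbers.Number, so its metaclass stays 'type'.
decimal_meta = type(decimal.Decimal)

print(fraction_meta, decimal_meta)
```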

> The sandbox code assumes that an attacker cannot create files in the
> current directory.

If the attacker can create such files then the system is already
compromised even if you're not using any sandboxing system, because
you won't be able to trust any normal imports from your own code.

> Setting LC_ALL and then working with calendar.LocaleTextCalendar()
> causes locale files to be read.

I don't think that has any obvious relevance. Doing "import enum"
causes "enum.py" to be read too, and that isn't a security hole.

> This is still a massive game of whack-a-mole.

No, it still isn't. If the names blacklist had to keep being extended
then you would be right, but that hasn't happened so far. Whitelists
by definition contain only a small, limited number of potential moles.

The only thing you found above that even remotely approaches an
exploit is the decimal.getcontext() thing, and even that I don't
think you could use to do any code execution.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens wrote:
> On Tue, Apr 12, 2016 at 06:57:37PM +1000, Chris Angelico wrote:
>> The sandbox code assumes that an attacker cannot create files in the
>> current directory.
>
> If the attacker can create such files then the system is already
> compromised even if you're not using any sandboxing system, because
> you won't be able to trust any normal imports from your own code.

Just confirming that, yeah. Though you could protect against it
somewhat by pre-importing everything that can legally be imported;
that way, at least the attack surface ceases once untrusted code
starts executing. Consider it a privilege escalation attack; you can
move from "create file in current directory" to "remote code
execution" simply by creating hashlib.py and then importing it.
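
A rough sketch of the pre-import idea follows. ALLOWED is a
hypothetical three-module subset chosen for the example, not the real
whitelist. Once a module is cached in sys.modules, a later import of
the same name is served from the cache, so a hashlib.py dropped into
the current directory after startup would not be picked up. It does
nothing against a file planted before the process starts.

```python
import importlib
import sys

# Hypothetical subset of the whitelist, for illustration only.
ALLOWED = ("base64", "hashlib", "math")

# Import everything up front, before any untrusted code gets to run.
preloaded = {name: importlib.import_module(name) for name in ALLOWED}

# A later import of the same name returns the cached module object.
again = importlib.import_module("hashlib")

print(again is preloaded["hashlib"])
```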

>> Setting LC_ALL and then working with calendar.LocaleTextCalendar()
>> causes locale files to be read.
>
> I don't think that has any obvious relevance. Doing "import enum"
> causes "enum.py" to be read too, and that isn't a security hole.

I mean the system locale files, not just locale.py itself. If nothing
else, it's a means of discovering info about the system. I don't know
what you can get by figuring out what locales are installed, but it's
another concern to think about.

>> This is still a massive game of whack-a-mole.
>
> No, it still isn't. If the names blacklist had to keep being extended
> then you would be right, but that hasn't happened so far. Whitelists
> by definition contain only a small, limited number of potential moles.
>
> The only thing you found above that even remotely approaches an
> exploit is the decimal.getcontext() thing, and even that I don't
> think you could use to do any code execution.

decimal.getcontext is a simple and obvious example of a way that
global mutable objects can be accessed across the boundary. There is
no way to mathematically prove that there are no more, so it's still a
matter of blacklisting.

I still think you need to work out a "minimum viable set" and set down
some concrete rules: if any feature in this set has to be blacklisted
in order to achieve security, the experiment has failed.

ChrisA


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Paul Moore
On 11 April 2016 at 22:21, Sven R. Kunze wrote:
> On 11.04.2016 23:08, Random832 wrote:
>>
>> On Mon, Apr 11, 2016, at 17:04, Sven R. Kunze wrote:
>>>
>>> PS: The only way out that I can imagine is to fix pathlib. I am not in
>>> favor of fixing functions of "os" and "os.path" to accept "path"
>>> objects;
>>
>> Why not?
>
>
> It occurred to me after pondering over Paul's comments.
>
> "os" and "os.path" is just a completely different level of abstraction.
> There is just no need to mess with them.
>
> The initial failure of my colleague and me of using pathlib can be solely
> attributed to pathlib's lack of functionality. Not to the incompatibility of
> "os" nor "os.path" with "Path" objects.

As your thoughts appear to have been triggered by my comments, I feel
I should clarify.

1. I like pathlib even as it is right now, and I'm strongly -1 on removing it.
2. The "external dependency" aspect of 3rd party solutions makes them
far less useful to me.
3. The work on improving integration with the stdlib (which is nearly
sorted now, as far as I can see) is a big improvement, and I'm all in
favour. But even without it, I wouldn't want pathlib to be removed.
4. There are further improvements that could be made to pathlib,
certainly, but again they are optional, and pathlib is fine without
them.
5. I wish more 3rd party code integrated better with pathlib. The
improved integration work might help with this. But ultimately, Python
2 compatibility is likely to be the biggest block (either perceived or
real - we can make pathlib support as simple as possible, but some 3rd
party authors will remain unwilling to add support for Python 3 only
features in the short term). This isn't a pathlib problem.
6. There will probably always be a place for low-level os/os.path
code. Adding support in those modules for pathlib doesn't affect that
fact, but does make it easier to use pathlib "seamlessly", so why not
do so?

tl;dr: I'm 100% in favour of pathlib, and of the direction the
current discussion (excluding "let's give up on pathlib" digressions)
is going.

Paul


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote:
> On Tue, Apr 12, 2016 at 8:06 PM, Jon Ribbens wrote:
> > No, it still isn't. If the names blacklist had to keep being extended
> > then you would be right, but that hasn't happened so far. Whitelists
> > by definition contain only a small, limited number of potential moles.
> >
> > The only thing you found above that even remotely approaches an
> > exploit is the decimal.getcontext() thing, and even that I don't
> > think you could use to do any code execution.
> 
> decimal.getcontext is a simple and obvious example of a way that
> global mutable objects can be accessed across the boundary. There is
> no way to mathematically prove that there are no more, so it's still a
> matter of blacklisting.

No, it's a matter of reducing the whitelist. I must admit that
I don't understand in what way this is not already clear. Look:

  >>> len(unsafe._SAFE_MODULES)
  23

I could "mathematically prove" that there are no more security holes
in that list by reducing its length to zero. There are still plenty
of circumstances in which the experiment would be a useful tool even
with no modules allowed to be imported.

> I still think you need to work out a "minimum viable set" and set down
> some concrete rules: if any feature in this set has to be blacklisted
> in order to achieve security, the experiment has failed.

The "minimum viable set" in my view would be: no builtins at all,
only allowing eval() not exec(), and disallowing yield [from],
lambdas and generator expressions.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote:
> On Tue, 12 Apr 2016, Jon Ribbens wrote:
> >>This is still a massive game of whack-a-mole.
> >
> >No, it still isn't. If the names blacklist had to keep being extended
> >then you would be right, but that hasn't happened so far. Whitelists
> >by definition contain only a small, limited number of potential moles.
> >
> >The only thing you found above that even remotely approaches an
> >exploit is the decimal.getcontext() thing, and even that I don't
> >think you could use to do any code execution.
> 
> "I don't think"?
> 
> Where's the formal proof?

I disallowed the module completely, that's the proof.

> Without a proof, this is indeed just a game of whack-a-mole.

Almost no computer programs are ever "formally proved" to be secure.
None of those that run the global Internet are. I don't see why it
makes any sense to demand that my experiment be held to a massively
higher standard than the rest of the code everyone relies on every day.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Maciej Fijalkowski
On Tue, Apr 12, 2016 at 1:14 PM, Jon Ribbens wrote:
> On Tue, Apr 12, 2016 at 06:21:04AM -0400, Isaac Morland wrote:
>> On Tue, 12 Apr 2016, Jon Ribbens wrote:
>> >>This is still a massive game of whack-a-mole.
>> >
>> >No, it still isn't. If the names blacklist had to keep being extended
>> >then you would be right, but that hasn't happened so far. Whitelists
>> >by definition contain only a small, limited number of potential moles.
>> >
>> >The only thing you found above that even remotely approaches an
>> >exploit is the decimal.getcontext() thing, and even that I don't
>> >think you could use to do any code execution.
>>
>> "I don't think"?
>>
>> Where's the formal proof?
>
> I disallowed the module completely, that's the proof.
>
>> Without a proof, this is indeed just a game of whack-a-mole.
>
> Almost no computer programs are ever "formally proved" to be secure.
> None of those that run the global Internet are. I don't see why it
> makes any sense to demand that my experiment be held to a massively
> higher standard than the rest of the code everyone relies on every day.

Jon, let me reiterate. You asked people to break it (that's the title
of the thread) and they did so almost immediately. Then you patched
the thing and asked them to break it again, and they did.

The faulty assumption here is that this procedure, repeated enough
times, will produce a secure environment. This is not how security
works: you need to be secure against people who will spend more than 5
minutes and who are not on this list or reading this incredibly long
email chain. You can't do that just by asking on the mailing list and
whacking all the examples.

As others pointed out, this particular approach (with maybe different
details) has been tried again and again and again, and the result has
been the same: you end up with either a completely unusable Python (a
Python that can't run anything is trivially secure) or something
that's insecure.

I suggest you look instead at something like the PyPy sandbox, which
systematically replaces all external calls with a call to a proxy.
Because PyPy is written in RPython, you can do that, and the amount of
code that needs reviewing is relatively small: a couple of pages. The
code you would need to review for your approach to be even remotely
secure is much larger: it's all the C code you can call from your
Python, with or without knowing that it can happen.

Cheers,
fijal


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 9:10 PM, Jon Ribbens wrote:
> On Tue, Apr 12, 2016 at 08:27:14PM +1000, Chris Angelico wrote:
>> decimal.getcontext is a simple and obvious example of a way that
>> global mutable objects can be accessed across the boundary. There is
>> no way to mathematically prove that there are no more, so it's still a
>> matter of blacklisting.
>
> No, it's a matter of reducing the whitelist. I must admit that
> I don't understand in what way this is not already clear. Look:
>
>   >>> len(unsafe._SAFE_MODULES)
>   23
>
> I could "mathematically prove" that there are no more security holes
> in that list by reducing its length to zero. There are still plenty
> of circumstances in which the experiment would be a useful tool even
> with no modules allowed to be imported.

Yes, you just removed decimal because of getcontext. What about the
next module with that kind of issue? Or what about the next
non-underscore attribute on a core type that can cause you grief (like
how async functions leak stack frames)?

>> I still think you need to work out a "minimum viable set" and set down
>> some concrete rules: if any feature in this set has to be blacklisted
>> in order to achieve security, the experiment has failed.
>
> The "minimum viable set" in my view would be: no builtins at all,
> only allowing eval() not exec(), and disallowing yield [from],
> lambdas and generator expressions.

Then start with that. Don't give ANYTHING else. Otherwise you're still
playing with the blacklist.

But at that point, you pretty much have something that can't be
recognized as Python. You may as well start from a completely
different basis and design your own expression evaluator, maybe making
use of parse-to-AST, but not actually eval'ing the source code. That's
how fundamental this issue is - to dodge the security problems, you
get to the point where you've dodged all of what makes Python Python.

ChrisA


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Victor Stinner
2016-04-12 13:10 GMT+02:00 Jon Ribbens:
> No, it's a matter of reducing the whitelist. I must admit that
> I don't understand in what way this is not already clear. Look:
>
>   >>> len(unsafe._SAFE_MODULES)
>   23

You don't understand that even if the visible "Python scope" or "Python
namespace", whatever you want to call it (the code that is accessible
from your sandbox), looks very tiny, the real effective code is HUGE.
For example, you give full access to the str type, which is made of
20K lines of C code:

haypo@smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c Objects/stringlib/*h
 15670 Objects/unicodeobject.c
   297 Objects/unicodectype.c
    29 Objects/stringlib/asciilib.h
   827 Objects/stringlib/codecs.h
    27 Objects/stringlib/count.h
   109 Objects/stringlib/ctype.h
    25 Objects/stringlib/eq.h
   250 Objects/stringlib/fastsearch.h
   201 Objects/stringlib/find.h
   133 Objects/stringlib/find_max_char.h
   140 Objects/stringlib/join.h
   180 Objects/stringlib/localeutil.h
   116 Objects/stringlib/partition.h
    53 Objects/stringlib/replace.h
   390 Objects/stringlib/split.h
    28 Objects/stringlib/stringdefs.h
   266 Objects/stringlib/transmogrify.h
    30 Objects/stringlib/ucs1lib.h
    29 Objects/stringlib/ucs2lib.h
    29 Objects/stringlib/ucs4lib.h
    11 Objects/stringlib/undef.h
    32 Objects/stringlib/unicodedefs.h
  1284 Objects/stringlib/unicode_format.h
 20156 total

Did you carefully review *all* these lines? If a single C line gives
access to the real Python namespace, the game is over.

In a few minutes, I found "{0.__class__}".format(obj). It is not a
full escape from the sandbox, but it serves as one example. With
more time, I'm sure that a line can be found in the str type to escape
your sandbox.
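
To illustrate why this slips past static analysis: the underscore
attribute lives inside a string literal, so a check that walks the AST
looking for underscore-prefixed attribute names never sees it. The
check below is a simplified stand-in written for this example, not the
actual logic of unsafe.py.

```python
import ast

source = '"{0.__class__}".format(obj)'
tree = ast.parse(source, mode="eval")

# Simplified stand-in for a static check: collect underscore attribute accesses.
underscore_attrs = [
    node.attr
    for node in ast.walk(tree)
    if isinstance(node, ast.Attribute) and node.attr.startswith("_")
]

# The only attribute in the AST is "format", so the check finds nothing...
print(underscore_attrs)

# ...yet str.format resolves __class__ at run time, inside C code.
result = eval(source, {"obj": 1})
print(result)
```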


> I could "mathematically prove" that there are no more security holes
> in that list by reducing its length to zero.

You only see a very tiny portion of the real attack surface.

> The "minimum viable set" in my view would be: no builtins at all,
> only allowing eval() not exec(), and disallowing yield [from],
> lambdas and generator expressions.

IMHO it's a waste of time to try to reduce the great "batteries
included" Python to a simple calculator that computes 1+2. You will
never be able to fix all the holes; there are too many of them in your
sandbox.

It's very easy to implement your own calculator in pure Python, from
the parser to the code to compute the operators. If you write the
whole code yourself, it's much easier to control what is allowed and
to impose limits. For example, with your own code, you can put a limit
on the maximum number, whereas your sandbox will kill your CPU and
memory if you try 2**(2**100) (no builtin function required for this
"exploit").
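
A minimal sketch of such a calculator is below. It takes a shortcut
compared with the description above: parsing is delegated to
ast.parse, and only the evaluation is hand-written, so nothing is ever
passed to eval(). The limits (MAX_LENGTH, MAX_ABS, MAX_EXPONENT) are
arbitrary values picked for the example, and ast.Constant requires
Python 3.8 or later.

```python
import ast
import operator

# Arbitrary limits chosen for this sketch.
MAX_LENGTH = 200
MAX_ABS = 10 ** 12
MAX_EXPONENT = 64

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _check(value):
    # Reject any operand or result whose magnitude exceeds the cap.
    if abs(value) > MAX_ABS:
        raise ValueError("number too large")
    return value


def _eval(node):
    # Only plain int/float literals and a fixed set of operators are accepted.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return _check(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _check(_BINARY[type(node.op)](left, right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _check(_UNARY[type(node.op)](_eval(node.operand)))
    raise ValueError("unsupported expression: " + type(node).__name__)


def calculate(expression):
    if len(expression) > MAX_LENGTH:
        raise ValueError("expression too long")
    return _eval(ast.parse(expression, mode="eval").body)


print(calculate("1+2"))
```

Names, calls, attribute access and everything else fall through to the
final ValueError, so the allowed surface is exactly the two operator
tables. Arithmetic errors such as division by zero are left to
propagate as ordinary exceptions.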

Victor


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Victor Stinner
2016-04-08 16:18 GMT+02:00 Jon Ribbens:
> I've made another attempt at Python sandboxing, which does something
> which I've not seen tried before - using the 'ast' module to do static
> analysis of the untrusted code before it's executed, to prevent most
> of the sneaky tricks that have been used to break out of past attempts
> at sandboxes.

Right, it blocks the most trivial attacks against sandboxes. But you
only fixed a few holes; there is still a wide range of holes through
which to escape your sandbox.

I read your code and the code of CPython. I found many issues.

Your sandbox runs untrusted code in a new namespace. The game is to
get access to the outer namespace, the real Python namespace. For
example, get the namespace of the unsafe module.

Your bet is that blocking access to "_" variables, using a whitelist
of modules and a few other protections is enough to block access to
the real namespace. The problem is that Python provides a very wide
range of tools for introspection.

I expected to find a hole using the C code, but in fact, it was much
simpler than that.

Your "safe import" hides real functions with a proxy. Ok. But the code
of modules is still run in the real namespace, where I expected that
modules run in the untrusted (restricted) namespace. The game is now
to find a way to retrieve content from the real namespace using any
function exposed in modules.

I found functools.update_wrapper(). I was very surprised because this
function calls getattr() and setattr(), whereas your sandbox replaces
these builtin functions. In fact, the "safe" getattr and setattr are
only installed in the untrusted namespace, and as I wrote, the modules
run in the real Python namespace.


> I would be very interested to see if anyone can manage to break it.

So here you have:
---
import functools

# any proxy function from unsafe.py
import base64
src = base64.main

# hack to get any attribute of an object
def getattr(obj, attr):
    secret = None

    class A:
        def __setattr__(self, key, value):
            nonlocal secret
            if key == attr:
                secret = value

    dst = A()
    functools.update_wrapper(dst, src, assigned=(attr,), updated=())
    return secret

builtins = getattr(base64.main, "__globals__")["__builtins__"]

fn = "/tmp/owned"
with builtins.open(fn, "w") as f:
    f.write("game over!\n")
---

The exploit is based on two things:

* update_wrapper() is used to get the secret attribute using the real
getattr() function
* update_wrapper() + A.__setattr__ are used to pass the secret from
the real namespace to the untrusted namespace
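
The namespace point behind those two bullets can be reproduced in isolation. The sketch below is not unsafe.py: `blocked_getattr` and `sandbox_globals` are made-up stand-ins for a sandbox that swaps out builtins. It only shows that a replacement getattr installed in the untrusted namespace has no effect on what functools.update_wrapper() does internally, because that function resolves getattr and setattr in its own module's globals.

```python
import functools


def blocked_getattr(obj, name, *default):
    # Hypothetical stand-in for a sandbox's "safe" getattr.
    raise RuntimeError("getattr is blocked in the sandbox")


# Hypothetical restricted namespace: untrusted code only sees these builtins.
sandbox_globals = {
    "__builtins__": {"getattr": blocked_getattr, "RuntimeError": RuntimeError},
    "functools": functools,
}

untrusted = '''
def src(): pass
def dst(): pass
src.note = "copied by the real getattr/setattr"

try:
    getattr(src, "note")        # uses the replaced builtin
    direct = "allowed"
except RuntimeError:
    direct = "blocked"

# update_wrapper() runs in the functools module namespace, so it calls
# the real getattr() and setattr() on our behalf.
functools.update_wrapper(dst, src, assigned=("note",), updated=())
copied = dst.note
'''

exec(untrusted, sandbox_globals)
print(sandbox_globals["direct"])   # the direct call was refused
print(sandbox_globals["copied"])   # the attribute was copied anyway
```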


> Bugs which are trivially fixable are of course welcomed, but the real
> question is: is this approach basically sound, or is it fundamentally
> unworkable?

You can block the functools.update_wrapper(), or even the whole
functools module. But it will not fix the root cause: modules must run
in the untrusted namespace.

In pysandbox, I have code to ensure that all modules run in the
untrusted namespace: see CleanupBuiltins in sandbox/builtins.py. But
it was not enough, many vulnerabilities were found even with all my
protections.

I'm sure that many others will find other ways to escape your sandbox
with enough time. It's a matter of time, not a matter of whitelists.

As I wrote in my long explanation of why pysandbox is broken by design,
writing a sandbox inside CPython doesn't work. In fact, what you
want to restrict is the access to limited resources like CPU and
memory, and block access to the filesystem. This is the job of the
operating system, and external sandboxes help to block access to the
filesystem.
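
To make "this is the job of the operating system" a little more concrete, here is a small sketch, assuming a Unix system, that runs code in a child process with a CPU-time limit set through the standard resource module. The one-second budget and the helper name `limit_cpu` are assumptions for the example. It covers only CPU time; memory could be capped in a similar way with resource.RLIMIT_AS, and filesystem access needs a separate mechanism.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 1  # hypothetical CPU budget for the untrusted code


def limit_cpu():
    # Runs in the child just before the new program starts; the kernel
    # enforces the limit, not the Python interpreter.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS + 1))


proc = subprocess.run(
    [sys.executable, "-c", "while True: pass"],  # a CPU-bound "attack"
    preexec_fn=limit_cpu,
    timeout=30,  # wall-clock backstop in the parent
)
print("child exit status:", proc.returncode)
```

The child is terminated by the kernel once it exceeds its CPU budget, so the parent sees a non-zero exit status regardless of what the Python code inside the child tried to do.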

Victor


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 01:38:09PM +0200, Maciej Fijalkowski wrote:
> Jon, let me reiterate. You asked people to break it (that's the title
> of the thread) and they did so almost immediately. Then you patched
> the thing and asked them to break it again and they did. Now the
> faulty assumption here is that this procedure, repeated enough times
> will produce a secure environment - this is not how security works,

That is not an accurate summary of what has happened so far,
nor am I making that assumption. You are misunderstanding the
purpose of the experiment - I am not sure how, as I have tried
to be quite clear.

The question is: with a minimal (or empty) set of builtins, and a
restriction on ast.Name and ast.Attribute nodes, can exec/eval be
made 'safe' so they cannot execute code outside the sandbox. The
answer appears to be "yes", if the restriction is "^f?_". (If you
additionally inject external objects to the namespace then they need
to be proxied and mro() prevented.)
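
For readers who have not looked at the code, the kind of check being described might look roughly like the sketch below. It is an illustration written for this purpose, not the actual unsafe.py: the function names are made up, and only the "^f?_" pattern and the use of ast.Name and ast.Attribute nodes come from the paragraph above.

```python
import ast
import re

FORBIDDEN = re.compile(r"^f?_")  # the "^f?_" restriction mentioned above


def check(source):
    """Parse an expression and reject forbidden Name/Attribute nodes."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        name = None
        if isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, ast.Attribute):
            name = node.attr
        if name is not None and FORBIDDEN.match(name):
            raise ValueError("forbidden name: %r" % name)
    return tree


def restricted_eval(source, names=None):
    """Evaluate a checked expression with an empty set of builtins."""
    code = compile(check(source), "<untrusted>", "eval")
    return eval(code, {"__builtins__": {}}, dict(names or {}))
```

Under this sketch restricted_eval("1 + 2") returns 3, while "().__class__" and "x.f_globals" are rejected before anything is executed. Whether such a check is sufficient is exactly what the rest of this thread disputes.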

> You can't do that just by asking on the mailing list and whacking
> all the examples.

If anyone had managed to find any more examples of holes in the
original featureset after the first couple then I would agree with
you, but they haven't. 

> As others pointed out, this particular approach (with maybe
> different details) has been tried again and again and again

This simply isn't true either. As far as I can see, only
RestrictedPython has tried anything remotely similar, and
to the best of my ability to determine, that project is not
considered a failure.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Victor Stinner
2016-04-12 13:38 GMT+02:00 Maciej Fijalkowski :
> (...) you end up with either a
> completely unusable python (the python that can't run anything is
> trivially secure)

Yeah, that's the obvious question: what's the purpose of such a very
limited Python subset, for example something limited to int with a few
operators (+ - * /)?

That's also why I gave up with pysandbox. It became impossible to
execute anything more complex than a hello world.

By the way, I noticed that enum.Enum and enum.EnumMeta don't work in
your sandbox.

Victor


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Victor Stinner
2016-04-12 14:18 GMT+02:00 Jon Ribbens :
> The question is: with a minimal (or empty) set of builtins, and a
> restriction on ast.Name and ast.Attribute nodes, can exec/eval be
> made 'safe' so they cannot execute code outside the sandbox.

According to multiple exploits listed in this thread, no, it's not possible.


> If anyone had managed to find any more examples of holes in the
> original featureset after the first couple then I would agree with
> you, but they haven't.

See my latest exploit using functools.update_wrapper() + A.__setattr__() ;-)


>> As others pointed out, this particular approach (with maybe
>> different details) has been tried again and again and again
>
> This simply isn't true either. As far as I can see, only
> RestrictedPython has tried anything remotely similar, and
> to the best of my ability to determine, that project is not
> considered a failure.

IMHO nobody seriously audited RestrictedPython. It doesn't mean that
it's secure.

When it was created, security was less important than nowadays.

Victor


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Victor Stinner
2016-04-12 14:16 GMT+02:00 Victor Stinner :
> I read your code and the code of CPython. I found many issues.
> (...)
> The exploit is based on two things:
>
> * update_wrapper() is used to get the secret attribute using the real
> getattr() function
> * update_wrapper() + A.__setattr__ are used to pass the secret from
> the real namespace to the untrusted namespace

Oh, I forgot to mention another vulnerability: you block access to
attributes by replacing getattr and by analyzing the AST. Ok, but one
more time, it's not enough. If you get access to obj.__dict__, you
will likely get access to any attribute using obj_dict[attr] instead
of obj.attr.
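
A small sketch of why the dictionary route matters, assuming a check that only inspects attribute names in the AST. The class `Account` and the attribute `_token` are invented for the example. Once the mapping behind an object is in hand, the underscore name appears only as a string constant in a subscript, so there is no Attribute node for a static check to flag.

```python
import ast


class Account:
    def __init__(self):
        self._token = "hidden"  # hypothetical attribute a sandbox wants to hide


acct = Account()
obj_dict = vars(acct)  # the same mapping as acct.__dict__

expr = "obj_dict['_token']"
node_types = {type(n).__name__ for n in ast.walk(ast.parse(expr, mode="eval"))}
has_attribute_node = "Attribute" in node_types

# The lookup works even with no builtins available.
value = eval(expr, {"__builtins__": {}}, {"obj_dict": obj_dict})
print(has_attribute_node, value)
```

The sketch obtains the dictionary from outside the restricted expression; getting hold of it from inside the sandbox is the hard part.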

I wrote pysandbox because I liked Tav's idea of *removing* sensitive
dictionary keys of sensitive types like functions, frames and code
objects. Again, it was not enough.

Victor


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote:
> 2016-04-12 13:10 GMT+02:00 Jon Ribbens :
> > No, it's a matter of reducing the whitelist. I must admit that
> > I don't understand in what way this is not already clear. Look:
> >
> >   >>> len(unsafe._SAFE_MODULES)
> >   23
> 
> You don't understand that even if the visible "Python scope", "Python
> namespace", or call it as you want (the code that is accessible from
> your sandbox) looks very tiny, the real effective code is HUGE.

You are mistaken, I do understand that.

> In a few minutes, I found "{0.__class__}".format(obj) which is not a
> full escape of the sandbox, but it's just to give one example.

It's something I'd already thought of, and it's not an escape at all.
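
For anyone wondering why the format example keeps coming up: the attribute lookup happens inside str.format() at run time, so a static check on the AST never sees it. A short sketch (the variable names are just for illustration):

```python
import ast

source = '"{0.__class__}".format(obj)'

# The only Attribute node in the parsed source is the harmless ".format".
attrs = [node.attr for node in ast.walk(ast.parse(source))
         if isinstance(node, ast.Attribute)]
print(attrs)

# Yet the format mini-language performs the __class__ lookup itself.
rendered = "{0.__class__}".format(42)
print(rendered)
```

The result is only a string, which is why it is not an escape by itself.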

> > I could "mathematically prove" that there are no more security holes
> > in that list by reducing its length to zero.
> 
> You only see a very tiny portion of the real attack surface.

You've misunderstood my comment - I was saying that the security holes
from imported modules can be easily eliminated. That doesn't say
anything about security holes not from imported modules, of course.

> > The "minimum viable set" in my view would be: no builtins at all,
> > only allowing eval() not exec(), and disallowing yield [from],
> > lambdas and generator expressions.
> 
> IMHO it's a waste of time to try to reduce the great Python with
> batteries included to a simple calculator to compute 1+2.

And in my opinion it isn't. There are plenty of use cases for such
a thing. Take a look at this for example:
https://developer.blender.org/D1862 

> It's very easy to implement your own calculator in pure Python, from
> the parser to the code to compute the operators. If you write the whole
> code yourself, it's much easier to control what is allowed and put
> limits. For example, with your own code, you can put limits on the
> maximum number, whereas your sandbox will kill your CPU and memory if
> you try 2**(2**100) (no builtin function required for this "exploit").

Yes, I'd already thought of that too, although if you allow functions
and methods to be called (which they are, in my minimal viable set
suggestion above) then I think perhaps you've not actually bought
yourself very much with all that work.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote:
> Oh, I forgot to mention another vulnerability: you block access to
> attributes by replacing getattr and by analyzing the AST. Ok, but one
> more time, it's not enough. If you get access to obj.__dict__, you
> will likely get access to any attribute using obj_dict[attr] instead
> of obj.attr.

That's not a vulnerability, and it's something I already explicitly
mentioned - if you can get a function to return an object's __dict__
then you win. The question is: can you do that?


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens
 wrote:
> On Tue, Apr 12, 2016 at 02:31:19PM +0200, Victor Stinner wrote:
>> Oh, I forgot to mention another vulnerability: you block access to
>> attributes by replacing getattr and by analyzing the AST. Ok, but one
>> more time, it's not enough. If you get access to obj.__dict__, you
>> will likely get access to any attribute using obj_dict[attr] instead
>> of obj.attr.
>
> That's not a vulnerability, and it's something I already explicitly
> mentioned - if you can get a function to return an object's __dict__
> then you win. The question is: can you do that?

The question is, rather: Can you prove that we cannot?

ChrisA


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 02:16:57PM +0200, Victor Stinner wrote:
> I read your code and the code of CPython. I found many issues.

Thanks for your efforts.

> Your "safe import" hides real functions with a proxy. Ok. But the code
> of modules is still run in the real namespace,

Yes, that was the intention.

> I found functools.update_wrapper(). I was very surprised because this
> function calls getattr() and setattr(), whereas your sandbox replaces
> these builtin functions.

Good point. It seems it was almost certainly foolish of me to add
'import' back in in response to people's comments while my original
concept was still being discussed.

> So here you have:
> ---
> import functools

Thanks, that was pretty clever. I've of course fixed it by reducing
the list of imports (a lot, since I hadn't really audited them at all).
But you make a good point.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote:
> On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens
>  wrote:
> > That's not a vulnerability, and it's something I already explicitly
> > mentioned - if you can get a function to return an object's __dict__
> > then you win. The question is: can you do that?
> 
> The question is, rather: Can you prove that we cannot?

I refer you to the answer given previously. Can you prove you cannot
write code to escape JavaScript sandboxes? No? Then why have you not
disabled JavaScript in your browser?


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 10:49 PM, Jon Ribbens
 wrote:
> On Tue, Apr 12, 2016 at 10:45:06PM +1000, Chris Angelico wrote:
>> On Tue, Apr 12, 2016 at 10:42 PM, Jon Ribbens
>>  wrote:
>> > That's not a vulnerability, and it's something I already explicitly
>> > mentioned - if you can get a function to return an object's __dict__
>> > then you win. The question is: can you do that?
>>
>> The question is, rather: Can you prove that we cannot?
>
> I refer you to the answer given previously. Can you prove you cannot
> write code to escape JavaScript sandboxes? No? Then why have you not
> disabled JavaScript in your browser?

I personally cannot, any more than I can prove that SSL is secure or
that my Linux+Apache system doesn't allow remote code execution [1]. I
trust other people to, and then make a value judgement: is it worth
breaking all the web sites that depend on it? (And sometimes the
answer is "yes".)

One of the key differences with scripts in web browsers is that there
*is* no "outer environment" to access. Remember what I said about the
difference between Python-in-Python sandboxing and, say,
Lua-in-Python? One tiny exploit in Python-in-Python and you suddenly
gain access to the entire outer environment, and it's game over. One
tiny exploit in Lua-in-Python and you have whatever that exploit gave
you, nothing more.

In fact, if you're prepared to forfeit almost all of Python's power to
achieve security, you probably should look into embedding a JavaScript
or Lua engine in your Python code. You'll get a comparable expression
evaluator, and most people won't be able to tell the difference.
You've already cut the set of modules down to just cmath, datetime,
math, and re; I suspect re is next on the chopping block (it has a
global cache - if the outer system uses a regular expression more than
once, it would potentially be possible to mess with it in the cache,
and then next time it gets used, the injected code gets run), and
datetime might not be that far behind. And if they do go, all you have
left is a scientific calculator. You can implement that in any
language you like.
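
The shared regex cache mentioned above is easy to observe. The sketch below relies on a CPython implementation detail (re.compile() returning the cached pattern object for a repeated string pattern), so treat it as an illustration of shared state rather than documented behaviour; the pattern string is arbitrary.

```python
import re

PATTERN = r"sandbox-cache-demo-\d+"  # arbitrary pattern for the demonstration

first = re.compile(PATTERN)
second = re.compile(PATTERN)   # served from the module-level cache
shared = first is second

re.purge()                     # clear the cache
third = re.compile(PATTERN)    # compiled afresh
fresh = third is not first

print(shared, fresh)
```

Any code in the same interpreter, sandboxed or not, goes through the same cache, which is the kind of inner-to-outer channel being discussed.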

ChrisA

[1] And if anyone mentions PHP, I will set him to work on the hardest
PHP problem I know of - no, not securing it. I mean convincing end
users that it's not necessary. Securing it is trivial by comparison.


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Steven D'Aprano
I haven't been following this thread in detail, so perhaps I have 
missed something, but I have a question...


On Tue, Apr 12, 2016 at 02:05:06PM +0200, Victor Stinner wrote:

> You don't understand that even if the visible "Python scope", "Python
> namespace", or call it as you want (the code that is accessible from
> your sandbox) looks very tiny, the real effective code is HUGE. For
> example, you give a full access to the str type which is made of 20K
> lines of C code:
> 
> haypo@smithers$ wc -l Objects/unicodeobject.c Objects/unicodectype.c
> Objects/stringlib/*h
>  15670 Objects/unicodeobject.c
[...]
>   1284 Objects/stringlib/unicode_format.h
>  20156 total
> 
> Did you review carefully *all* these lines? If a single C line gives
> access to the real Python namespace, the game is over.

I don't follow this logic. Jon's sandbox doesn't provide an interface to 
calling arbitrary lines of C code from Python. It is limited to only a 
restricted set of Python operations.

So sticking to string methods for the sake of discussion, it doesn't 
matter if (let's say) str.upper has access to the real Python namespace. 
There is no API for str.upper to return that namespace. It only returns 
a new string. So where is the error in the following reasoning?

There are 44 string methods, excluding those that start with an 
underscore. So if Jon audits those 44 methods, and determines which ones 
return (let's say) strings and which give access to namespaces, then he 
can block the ones which give access to namespaces and allow the ones 
which return strings.
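
As a starting point for that kind of audit, the public methods can be enumerated and, for those callable without arguments, the result type recorded. This is only a sketch with made-up names (`public`, `result_types`); the number of methods varies between Python versions, and a result type observed on one sample string is not a proof about the method in general.

```python
# Public str methods, i.e. those not starting with an underscore.
public = sorted(name for name in dir(str) if not name.startswith("_"))


def result_types(sample="abc 123"):
    """Map each public method to the type it returns when called bare."""
    types = {}
    for name in public:
        try:
            result = getattr(sample, name)()
        except TypeError:
            types[name] = "needs arguments"
        else:
            types[name] = type(result).__name__
    return types


types = result_types()
print(len(public), "public str methods")
print(types["upper"], types["isdigit"], types["split"], types["count"])
```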

To give a concrete example... suppose that the C locale library is 
unsafe. Further, let's suppose that the str.isdigit method calls code 
from the C locale library, to determine whether or not the string is 
made up of locale-specific digits. How does this make str.isdigit 
(potentially) unsafe? Regardless of what happens inside the method, it 
still returns either True or False and nothing else. There's no 
str.isdigit API to access the locale library.

I can think of one possible threat. Suppose that the locale library has 
a bug, so that calling "aardvark".isdigit seg faults, potentially 
executing arbitrary C code, but at the very least crashing the 
application. Is that the sort of attack you're concerned by?



> In a few minutes, I found "{0.__class__}".format(obj) which is not a
> full escape of the sandbox, but it's just to give one example. With
> more time, I'm sure that a line can be found in the str type to escape
> your sandbox.

Maybe so. And then Jon will fix that vulnerability. And somebody will 
find a new one. And he'll fix that too, or decide that it is too hard to 
fix and give up.

That's how security works. Even software designed for security can have 
exploitable bugs:

http://securityvulns.com/news/FreeBSD/jail/chdir.html

It seems unfair to me to hold Jon to a higher standard than we hold 
people like Apple, or the Linux kernel devs.

I fully accept and respect your personal opinion, based on your 
experience, that Jon's tactic is doomed to failure. But if he needs to 
learn this for himself, just as you had to learn it for yourself 
(otherwise you wouldn't have started your own sandbox project), I can 
respect that too. Progress depends on the unreasonable person who thinks 
they can overturn the conventional wisdom.

You're telling Jon not to bother trying to sandbox CPython, he should 
use PyPy's sandbox instead. But if the PyPy people had believed the 
conventional wisdom that you can't sandbox Python, they wouldn't have a 
sandbox either.

Even if the only thing we learn from Jon's experiment is a new set of 
tricks for breaking out of the sandbox, that's still interesting, if not 
useful. And maybe he'll find some combination of whitelist and OS-level 
jail that together makes a practical sandbox. And if not, well, it's his 
own time he is wasting.


> IMHO it's a waste of time to try to reduce the great Python with
> batteries included to a simple calculator to compute 1+2.

Completely agree. But hopefully the whitelist won't be that restrictive, 
and will allow subtraction and multiplication as well :-)



-- 
Steve


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 11:12 PM, Steven D'Aprano  wrote:
> To give a concrete example... suppose that the C locale library is
> unsafe. Further, let's suppose that the str.isdigit method calls code
> from the C locale library, to determine whether or not the string is
> made up of locale-specific digits. How does this make str.isdigit
> (potentially) unsafe? Regardless of what happens inside the method, it
> still returns either True or False and nothing else. There's no
> str.isdigit API to access the locale library.
>
> I can think of one possible threat. Suppose that the locale library has
> a bug, so that calling "aardvark".isdigit seg faults, potentially
> executing arbitrary C code, but at the very least crashing the
> application. Is that the sort of attack you're concerned by?

That is a potentially significant attack vector, as it depends on a
lot of external-to-Python information (the current locale, for
instance; and we've seen exploits that involve remotely setting
environment variables, which could include LC_ALL). However, you're
right that it isn't the concern here.

There is one other thing to worry about, and that's anything where the
"inner" system can affect or influence the "outer" system. With the
str type, that's unlikely (since strings are immutable), but I raised
the potential concern of the regex cache, as there's a chance someone
could attack that. The mere presence of decimal.getcontext() resulted
in the whole module getting off the whitelist.
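
The decimal case shows the inner-affects-outer problem without any escape at all. In this sketch (the helper `one_third` is invented for the example) the "inner" code does nothing but change the precision of the current context, and the "outer" code's arithmetic changes as a result:

```python
import decimal


def one_third():
    # Stand-in for a calculation done by the outer, trusted application.
    return decimal.Decimal(1) / decimal.Decimal(3)


saved_prec = decimal.getcontext().prec
before = one_third()

decimal.getcontext().prec = 3   # all the "inner" code has to do
after = one_third()

decimal.getcontext().prec = saved_prec  # restore for the rest of the program
print(before, after)
```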

If you want complete isolation of one and the other, that's easy: have
no communication whatsoever. But then there's no point in having them
both execute in the same interpreter. You may as well create a chroot
and run Python inside that, have it serialize the result to JSON and
write it to stdout, which you can then retrieve. That would pretty
much solve the problem. (And in fact, if I were to do-over the project
where I wanted Python sandboxing, that's probably what I'd do.)
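
A minimal sketch of the communication half of that design, assuming the untrusted code is passed as a string to a separate interpreter. The chroot itself is not shown (it needs privileges and OS-specific setup), and `run_isolated` and `CHILD_CODE` are names invented for the example. The parent only ever sees JSON text on the child's stdout.

```python
import json
import subprocess
import sys

# Stand-in for untrusted code: it computes something and prints JSON.
CHILD_CODE = "import json; print(json.dumps({'answer': sum(range(10))}))"


def run_isolated(code, timeout=5):
    """Run code in a separate interpreter and decode its JSON output."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(proc.stdout)


result = run_isolated(CHILD_CODE)
print(result)
```

Nothing from the child is ever unpickled or evaluated in the parent, so the only shared surface is the JSON parser and the operating system's process isolation.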

ChrisA


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Isaac Morland

On Tue, 12 Apr 2016, Jon Ribbens wrote:

>> This is still a massive game of whack-a-mole.
>
> No, it still isn't. If the names blacklist had to keep being extended
> then you would be right, but that hasn't happened so far. Whitelists
> by definition contain only a small, limited number of potential moles.
>
> The only thing you found above that even remotely approaches an
> exploit is the decimal.getcontext() thing, and even that I don't
> think you could use to do any code execution.


"I don't think"?

Where's the formal proof?

Without a proof, this is indeed just a game of whack-a-mole.

I don't "think" Python is a suitable foundation for a sandboxing system 
intended for security purposes, but my "think" won't lead to security 
holes whereas yours will.  So, I would respectfully suggest that unless 
you increase the rigour of your effort substantially, it is not 
worthwhile.  Python is great for lots of applications already - there is 
no need to force it into unsuitable problem domains.


Isaac Morland   CSCF Web Guru
DC 2619, x36650 WWW Software Specialist


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 11:03:11PM +1000, Chris Angelico wrote:
> One of the key differences with scripts in web browsers is that there
> *is* no "outer environment" to access.

If you think that then I think you considerably misunderstand how
modern browsers work.

> Remember what I said about the difference between Python-in-Python
> sandboxing and, say, Lua-in-Python? One tiny exploit in
> Python-in-Python and you suddenly gain access to the entire outer
> environment, and it's game over. One tiny exploit in Lua-in-Python
> and you have whatever that exploit gave you, nothing more.

Are you imagining the Lua-in-Python as being completely isolated from
the Python namespace then?

> In fact, if you're prepared to forfeit almost all of Python's power to
> achieve security, you probably should look into embedding a JavaScript
> or Lua engine in your Python code.

Yes, I have in fact already done this (JavaScript using SpiderMonkey).
It allows the JavaScript to access Python objects and methods directly
from JavaScript so it doesn't actually help, but I think I could put
limits on that (e.g. making things read-only) and unlike most of this
Python stuff, that could be made a solid rule with no clever ways
around it.

> I suspect re is next on the chopping block (it has a global cache -
> if the outer system uses a regular expression more than once, it
> would potentially be possible to mess with it in the cache, and then
> next time it gets used, the injected code gets run),

All you could do would be to give misleading results from the regular
expression methods, but yes that is a good point. I regret that
I added the import stuff at all now - it has just been a distraction
from my original point.

> [1] And if anyone mentions PHP, I will set him to work on the hardest
> PHP problem I know of - no, not securing it. I mean convincing end
> users that it's not necessary. Securing it is trivial by comparison.

Fortunately I have managed to exclude PHP completely these days from
any system I have anything to do with!


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread David Wilson
On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote:

> I can think of one possible threat. Suppose that the locale library
> has a bug, so that calling "aardvark".isdigit seg faults, potentially
> executing arbitrary C code, but at the very least crashing the
> application. Is that the sort of attack you're concerned by?

This thread already covered the need to address SEGV at length. For a
truly evil user, almost any kind of crash is an opportunity to take
control of the system, and a security solution ignoring this is no
security solution at all.


> Maybe so. And then Jon will fix that vulnerability. And somebody will
> find a new one. And he'll fix that too, or decide that it is too hard
> to fix and give up.
> 
> That's how security works. Even software designed for security can
> have exploitable bugs:
> 
> It seems unfair to me to hold Jon to a higher standard than we hold 
> people like Apple, or the Linux kernel devs.

I don't believe that's what is happening here. In the OS analogy, Jon is
generating busywork trying to secure an environment similar to Windows
3.1 that was simply never designed with e.g. memory protection in mind
to begin with, and there is no evidence after numerous attempts spanning
many years by multiple people that such an environment can be secured
meaningfully while still remaining generally useful.


> I fully accept and respect your personal opinion, based on your
> experience, that Jon's tactic is doomed to failure. But if he needs to
> learn this for himself, just as you had to learn it for yourself
> (otherwise you wouldn't have started your own sandbox project), I can
> respect that too. Progress depends on the unreasonable person who
> thinks they can overturn the conventional wisdom.

I'd deeply prefer it if this turned into an investigation or patchset
making CPython work nicely with seccomp, sandbox(7), pledge(2) or
whatever capability minimization mechanisms exist on Windows. They are
all mechanisms to make it much safer for random code to be executing on
your system, designed by folk who at all times expressly had security
in mind.

But that's not what's happening, instead a dead horse is being flogged
over a hundred messages in our inboxes and IMHO it is excruciating to
watch.


> Even if the only thing we learn from Jon's experiment is a new set of
> tricks for breaking out of the sandbox, that's still interesting, if
> not useful.

Don't forget the worst case: a fundamentally broken security module
heavily marketed to the naive using claims the core team couldn't break
it.


David


Re: [Python-Dev] Challenge: Please break this! (a.k.a restricted mode revisited)

2016-04-12 Thread Jon Ribbens
On Tue, Apr 12, 2016 at 01:40:57PM +, David Wilson wrote:
> On Tue, Apr 12, 2016 at 11:12:27PM +1000, Steven D'Aprano wrote:
> > I can think of one possible threat. Suppose that the locale library
> > has a bug, so that calling "aardvark".isdigit seg faults, potentially
> > executing arbitrary C code, but at the very least crashing the
> > application. Is that the sort of attack you're concerned by?
> 
> This thread already covered the need to address SEGV at length. For a
> truly evil user, almost any kind of crash is an opportunity to take
> control of the system, and a security solution ignoring this is no
> security solution at all.

Indeed.

> But that's not what's happening, instead a dead horse is being flogged
> over a hundred messages in our inboxes and IMHO it is excruciating to
> watch.

I don't think that is true at all, and I personally have found this
thread very interesting. I apologise if others have not.

> > Even if the only thing we learn from Jon's experiment is a new set of
> > tricks for breaking out of the sandbox, that's still interesting, if
> > not useful.
> 
> Don't forget the worst case: a fundamentally broken security module
> heavily marketed to the naive using claims the core team couldn't break
> it.

I should point out that my module is called "unsafe.py", is titled
an "experiment", and prominently states in the README:

  Do not use this code for any purpose in the real world.

I will not be putting it up as an installable package, and as already
stated it was never my intention to suggest that it or anything like
it be included in the stdlib. I will however leave it on github for
anyone who wants to have a go at breaking into it in the future.


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Sven R. Kunze

On 12.04.2016 00:56, Random832 wrote:

> Fully general re-dispatch from argument types on any call to a function
> that raises TypeError or NotImplemented? [e.g. call
> Path.__missing_func__(os.open, path, mode)]
>
> Have pathlib monkey-patch things at import?

Implicit conversion. No, thanks.

> On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote:
> > So, I might add:
> >
> > 3. add more high-level features to pathlib to prevent a downgrade to os
> > or os.path
>
> 3. reimplement the entire ecosystem in every walled garden so no-one has
> to leave their walled gardens.
>
> What's the point of batteries being included if you can't wire them to
> anything?

Huh? That makes no sense to me.

> I don't get what you mean by this whole "different level of abstraction"
> thing, anyway.

Strings are strings. Paths are paths. That's where the difference is.

> The fact that there is one obvious thing to want to do
> with open and a Path strongly suggests that that should be able to be
> done by passing the Path to open.

Path(...).open() is your friend then. I don't see why you need os.open.

Refusing to upgrade is like saying everything was better in the old
days. So let's use os.open instead of Path(...).open().

> Also, what level of abstraction is builtin open? Maybe we should _just_
> leave os alone on the grounds of some holy sacred lowest-level-itude,
> but allow io and shutil to accept Path?

os, io and shutil accept strings. Not Path objects. Why? Because the
semantics of "being a path" are applied implicitly by those modules. You
are free to use a random string as a path and later as the name of your
pet. The semantics of a string come from usage. Path objects, however,
have built-in semantics.

Furthermore, if os, io and shutil are changed, we allow code like the
following:



my_path.touch()
os.remove(my_path)


I don't know how to reasonably explain to newbies why my_path sometimes
stands in front of the call and sometimes inside it as an argument.
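
The mixed style in question can be sketched as follows. This is only an
illustration, assuming a Python version in which os.remove accepts path
objects (3.6 and later); the temporary directory and the file name are
made up for the example.

```python
import os
import pathlib
import tempfile

# Hypothetical scratch file inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    my_path = pathlib.Path(tmp) / "example.txt"

    my_path.touch()                      # method style: the path comes first
    existed_after_touch = my_path.exists()

    os.remove(my_path)                   # function style: the path is an argument
    exists_after_remove = my_path.exists()
```

Both calls act on the same file; only the position of my_path differs.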


Best,
Sven


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Sven R. Kunze

On 12.04.2016 12:41, Paul Moore wrote:

> As your thoughts appear to have been triggered by my comments, I feel
> I should clarify.
>
> 1. I like pathlib even as it is right now, and I'm strongly -1 on removing it.
> 2. The "external dependency" aspect of 3rd party solutions makes them
> far less useful to me.
> 3. The work on improving integration with the stdlib (which is nearly
> sorted now, as far as I can see) is a big improvement, and I'm all in
> favour. But even without it, I wouldn't want pathlib to be removed.
> 4. There are further improvements that could be made to pathlib,
> certainly, but again they are optional, and pathlib is fine without
> them.


My conclusion is that these changes are not optional and tweaking os, io 
and shutil is just yet another workaround for a clean solution. :)


Just my two cents.


> 5. I wish more 3rd party code integrated better with pathlib. The
> improved integration work might help with this. But ultimately, Python
> 2 compatibility is likely to be the biggest block (either perceived or
> real - we can make pathlib support as simple as possible, but some 3rd
> party authors will remain unwilling to add support for Python 3 only
> features in the short term). This isn't a pathlib problem.
> 6. There will probably always be a place for low-level os/os.path
> code. Adding support in those modules for pathlib doesn't affect that
> fact, but does make it easier to use pathlib "seamlessly", so why not
> do so?
>
> tl; dr; I'm 100% in favour of pathlib, and in the direction the
> current discussion (excluding "let's give up on pathlib" digressions)
> is going.


Best,
Sven


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Donald Stufft

> On Apr 12, 2016, at 10:52 AM, Sven R. Kunze  wrote:
> 
> Path(...).open() is your friend then. I don't see why you need os.open.
> 
> Refusing to upgrade it like saying, everything was better in the old days. So 
> let's use os.open instead of Path(...).open().


I think it was a mistake to have Path(…).open to be honest and I think the main 
reason it exists is because open(Path(…)) doesn’t work (yet!). You can’t hang 
every single thing you might ever want to do to a Path off the path object.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Random832
On Tue, Apr 12, 2016, at 10:52, Sven R. Kunze wrote:
> On 12.04.2016 00:56, Random832 wrote:
> > Fully general re-dispatch from argument types on any call to a function
> > that raises TypeError or NotImplemented? [e.g. call
> > Path.__missing_func__(os.open, path, mode)]
> >
> > Have pathlib monkey-patch things at import?
> 
> Implicit conversion. No, thanks.

No more so than __radd__ - I didn't actually mean this as a serious
suggestion, but Python *does* already have multiple dispatch.

> > On Mon, Apr 11, 2016, at 17:43, Sven R. Kunze wrote:
> > I don't get what you mean by this whole "different level of abstraction"
> > thing, anyway.
> 
> Strings are strings. Paths are paths. That's where the difference is.

Yes but why aren't these both "things that you may want to use to open a
file"?

> > The fact that there is one obvious thing to want to do
> > with open and a Path strongly suggests that that should be able to be
> > done by passing the Path to open.
> 
> Path(...).open() is your friend then. I don't see why you need os.open.

Because I'm passing it to modfoo.dosomethingwithafile() which takes a
filename and passes it to shutil, which passes it to builtin open,
which passes it to os.open.

Should Path grow a dosomethingwithmodfoo method?


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Chris Angelico
On Tue, Apr 12, 2016 at 7:58 AM, Ethan Furman  wrote:
> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we allow
> bytes from __fspath__()?
>

I would say No and No, on the basis that it's *far* easier to widen
their scope in 3.7 than to narrow it. Once you declare that one or
both of these may return bytes, it becomes an annoying incompatibility
to change that (even if it *is* marked provisional), which almost
certainly means it won't happen. By restricting them both, we force
the issue: if you want bytes, you'll know about it.

I'd also prefer to stick to Unicode path names, for reasons I've
stated in other threads. Undecodable path byte streams can be handled
already, so what are we really gaining by allowing a Path-like object
to emit bytes? If it becomes a major issue for a lot of types, it
wouldn't be hard to add a helper function somewhere (or a mixin class
that provides a ready-to-go __fspath__, which might well be
sufficient).

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Sven R. Kunze

Sorry for disturbing this thread's harmony.


On 12.04.2016 08:00, Ethan Furman wrote:
> On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:
>
> >> Consider os.path.join:
> >
> > Why in the world do the os.path functions need to work with Path
> > objects? (and other conforming objects)
>
> Because library XYZ that takes a path and wants to open it shouldn't
> have to care whether that path is a string or pathlib.Path -- but if
> os.open can't use pathlib.Path then the library has to care (or the
> user has to care).
>
> > This all started with the goal of using Path objects in the stdlib,
> > but that's for opening files, etc.
>
> Etc. as in os.join?  os.stat? os.path.split?
>
> > Path is an alternative to os.path -- you don't need to use both.

I agree with that quote of Chris.

> As a user you don't, no.  As a library that has no control over what
> kind of "path" is passed to you -- well, if os and os.path can accept
> Path objects then you can just use os and os.path; otherwise you have
> to use os and os.path if passed a str or bytes, and pathlib.Path if
> passed a pathlib.Path -- so you do have to use both.

I don't agree here. There's no need to increase the convenience for a
library maintainer when it comes to implicit conversions.

When people want to use your library and it requires a string, they can
simply use "my_path.path" and everything still works for them when they
switch to pathlib.



Best,
Sven


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > One possible way to address this concern would be to have the
 > underlying protocol be bytes/str (since boundary code frequently
 > needs to handle the paths-are-bytes assumption in POSIX),

What "needs"?  As has been pointed out several times, with PEP 383 you
can deal with bytes losslessly by using an arbitrary codec and
errors=surrogateescape.  I know why *I* use bytes nevertheless:
because when I must guess the encoding, it just makes more sense to
read bytes and then iterate over codecs until the result looks like
words I know in some language.

I don't understand why people who mostly believe "bytes are text, too"
because almost all they ever see are bytes in the range 0x00-0x7f need
bytes.  For them, fsdecode and fsencode DTRT.
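
The lossless round trip that PEP 383 provides can be sketched like this.
The file name below is made up for the example, and UTF-8 is just one
choice of codec; the point is only that decoding and re-encoding with
errors="surrogateescape" preserves bytes that the codec cannot decode.

```python
import os

# A file name as raw bytes that is not valid UTF-8 (0xE9 is a lone Latin-1 byte).
raw = b"caf\xe9.txt"

# Decoding with errors="surrogateescape" does not fail: the undecodable
# byte is carried along as a lone surrogate code point (U+DCE9 here).
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same codec and error handler restores the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")

# os.fsdecode / os.fsencode apply the filesystem encoding and its error
# handler; that handler is surrogateescape on POSIX but may differ elsewhere,
# so only a plain ASCII name is used here.
fs_text = os.fsdecode(b"plain-ascii.txt")
fs_bytes = os.fsencode(fs_text)
```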

If you want to claim "efficiency", I can't gainsay since I don't know
the applications, but if you're trying to manipulate file names
millions of times per second, I have to wonder what you're doing with
them that benefits so much from Path.

 > but offer an "os.fspathname" API that rejected bytes output from
 > os.fspath.

Either it's a YAGNI because I'm not going to get any bytes in the
first place, or it raises where I probably could have done something
useful with bytes if I were expecting them (see "pathological" below).

 > That way folks that wanted the clean "must be str" signature

Er, I don't need no steenkin' "clean signature".  I need str, and if
I can't get it from __fspath__, there's always os.fsdecode.  But this
is serious horse-before cart-putting, punishing those who do things
Python-3-ishly right.

 > The ambiguity in question here is inherent in the differences between
 > the way POSIX and Windows work,

Not with PEP 383, it's not.  And I don't do Windows, so my preference
for str has nothing to do with it mapping to native OS APIs well.

The ambiguity in question here is inherent in the differences between
the ways Python 2 and Python 3 programmers work on POSIX AFAICS.
Certainly, there will be times when fsdecode doesn't DTRT.  So those
times you have to use an explicit bytes.decode.  Note that when you
*do* care enough to do that, it's because the Path is *text* -- you're
going to display it to a human, or pass it out of the module.  If all
you're going to do is access the filesystem object denoted, fsdecode
does a sufficiently accurate job.

So if for some reason you're getting bytes at the boundary, I see no
reason why you can't have a convenience constructor

import os
import pathlib

def pathological(str_or_bytes_or_path_seq):
    args = []
    for s_o_b in str_or_bytes_or_path_seq:
        # Decode bytes with the filesystem encoding; pass str and Path through.
        args.append(os.fsdecode(s_o_b) if isinstance(s_o_b, bytes) else s_o_b)
    return pathlib.Path(*args)

for when that's good enough (maybe Antoine would even allow it into
pathlib?)

 > so there are limits to how far we can go in hiding it without
 > making things worse rather than better.

What "hide"?  Nobody is suggesting that the polymorphic os APIs should
go away.  Indeed, they are perfect TOOWTDI, giving the programmer
exactly the flexibility needed *and no more*, *at* the boundary.

The questions on my mind are:

(A) Why does anybody need bytes out of a pathlib.Path (or other
__fspath__-toting, higher-level API) *inside* the boundary?  Note
that the APIs in os (etc) *don't need* bytes because they are
already polymorphic.

(B) If they do, why can't they just apply bytes() to the object?  I
understand that that would offend Ethan's aesthetic sense, so it's
worth looking for a nice way around it.  But allowing __fspath__
to return bytes or str is hideous, because Paths are clearly on
the application side of the boundary.

Note that bytes() may not have the serious problem that str() does of
being too catholic about its argument: nothing in __builtins__ has a
__bytes__!  Of course there are a few things that do work: ints, and
sequences of ints.
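
A small sketch of what bytes() accepts, for reference. The Name class is
hypothetical and exists only to show the __bytes__ hook; it is not part
of any library.

```python
from_int = bytes(3)              # an int gives that many zero bytes
from_ints = bytes([104, 105])    # a sequence of ints gives those byte values

class Name:
    """Hypothetical class that opts in to bytes() via __bytes__."""
    def __init__(self, text):
        self.text = text

    def __bytes__(self):
        return self.text.encode("utf-8")

from_dunder = bytes(Name("spam"))

# An arbitrary object without __bytes__ is rejected, unlike str(),
# which accepts nearly anything.
try:
    bytes(object())
except TypeError:
    plain_object_rejected = True
else:
    plain_object_rejected = False
```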



Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Sven R. Kunze

On 12.04.2016 16:59, Random832 wrote:

> > Strings are strings. Paths are paths. That's where the difference is.
>
> Yes but why aren't these both "things that you may want to use to open a
> file"?

Because "things that you may want to use to open a file" is a bit vague
and thus conceals what we really need.

As an example: time.sleep takes a number of seconds (notice the
primitive datatype, just like a string) and does not take a timedelta.

Why don't we add datetime.timedelta support to time.sleep? Very same thing.

> > > The fact that there is one obvious thing to want to do
> > > with open and a Path strongly suggests that that should be able to be
> > > done by passing the Path to open.
> >
> > Path(...).open() is your friend then. I don't see why you need os.open.
>
> Because I'm passing it to modfoo.dosomethingwithafile() which takes a
> filename and passes it to shutil, which passes it to builtin open,
> which passes it to os.open.
>
> Should Path grow a dosomethingwithmodfoo method?

Because we can argue here the other way round and say:

"oh, pathlib can do things I cannot do with os.path."

Should os.path grow those things?

Put differently, you cannot do everything. But the most common issues
should be resolved in the correct module. This is no argument for or
against either solution.

I am sorry if my contributions to the threads on python-ideas made it
seem that I would always support this idea. I don't anymore. However, I
will still be happy with the outcome, even if it is not perfect, and will
help make the Python stdlib better. :)


Best,
Sven


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Chris Barker
one little note:

On Tue, Apr 12, 2016 at 3:41 AM, Paul Moore  wrote:

> 4. There are further improvements that could be made to pathlib,
> certainly, but again they are optional, and pathlib is fine without
> them.
>

Exactly -- "improvements to pathlib" and "make the stdlib pathlib
compatible" are completely orthogonal.


> 5. I wish more 3rd party code integrated better with pathlib. The
> improved integration work might help with this. But ultimately, Python
> 2 compatibility is likely to be the biggest block (either perceived or
> real - we can make pathlib support as simple as possible, but some 3rd
> party authors will remain unwilling to add support for Python 3 only
> features in the short term). This isn't a pathlib problem.
>

true -- though the proposed protocol approach opens doors there -- any
third party lib can check for a __whatever_it's_called__ and run fine in
py2 or py3 or, indeed, any version of python.

Also if you really don't like pathlib, then the protocol allows you to
write/use a different path implementation -- really win-win.
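
A rough sketch of that kind of check, written so that it does not depend
on anything in the stdlib beyond builtins. The dunder name __fspath__ is
an assumption taken from the other threads (the name was not settled at
this point), and path_to_native and OtherPath are hypothetical names used
only for illustration.

```python
def path_to_native(obj):
    """Return a str/bytes path for `obj`, using a path protocol if present."""
    if isinstance(obj, (str, bytes)):
        return obj
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(obj), "__fspath__", None)
    if method is None:
        raise TypeError("expected str, bytes or path-like object, not "
                        + type(obj).__name__)
    return method(obj)

class OtherPath(object):
    """Hypothetical third-party path type, unrelated to pathlib."""
    def __init__(self, *parts):
        self.parts = parts

    def __fspath__(self):
        return "/".join(self.parts)

native = path_to_native(OtherPath("a", "b"))
```

A library written this way never imports pathlib, which is why it could
accept any path implementation that provides the method.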

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Chris Barker
On Tue, Apr 12, 2016 at 7:54 AM, Sven R. Kunze  wrote:

>
> My conclusion is that these changes are not optional and tweaking os, io
> and shutil is just yet another workaround for a clean solution. :)
>

Is the clean solution to re-implement EVERYTHING in the stdlib that
involves a path in a new, fancy pathlib way?

If we were starting from scratch, I _might_ like that idea, but we're not
starting from scratch. And that would cement in pathlib itself, leaving no
room for other path implementations. Kind of like how the pre-__index__
Python cemented in Python integers as the only objects one could use to
index a sequence.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Ethan Furman

On 04/11/2016 02:58 PM, Ethan Furman wrote:
> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we
> allow bytes from __fspath__()?


On 04/11/2016 10:28 PM, Stephen J. Turnbull wrote:
> In text applications, "bytes as carcinogen" is an apt metaphor.

On 04/12/2016 08:25 AM, Chris Angelico wrote:
> I would say No and No, on the basis that it's *far* easier to widen
> their scope in 3.7 than to narrow it.

On 04/11/2016 08:45 PM, Nick Coghlan wrote:
> I've come around to the point of view that allowing both str and
> bytes-like objects to pass through unchanged makes sense, with the
> rationale being the one someone mentioned regarding ease-of-use in
> os.path.
[...]

> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently needs
> to handle the paths-are-bytes assumption in POSIX), but offer an
> "os.fspathname" API that rejected bytes output from os.fspath.

I think this is the way forward:  offer a standard way to get
paths-as-strings, with an easily supported way of working with
paths-as-bytes.

This could be with an os.fspathname() & os.fspath() pair of functions,
or with a single function that has a parameter specifying what to do
with bytes objects: reject (default), accept, or (maybe) an encoding to
use to coerce to bytes.


--
~Ethan~


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Sven R. Kunze

On 12.04.2016 18:04, Chris Barker wrote:
> On Tue, Apr 12, 2016 at 7:54 AM, Sven R. Kunze wrote:
>
> > My conclusion is that these changes are not optional and tweaking
> > os, io and shutil is just yet another workaround for a clean
> > solution. :)
>
> Is the clean solution to re-implement EVERYTHING in the stdlib that
> involves a path in a new, fancy pathlib way?
>
> If we were starting from scratch, I _might_ like that idea, but we're
> not starting from scratch. And that would cement in pathlib itself,
> leaving no room for other path implementations. Kind of like how the
> pre-__index__ Python cemented in Python integers as the only objects
> one could use to index a sequence.


I cannot remember us using another datetime library. So, I don't value 
this "advantage" as much as you do.



Best,
Sven


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Ethan Furman

On 04/11/2016 04:43 PM, Victor Stinner wrote:
> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:

>> So my concern in such a case is what happens if we pass this SE
>> string somewhere else: a UTF-8 file, or over a socket, or into a
>> database? Does this have issues that we wouldn't face if we just used bytes?
>
> "SE string" are returned by os.listdir(str), os.walk(str),
> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
> the sun.

So when we pass a bytes object in, Python (on posix) converts that to a
string using surrogateescape, gets back strings from the os, and encodes
them back to bytes, again using surrogateescape?

> Trying to encode a surrogate to ascii, latin1 or utf8 raise an encoding
> error.


latin1?  I thought latin1 had a code point for 0-255, so how could using 
it raise an encoding error?


--
~Ethan~


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Chris Barker
On Mon, Apr 11, 2016 at 10:40 PM, Greg Ewing 
wrote:
>
> So the ONLY thing
>> you should do with it is pass it along to another low level system
>> call.
>>
>
> Not quite -- you can separate it into components and
> work with them. Essentially the same set of operations
> that os.path provides.
>

ahh yes, so while posix claims that paths are "just a char*", they are
really bytes where we can assume that the byte with value 2F is the pathsep
(and that 2E separates an extension?), so I suppose os.path is useful. But
I still think that most of us should never deal with bytes paths, and the
few that need to should just work with the low level functions and be done
with it.

One more thought came up just now: there are different levels of
abstractions and representations for paths. We don't want to make Path a
subclass of string, because Path is supposed to be a higher level
abstraction -- good.

then at the bottom of the stack, we NEED the bytes level path, because
that's what ultimately gets passed to the OS.

The legacy from the single-byte encoding days is that bytes and strings
were the same, so we could let people work with nice human readable
strings, while also working with byte paths in the same way -- but those
days are gone -- py3 makes a clear (and important) distinction between nice
human readable strings and the bytes that represent them.

So: why use strings as the lingua franca of paths? i.e. the basis of the
path protocol. maybe we should support only two path representations:

1) A "proper" path object -- i.e. pathlib.Path or anything else that
supports the path protocol.

2) the bytes that the OS actually needs.

this would mean that the protocol would be to have a __pathbytes__() method
that would return the bytes that should be passed off to the OS.

A posix Path implementation could store that internal bytes representation,
so it could pass it off unchanged if that's all you need to do.

Any current API that takes bytes could be made to easily work.
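
To make the idea concrete, here is a rough sketch of what such a protocol
might look like. Everything in it is hypothetical: __pathbytes__ is only
the name floated above, and BytesBackedPath and os_bytes are made up for
the illustration. Only os.fsencode and os.fsdecode are real stdlib calls.

```python
import os

class BytesBackedPath:
    """Hypothetical path type that keeps the OS-level bytes internally."""
    def __init__(self, raw):
        if isinstance(raw, str):
            # Convert text to the bytes form the OS would see.
            raw = os.fsencode(raw)
        self._raw = raw

    def __pathbytes__(self):
        # Hand the stored bytes back unchanged.
        return self._raw

    def __str__(self):
        # Human-readable form, for display only.
        return os.fsdecode(self._raw)

def os_bytes(obj):
    """Return the bytes to hand to the OS for a bytes or path object."""
    if isinstance(obj, bytes):
        return obj
    return type(obj).__pathbytes__(obj)

from_text = os_bytes(BytesBackedPath("/tmp/x"))
from_raw = os_bytes(BytesBackedPath(b"/tmp/\xff"))
```

Note that the second path is never decoded at all, so undecodable bytes
pass through untouched.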

I'm SURE I'm missing something really big here, but it seems like maybe
it's better to get farther from "strings as paths" rather than closer to
it

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Chris Angelico
On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman  wrote:
> On 04/11/2016 04:43 PM, Victor Stinner wrote:
>>
>> On 11 Apr 2016 11:11 PM, "Ethan Furman" wrote:
>
>
>>> So my concern in such a case is what happens if we pass this SE
>>> string somewhere else: a UTF-8 file, or over a socket, or into a
>>> database? Does this have issues that we wouldn't face if we just used
>>> bytes?
>>
>>
>> "SE string" are returned by os.listdir(str), os.walk(str),
>> os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under
>> the sun.
>
>
> So when we pass a bytes object in, Python (on posix) converts that to a
> string using surrogateescape, gets back strings from the os, and encodes
> them back to bytes, again using surrogateescape?
>
>
>> Trying to encode a surrogate to ascii, latin1 or utf8 raise an encoding
>> error.
>
>
> latin1?  I thought latin1 had a code point for 0-255, so how could using it
> raise an encoding error?

Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
string will *decode*. It only defines 256 characters as having
equivalent bytes, though, so *encoding* can fail.
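
A short sketch of that asymmetry. The helper name can_encode_latin1 is
made up for the example; the codec behaviour is standard.

```python
# Every possible byte value decodes under Latin-1 ...
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# ... and those 256 characters encode back to the same bytes.
round_trip = decoded.encode("latin-1")

def can_encode_latin1(text):
    """Return True if `text` is encodable as Latin-1 with strict errors."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True

euro_ok = can_encode_latin1("\u20ac")        # EURO SIGN, outside Latin-1
surrogate_ok = can_encode_latin1("\udce9")   # lone surrogate from surrogateescape
```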

ChrisA


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Koos Zevenhoven
On Tue, Apr 12, 2016 at 11:56 AM, Nick Coghlan  wrote:
> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently needs
> to handle the paths-are-bytes assumption in POSIX), but offer an
> "os.fspathname" API that rejected bytes output from os.fspath. That
> is, it would be equivalent to:
>
> def fspathname(path):
>     name = os.fspath(path)
>     if not isinstance(name, str):
>         raise TypeError("Expected str for pathname, not {}".format(type(name)))
>     return name
>
> That way folks that wanted the clean "must be str" signature could use
> os.fspathname, while those that wanted to accept either could use the
> lower level os.fspath.
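
For illustration, the quoted function can be exercised as below. This
assumes a Python in which os.fspath exists (3.6 and later), which was
still only a proposal when this was written; DemoPath is a hypothetical
path-like class, and fspathname is the helper from the quote, not a
stdlib function.

```python
import os

class DemoPath:
    """Hypothetical path-like object used only for this illustration."""
    def __init__(self, value):
        self._value = value

    def __fspath__(self):
        return self._value

def fspathname(path):
    # Same logic as the quoted sketch: accept str results, reject bytes.
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError("Expected str for pathname, not {}".format(type(name)))
    return name

str_result = fspathname(DemoPath("/tmp/demo"))

try:
    fspathname(DemoPath(b"/tmp/demo"))
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False
```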

I'm not necessarily opposed to this. I kept bringing up bytes in the
discussion because os.path.* etc. and DirEntry support bytes and will
need to keep doing so for backwards compatibility.  I have no
intention to use bytes pathnames myself. But it may break existing
code if functions, for instance, began to decode bytes paths to str if
they did not previously do so (or to reject them). It is indeed a lot
safer to make new code not support bytes paths than to change the
behavior of old code.

But then again, do we really recommend new code to use os.fspath (or
os.fspathname)? Should they not be using either pathlib or os.path.*
etc. so they don't have to care? I'm sure Ethan and his library (or
some other path library) will manage without the function in the
stdlib, as long as the dunder attribute is there.

So I'm, once again, posing this question (that I don't think got any
reactions previously): Is there a significant audience for this new
function, or is it enough to keep it a private function for the stdlib
to use? That handful of third-party path libraries can decide for
themselves if they want to (a) reject bytes or (b) implicitly fsdecode
them or (c) pass them through just like str, depending on whatever
their case requires in terms of backwards compatibility or other goals.

If we forget about the os.fswhatever function, we only have to decide
whether the magic dunder attribute can be str or bytes or just str.

-Koos


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Koos Zevenhoven
On Tue, Apr 12, 2016 at 7:19 PM, Chris Barker  wrote:
>
> One more thought came up just now: there are different levels of
> abstractions and representations for paths. We don't want to make Path a
> subclass of string, because Path is supposed to be a higher level
> abstraction -- good.
>
> then at the bottom of the stack, we NEED the bytes level path, because
> that's what ultimately gets passed to the OS.
>
> The legacy from the single-byte encoding days is that bytes and strings were
> the same, so we could let people work with nice human readable strings,
> while also working with byte paths in the same way -- but those days are
> gone -- py3 makes a clear (and important) distinction between nice human
> readable strings and the bytes that represent them.
>
> So: why use strings as the lingua franca of paths? i.e. the basis of the
> path protocol. maybe we should support only two path representations:
>
> 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> supports the path protocol.
>
> 2) the bytes that the OS actually needs.
>

You do have a point there. But since bytes pathnames are deprecated on
Windows, this seems to lead to supporting both str and bytes in the
protocol, or having two protocols __fspathbytes__ and __fspathstr__
(and one being preferred over the other, potentially even depending on
the platform).

-Koos


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Chris Barker
On Tue, Apr 12, 2016 at 8:57 AM, Sven R. Kunze  wrote:

> As an example: time.sleep takes a number of seconds (notice the primitive
> datatype just like a string) and does not take timedelta.
>
> Why don't we add datetime.timedelta support to time.sleep? Very same thing.


yup -- and if there were a lot of commonly used APIs that took strings, and
multiple timedelta implementations, then it would make sense to introduce a
__seconds_int__ protocol.

I don't think the use-cases rise to that level, myself. Though if someone
wanted to put a call to obj.total_seconds() into time.sleep, that might
actually be worth it :-)

(now that you mention it -- I have a substantial library that uses seconds
internally, and currently has an ugly sometimes integer seconds, sometimes
timedelta API -- maybe I'll introduce that protocol. Not sure why I didn't
think of that before now.)
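
A minimal sketch of the kind of normalization meant here, without any new
protocol. The names as_seconds and sleep_for are hypothetical; only
datetime.timedelta.total_seconds() and time.sleep() are real stdlib calls.

```python
import datetime
import time

def as_seconds(duration):
    """Normalize a timedelta or a plain number to float seconds."""
    if isinstance(duration, datetime.timedelta):
        return duration.total_seconds()
    return float(duration)

def sleep_for(duration):
    """Hypothetical wrapper: time.sleep that also accepts a timedelta."""
    time.sleep(as_seconds(duration))

from_delta = as_seconds(datetime.timedelta(milliseconds=1500))
from_number = as_seconds(2)
```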

>> Because I'm passing it to modfoo.dosomethingwithafile() which takes a
>> filename and passes it to shutil, which passes it to builtin open,
>> which passes it to os.open.
>>
>> Should Path grow a dosomethingwithmodfoo method?
>
>
It can't -- modfoo could be a third-party module -- it is impossible for
Path to grow everything that any third party module might support.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Chris Barker
On Tue, Apr 12, 2016 at 9:20 AM, Chris Angelico  wrote:

> > latin1?  I thought latin1 had a code point for 0-255, so how could using it
> > raise an encoding error?
>
> Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
> string will *decode*. It only defines 256 characters as having
> equivalent bytes, though, so *encoding* can fail.
>

Unless it was decoded as latin-1 in the first place. Doesn't the
surrogateescape thing only work properly if you decode/encode with the same
encoding?
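
A small sketch of that question, using only the stdlib codecs and the "surrogateescape" error handler. The byte string is made up for the example: valid UTF-8 followed by one stray byte.

```python
# "\u00e9" is e-acute; in UTF-8 it takes two bytes (0xC3 0xA9).
raw = "\u00e9".encode("utf-8") + b"\xff"

# The stray 0xFF byte cannot be decoded, so it is smuggled through as U+DCFF.
text = raw.decode("utf-8", "surrogateescape")

# Same codec both ways: the original bytes come back unchanged.
same_codec = text.encode("utf-8", "surrogateescape")

# Different codec on the way out: the escaped byte survives, but the
# properly decoded e-acute is re-encoded as a single Latin-1 byte.
other_codec = text.encode("latin-1", "surrogateescape")
```

So the escaped bytes themselves are restored either way, but the round trip as a whole only reproduces the input when the same encoding is used in both directions.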

-CHB




Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-12 Thread Ethan Furman

On 04/12/2016 09:20 AM, Chris Angelico wrote:
> On Wed, Apr 13, 2016 at 2:15 AM, Ethan Furman wrote:

>> latin1?  I thought latin1 had a code point for 0-255, so how could using it
>> raise an encoding error?
>
> Latin-1 / ISO-8859-1 defines a character for every byte, so any byte
> string will *decode*. It only defines 256 characters as having
> equivalent bytes, though, so *encoding* can fail.


Ah, right -- so if you start with bytes it cannot fail, if you start 
with a string it can.
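
A quick sketch of that asymmetry, using nothing beyond the built-in latin-1 codec:

```python
every_byte = bytes(range(256))

# Decoding: every byte value maps to a character, so this cannot fail.
text = every_byte.decode("latin-1")
round_tripped = text.encode("latin-1")

# Encoding: only U+0000..U+00FF have a Latin-1 byte, so other characters fail.
try:
    "\u20ac".encode("latin-1")  # EURO SIGN is outside Latin-1
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```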


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Ethan Furman

On 04/12/2016 09:26 AM, Koos Zevenhoven wrote:

> So I'm, once again, posing this question (that I don't think got any
> reactions previously): Is there a significant audience for this new
> function, or is it enough to keep it a private function for the stdlib
> to use?


Quite frankly, I expect the stdlib itself to be the primary consumer. 
But I see no reason to not publish the function so that users who need 
the advanced functionality have easy access to it.


--
~Ethan~



Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Chris Barker
On Tue, Apr 12, 2016 at 9:32 AM, Koos Zevenhoven  wrote:

> > 1) A "proper" path object -- i.e. pathlib.Path or anything else that
> > supports the path protocol.
> >
> > 2) the bytes that the OS actually needs.
> >
>
> You do have a point there. But since bytes pathnames are deprecated on
> windows,


Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest
level, but the decision was already made there to use str as the
lingua franca -- i.e. the user NEVER sees a path as a bytestring on
Windows? I guess that's decided then. str is the exchange format.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Random832
On Tue, Apr 12, 2016, at 12:40, Chris Barker wrote:
> Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest
> level,

Only in the sense that literally everything's bytes at the lowest level.
But the bytes Windows needs are not in an ASCII-compatible encoding so
it's not reasonable to talk about them in the same way as every other
kind of bytes filename.

> but the decision was already made there to use str as the
> lingua-franca -- i.e. the user NEVER sees a path as a bytestring on
> Windows? I guess that's decided then. str is the exchange format.


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Barry Scott
On Mon, 11 Apr 2016 14:15:02 -0700
Ethan Furman  wrote:

> We've pretty much decided that we have two options:
> 
> 1. remove pathlib
> 2. make the stdlib work with pathlib
> 
> So we're trying to make option 2 work before falling back to option 1.

I have been doing a lot of porting to Python 3 and have really enjoyed
having pathlib, even in its current state.

In one of my previous projects using Python 2 on Linux we had to write code
to handle files with names that were not UTF-8. (Users could FTP a file
into the file system and it could end up non-UTF-8.)

Today we would have used pathlib to represent paths in the app.
But we would need to be able to detect the paths that do not follow
the fs encoding rules.

I would suggest a predicate in Path to report that the path cannot be
encoded without the use of surrogates. Not sure what to call the
predicate.
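
A minimal sketch of such a predicate, written as free functions rather than Path methods. The names needs_surrogates and presentation_string are hypothetical, and UTF-8 is hard-coded as an assumption; real code would more likely consult sys.getfilesystemencoding().

```python
from pathlib import PurePosixPath


def needs_surrogates(path, encoding="utf-8"):
    """Return True if the path cannot be encoded strictly.

    Names decoded with PEP 383's surrogateescape handler contain lone
    surrogates, which a strict encode rejects.
    """
    try:
        str(path).encode(encoding)  # default error handler is "strict"
    except UnicodeEncodeError:
        return True
    return False


def presentation_string(path, encoding="utf-8"):
    # Replace anything unencodable with a visible \uXXXX escape so the
    # result is safe to show to a user.
    return str(path).encode(encoding, "backslashreplace").decode(encoding)


# A file name as an FTP upload might leave it: Latin-1 bytes on a UTF-8 system.
odd_name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
odd_path = PurePosixPath(odd_name)
clean_path = PurePosixPath("caf\u00e9.txt")
```

Here needs_surrogates(odd_path) is true and needs_surrogates(clean_path) is false, so an application could route the first kind of path to a "fix the name" UI.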

This can be used by code that cares to handle converting the path into
a suitable presentation string for showing to a user. I'm assuming here
that PEP 383 may not provide a presentation string that is suitable for
showing to users.

In the case of our product we refused to use files that did not encode
to UTF-8 and had a UI to allow the user to fix the name.

One reason for making files that can only be represented as bytes
detectable is, I suspect, to avoid security issues. If I had my black
hat on, I would probe a Python 3 app with filenames that are non-UTF-8
and see if I could break the app.

Barry


Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Koos Zevenhoven
On Tue, Apr 12, 2016 at 6:52 PM, Stephen J. Turnbull  wrote:
>
> (A) Why does anybody need bytes out of a pathlib.Path (or other
> __fspath__-toting, higher-level API) *inside* the boundary?  Note
> that the APIs in os (etc) *don't need* bytes because they are
> already polymorphic.
>

Indeed not from pathlib.*Path, but from DirEntry, which may have a
path as bytes. So the options for DirEntry (or things like Ethan's
'antipathy') are:

(1) Provide bytes or str via the protocol, depending on which type
this DirEntry has

Downside: The protocol needs to support str and bytes.

(2) Decode bytes using os.fsdecode and provide a str via the protocol

Downside: The user passed in bytes and maybe had a reason to do so.
This might lead to a weird mixture of str and bytes in the same code.

(3) Do not implement the protocol when dealing with bytes

Downside: If a function calling os.scandir accepts both bytes and str
in a duck-typing fashion, then, if that function adopted something that
uses the new protocol, it would lose its bytes compatibility. This risk
might not be huge, so perhaps (3) is an option?
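
To make the starting point concrete, here is a small sketch of how os.scandir mirrors the type it is given, plus what option (2) would amount to. It only uses existing stdlib calls (os.scandir, os.fsencode, os.fsdecode); the temporary directory and file name are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # str in, str out
    str_paths = [entry.path for entry in os.scandir(tmp)]

    # bytes in, bytes out
    bytes_paths = [entry.path for entry in os.scandir(os.fsencode(tmp))]

    # Option (2): always hand out str by decoding with os.fsdecode.
    decoded_paths = [os.fsdecode(p) for p in bytes_paths]
```

Under option (2) a caller who passed bytes would get decoded_paths back, which is where the "weird mixture of str and bytes" concern comes from.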


> (B) If they do, why can't they just apply bytes() to the object?  I
> understand that that would offend Ethan's aesthetic sense, so it's
> worth looking for a nice way around it.  But allowing __fspath__
> to return bytes or str is hideous, because Paths are clearly on
> the application side of the boundary.
>
> Note that bytes() may not have the serious problem that str() does of
> being too catholic about its argument: nothing in __builtins__ has a
> __bytes__!  Of course there are a few things that do work: ints, and
> sequences of ints.

Good point. But this only applies to when the user _explicitly_ deals
with bytes. But when the user just deals with the type (str or bytes)
that is passed in, as os.path.* as well as DirEntry now do, this does
not work.

-Koos


Re: [Python-Dev] Maybe, just maybe, pathlib doesn't belong.

2016-04-12 Thread Alexander Walters

On 4/12/2016 12:14, Sven R. Kunze wrote:
> I cannot remember us using another datetime library. So, I don't value
> this "advantage" as much as you do.


They exist, and there are many cases where you would use a datetime 
library other than datetime for various reasons (integration in third 
party systems is only one reason).  But this is just a tangent.


In fact the situation with pathlib is similar to datetime - before the 
inclusion of datetime in the stdlib, there were several datetime 
libraries available.  Before pathlib, there were several path object 
libraries.  Only now, the third party options offer a great deal of 
competition over the stdlib option, thus these many hundreds, if not 
thousands, of emails on the subject.



[Python-Dev] ping on issue 18378: locale.getdefaultlocale() fails on recent Mac OS X

2016-04-12 Thread Chris Barker
Hi folks,

There have been multiple reports of folks having failures on startup of
matplotlib, which appears to be due to the most recent OS X version setting
the locale weirdly. This was identified last summer in this issue:

http://bugs.python.org/issue18378

It looks like the issue was figured out, and even a patch contributed, but
it stalled out before being applied.

I have no idea if the patch is any good, but it would be great to get this
fixed!

-Thanks,
  -Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


[Python-Dev] List posting custom [was: current status of discussions]

2016-04-12 Thread Stephen J. Turnbull
The following is my opinion, as will become obvious, but it's based on
over a decade of observing these lists, and other open source
development lists.  In a context where some core developers have
unsubscribed from these lists, and others regularly report muting
threads with a certain air of asperity, I think it's worth the risk of
seeming arrogant to explain some of the customs (which are complex and
subtle) around posting to Python developer lists.  I'm posting
publicly because there are several new developers whose activity and
fresh perspective is very welcome, but harmony *is* being disturbed,
IMO unnecessarily.

This particular post caught my eye, but it's only an example of one of
the most unharmonious posting styles that has become common recently.
Attribution deliberately removed.

 > Sorry for disturbing this thread's harmony.

*sigh*  There is way too much of this on Python-Ideas recently, and
there shouldn't be any on Python-Dev.  So please don't.  Specifically,
disagreement with an apparently developing consensus is fine but
please avoid this:

 > >> Path is an alternative to os.path -- you don't need to use both.
 > 
 > I agree with that quote of Chris.

It's a waste of time to post *what* you agree with.[1]  Decisions are
not taken by vote in this community, except for the color of the
bikeshed, where it is agreed that *what* decision is taken doesn't
matter, but that some decision should be taken expeditiously.[2]
Chris already stated this position clearly and it's not a "color", so
there is no need to reiterate.  It simply wastes others' time to read
it.  (Whether it was a waste of the poster's time is not for me to
comment on.)

What matters to the decision is *why* you agree (or disagree).  If you
think that some of Chris's arguments are bogus (and should be
disregarded) and others are important, that is valuable information.
It's even better if you can shed additional light on the matter
(example below).

Also, expression of agreement is often a prelude to a request for
information.  "I agree with Z's post.  At least, I have never needed
X.  *When* do you need X?  Let's look for a better way than X!"

Unsupported (dis)agreement to statements about "needs" also may be
taken as *rude*, because others may infer your arrogant claim to know
what *they* do or don't need.  Admittedly there's a difficult
distinction here between Chris's *idiom* where "you don't need to"
translates to "In my understanding, it is generally not necessary to",
and your *unsupported* agreement, which in my dialect of English
changes the emphasis to imply you know better than those who disagree
with you and Chris.  And, of course, the position that others are "too
easily offended" is often reasonable, but you should be aware that
there will be an impact on your reputation and ability to influence
development of Python (even if it doesn't come near the point where
a moderator invokes "Code of Conduct").

"Me too" posts aren't entirely forbidden, but I feel that in Python
custom they are most appropriate when voting on bikeshed colors, and
as applause for a *technically* excellent suggestion.  They should be
avoided in the context of value judgments (of "need" and "simplicity",
for example) for the reason given above.

 > When people want to use your library and it requires a string, they
 > can simply use "my_path.path" and everything still works for them
 > when they switch to pathlib.

This is disrespectful in tone.  I don't know if you're responding to
Ethan here, but he's one of the authors in question.  We *know* that
Ethan doesn't like such inelegant idioms -- he said so -- where "this
object has an appropriate conversion to your argument type, so you
should apply it implicitly" is unambiguous.[3] So for him, it's *not*
so simple.  Since it's not a matter of voting, each proponent should
provide more contexts where preferred programming idioms are
"Pythonic" to sway the sense of the community, or if necessary, the
BDFL.

Where that aesthetic came up was in the context of consistently
wrapping arguments that might be Paths in str, as in

    p = Path(*stuff) or defaultstring
    # 500 lines crossing function and module boundaries!
    with open(str(p)) as f:
        process(f)

I think it was Nick who posted agreement with Ethan on the aesthetics
of str-wrapping.  If that were all, he probably wouldn't have posted
(see fn. 1), but he further pointed out that this application of str
is *dangerous* because *everything* in Python can be coerced to str.
That was a very valuable observation, which swayed the list in favor
of "Uh-oh, we can't recommend 'os.method(str(Path))'!"
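
A short sketch of the hazard, together with one possible stricter conversion. The helper strict_path_str is hypothetical and is not the API under discussion; it only illustrates rejecting non-path objects instead of stringifying them.

```python
from pathlib import PurePath, PurePosixPath


def strict_path_str(obj):
    """Hypothetical helper: return a str path, rejecting non-path objects."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, PurePath):
        return str(obj)
    raise TypeError("expected str or a pathlib path, got " + type(obj).__name__)


# str() happily converts things that are not paths at all, so a bug far
# upstream (say, a variable left as None) turns into a plausible file name.
accidental = str(None)

# The stricter helper passes real paths through unchanged ...
good = strict_path_str(PurePosixPath("data/input.txt"))

# ... and fails loudly on the bad value instead of producing "None".
try:
    strict_path_str(None)
    rejected = False
except TypeError:
    rejected = True
```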

This is my last post on this particular topic, but I will be happy to
discuss off-list.  (I may discuss further in public on my blog, but
first I have to get a blog. :-)


Footnotes: 
[1]  "You" is generic here.  There are a couple of developers whose
agreement has the status of pronouncement of Pythonicity.  Aspire to
that, but don't assume it 

Re: [Python-Dev] Not receiving bug tracker emails

2016-04-12 Thread Terry Reedy

On 4/4/2016 5:05 PM, Terry Reedy wrote:

For the past few days, I have been getting bug tracker emails again, in my
Inbox.  I just got a Rietveld review in the Inbox and I believe it went
there directly instead of first to Junk.  Thank you to whoever made the
improvements.


--
Terry Jan Reedy



Re: [Python-Dev] pathlib - current status of discussions

2016-04-12 Thread Michael Mysinger via Python-Dev
Ethan Furman  stoneleaf.us> writes:
 
> Do we allow bytes to be returned from os.fspath()?  If yes, then do we 
> allow bytes from __fspath__()?

De-lurking. Especially since the ultimate goal is better interoperability, I 
feel like an implementation that people can play with would help guide the 
few remaining decisions. To help test the various options you could 
temporarily add an _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both 
pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for 
each. 

In the spirit of Python 3 I feel like bytes might not be needed in practice, 
but something like this with defaults of False will allow people to easily 
test all the various options.
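
As a rough sketch of the kind of experiment meant here: the function below is a stand-in written for this message, not the eventual os.fspath(), and ALLOW_BYTES and BytesEntry are invented names. It uses a single switch for simplicity, where the suggestion above would have separate ones for __fspath__ and fspath().

```python
ALLOW_BYTES = False  # hypothetical global switch for experimentation


def fspath(path, _allow_bytes=None):
    """Experimental stand-in for the proposed os.fspath() (an assumption)."""
    if _allow_bytes is None:
        _allow_bytes = ALLOW_BYTES  # read at call time so the switch can be flipped
    if not isinstance(path, (str, bytes)):
        try:
            method = type(path).__fspath__
        except AttributeError:
            raise TypeError("expected str, bytes or an object with __fspath__, "
                            "not " + type(path).__name__) from None
        path = method(path)
    if isinstance(path, str):
        return path
    if isinstance(path, bytes) and _allow_bytes:
        return path
    raise TypeError("path resolved to " + type(path).__name__
                    + ", which is not allowed")


class BytesEntry:
    """Toy object whose __fspath__ returns bytes, as a bytes DirEntry might."""

    def __fspath__(self):
        return b"/tmp/example"
```

With the default of False, fspath(BytesEntry()) raises TypeError; passing _allow_bytes=True (or flipping ALLOW_BYTES) lets the bytes through, so both policies can be tried against real code.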


