Re: Promoting Python

2016-04-07 Thread Ian Kelly
On Thu, Apr 7, 2016 at 12:30 AM, Marko Rauhamaa  wrote:
> Or:
>
>    When a class attribute reference (for class C, say) would yield a
>    class method object, it is transformed into an instance method object
>    whose __self__ attribute is C.
>    <https://docs.python.org/3/reference/datamodel.html?highlight=__getattr__#the-standard-type-hierarchy>
>
> So the only difference between a regular function and an instance method
> object is the fact that the latter has a __self__ attribute set.
>
> Although even that small difference can be paved over:
>
> def g():
>     print("g")
> g.__self__ = a
> a.f = g

What is this example supposed to accomplish?  Functions don't merely
not have a __self__ attribute set. The __self__ attribute has no
meaning on a function.
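A minimal sketch of the distinction (the class `A` and the functions `f` and `g` are made up for illustration): a function found through the class is bound into a method that has `__self__`, whereas a function stored on the instance stays a plain function. Setting `__self__` on it by hand only creates an ordinary function attribute, which Python ignores when the function is called.

```python
class A:
    def f(self):
        return "f got self of type " + type(self).__name__


def g():
    return "g received no arguments"


a = A()

bound = a.f                   # found on the class, so it is bound to a
print(type(bound).__name__, bound.__self__ is a)   # method True
print(bound())                # f got self of type A

g.__self__ = a                # just an arbitrary function attribute
a.g = g                       # stored on the instance: no binding happens
print(type(a.g).__name__, a.g())   # function g received no arguments
```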

Let's take a different example.


class Dialog(Window):

    def __init__(self, parent, title, ok_callback):
        super().__init__(parent, title)
        self._ok_callback = ok_callback
        self._ok_button = Button(self, 'Ok')
        self._ok_button.bind(self._ok_callback)

def f(event):
    print("Hello world")

dialog = Dialog(None, "Example", f)
dialog.show()


Are you suggesting that dialog._ok_callback should be considered a
method of Dialog, despite the fact that the implementation of Dialog
and the implementation of f are entirely unrelated? If so, then I
think that most OOP practitioners would disagree with you.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Paul Rubin
Jon Ribbens  writes:
>> That string decodes to "__private".
> Yes, and? ... The namespace
> I was suggesting didn't provide access to any objects which have a
> 'get()' method which would access attributes.

I see, I forgot that getattr is a function, not an object method.
Though, now you've got the problem that there isn't enough capability
left to do much interesting.  I used web.py for a while, that had a
complete interpreter for a sandboxed Python-like language written in
Python itself.  That's a brutal way to deal with the problem, and it had
annoyances, but it seemed to work.  You presumably also want to limit
CPU usage etc.  
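On the CPU-usage point, a POSIX-only sketch (the function name and default limits are assumptions) of running untrusted source in a separate interpreter with `resource.setrlimit(RLIMIT_CPU)` plus a wall-clock timeout. It caps resources only; it does nothing to stop the code reading files or importing modules, so on its own it is not a sandbox.

```python
import resource
import subprocess
import sys


def run_limited(source, cpu_seconds=2, wall_seconds=10):
    """Run Python source in a child interpreter with a CPU-time cap."""

    def limit_cpu():
        # Runs in the child between fork() and exec(); when the CPU budget
        # is used up the kernel sends SIGXCPU, which terminates the child.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-I", "-c", source],   # -I: isolated mode
        preexec_fn=limit_cpu,                   # not safe if the parent uses threads
        capture_output=True,
        text=True,
        timeout=wall_seconds,                   # wall-clock backstop
    )
```

Usage: `run_limited("print(6 * 7)").stdout` would be `"42\n"`; a script stuck in an infinite loop is killed after about two seconds of CPU time.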

Geordi (the C++ irc bot) now just launches the user script in a Docker
container, I think.  Before that it had some fancier sandboxing
approaches.

Lua is supposed to be easy to embed and sandbox.  It might be
interesting to write Python bindings for the Lua interpreter sometime.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Promoting Python

2016-04-07 Thread Marko Rauhamaa
Ian Kelly :

> Let's take a different example.
>
> class Dialog(Window):
>
>     def __init__(self, parent, title, ok_callback):
>         super().__init__(parent, title)
>         self._ok_callback = ok_callback
>         self._ok_button = Button(self, 'Ok')
>         self._ok_button.bind(self._ok_callback)
>
> def f(event):
>     print("Hello world")
>
> dialog = Dialog(None, "Example", f)
> dialog.show()
>
> Are you suggesting that dialog._ok_callback should be considered a
> method of Dialog, despite the fact that the implementation of Dialog
> and the implementation of f are entirely unrelated? If so, then I
> think that most OOP practitioners would disagree with you.

First, terminology disputes are pointless.

No, I would never call f a method of Dialog. I might call it a method of
dialog, though.

A method is simply a callable attribute:

   Procedures in object-oriented programming are known as methods
   <https://en.wikipedia.org/wiki/Object-oriented_programming>

(Now, in CLOS, GOOPS etc, methods are not attributes at all, but that's
another story.)


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: recursive methods require implementing a stack?

2016-04-07 Thread Michael Selik
On Thu, Apr 7, 2016, 7:51 AM Charles T. Smith 
wrote:

> On Wed, 06 Apr 2016 20:28:47 +, Rob Gaddi wrote:
>
> > Charles T. Smith wrote:
> >
> >> I just tried to write a recursive method in python - am I right that local
> >> variables are only lexically local scoped, so sub-instances have the same
> >> ones?  Is there a way out of that?  Do I have to push and pop my own simulated
> >> stack frame entry?
> >
> > You have been badly misled.  Python local variables are frame local, and
> > recursion just works.
>
>
> Well, I probably stumbled astray due to my own stupidity, can't blame
> anybody of having misled me...  ;)
>
> So it's a bug in my program!  Good news!  Thank you.
>

I'm guessing you are passing a list or dict to the recursive call and
discovering that the object is passed rather than a copy. Note that this is
not pass-by-reference, but "pass by object", for your Googling.
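A small sketch of that distinction (the function names are made up): every recursive call gets its own local `n`, but a list passed down the recursion is a single shared object unless each call builds a new one.

```python
def countdown_shared(n, seen):
    seen.append(n)            # mutates the one list that every frame shares
    if n > 0:
        countdown_shared(n - 1, seen)
    return n, seen            # n is still this frame's own value


def countdown_copy(n, seen):
    seen = seen + [n]         # rebinds the local name to a brand-new list
    if n > 0:
        countdown_copy(n - 1, seen)
    return n, seen


print(countdown_shared(3, []))   # (3, [3, 2, 1, 0])
print(countdown_copy(3, []))     # (3, [3])
```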

>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ola

2016-04-07 Thread Manolo Martínez
On 04/06/16 at 08:59pm, Joel Goldstick wrote:
> 2016-04-05 20:35 GMT-04:00 majoxd hola :
> > Could you send me the Python program Spyder for Windows?
> >
 
> This is an English-language list.  And besides, your question, "I could
> send the Python program spyder for Windows?", is awfully vague.

They are asking for the list to send them spyder.

Hi, this list is in English. You'll do better on the Spanish-language Python
list, here: https://mail.python.org/mailman/listinfo/python-es

As for Spyder, I think the best way to get it is through
Anaconda: https://www.continuum.io/downloads. But I'm not a user of
either Windows or Spyder.

Manolo
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unacceptable behavior

2016-04-07 Thread Ben Finney
Ethan Furman  writes:

> You are hereby placed in moderation for the Python List mailing list.

Thanks for taking action to maintain a healthy community, Ethan.

-- 
 \   “I don't know anything about music. In my line you don't have |
  `\ to.” —Elvis Aaron Presley (1935–1977) |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


deque is not a subclass of Sequence.

2016-04-07 Thread Antoon Pardon
Playing around with the collections and collections.abc modules in
python3.4 I stumbled upon the following:

>>> from collections.abc import Sequence
>>> from collections import deque
>>> isinstance(list(), Sequence)
True
>>> isinstance(deque(), Sequence)
False
>>>

This seems strange to me. As far as I understand, the documentation
indicates there is no reason why deque shouldn't be a subclass of
Sequence. Am I missing something or can this be considered a bug?

-- 
Antoon.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unacceptable behavior

2016-04-07 Thread Rustom Mody
On Thursday, April 7, 2016 at 12:20:32 PM UTC+5:30, Ethan Furman wrote:
> On 04/05/2016 01:05 PM, Thomas 'PointedEars' Lahn wrote:
> 
>  > | >>> from email import ID10T
> 
> Thomas, as has been pointed out to you in previous threads it is not 
> necessary to be rude to be heard.
> 
> You are hereby placed in moderation for the Python List mailing list.
> 
> Every one else:  If you see offensive posts from Thomas on the usenet 
> side, please just ignore them.  I have no desire to see his posts in 
> your replies.
> 
> --
> ~Ethan~
> Python List Owners

Thanks for making the list more pleasant at the expense of increasing work for
yourself
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: deque is not a subclass of Sequence.

2016-04-07 Thread Antoon Pardon
Second try; I hope the formatting doesn't get messed up now.

Playing around with the collections and collections.abc modules in
python3.4 I stumbled upon the following:

>>> from collections.abc import Sequence
>>> from collections import deque
>>> isinstance(list(), Sequence)
True
>>> isinstance(deque(), Sequence)
False

This seems strange to me. As far as I understand, the
documentation indicates there is no reason why deque
shouldn't be a subclass of Sequence.

Am I missing something or can this be considered a bug?

-- 
Antoon.


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: deque is not a subclass of Sequence.

2016-04-07 Thread Peter Otten
Antoon Pardon wrote:

> Second tryal, I hope the formatting doesn't get messed up now
> 
> Playing around with the collections and collections.abc modules in
> python3.4 I stumbled upon the following:
> 
> >>> from collections.abc import Sequence
> >>> from collections import deque
> >>> isinstance(list(), Sequence)
> True
> >>> isinstance(deque(), Sequence)
> False
> 
> This seems strange to me. As far as I understand, the
> documentation indicates there is no reason why deque
> shouldn't be a subclass of Sequence.
> 
> Am I missing something or can this be considered a bug?

>>> from collections import deque
>>> from collections.abc import Sequence
>>> [name for name in set(dir(Sequence)) - set(dir(deque)) if not name.startswith("_")]
['index']

So the index() method seems to be what is missing.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: deque is not a subclass of Sequence.

2016-04-07 Thread Antoon Pardon
Op 07-04-16 om 11:12 schreef Peter Otten:
>
> >>> from collections import deque
> >>> from collections.abc import Sequence
> >>> [name for name in set(dir(Sequence)) - set(dir(deque)) if not name.startswith("_")]
> ['index']
>
> So the index() method seems to be what is missing.

The index() method seems to have been added in 3.5, so is deque
a subclass of Sequence in 3.5?

-- 
Antoon. 

-- 
https://mail.python.org/mailman/listinfo/python-list


read a file and remove Mojibake chars

2016-04-07 Thread Daiyue Weng
Hi, when I read a file, the resulting string contains Mojibake characters
at the beginning. The code is:

file_str = open(file_path, 'r', encoding='utf-8').read()
print(repr(open(file_path, 'r', encoding='utf-8').read()))

The part of the printed string that contains the Mojibake characters looks like this:

  '锘縶\n "name": "__NAME__"'

I tried to remove the non-UTF-8 characters using this code:

def read_config_file(fname):
    with open(fname, "r", encoding='utf-8') as fp:
        for line in fp:
            line = line.strip()
            line = line.decode('utf-8','ignore').encode("utf-8")

    return fp.read()

But it doesn't work. How can I remove the Mojibake in this case?

Many thanks
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: read a file and remove Mojibake chars

2016-04-07 Thread Ben Finney
Daiyue Weng  writes:

> Hi, when I read a file, the file string contains Mojibake chars at the
> beginning

You are explicitly setting an encoding to read the file; that is good,
since Python should not guess the input encoding.

The reason it's good is because the issue, of knowing the correct text
encoding, is dealt with immediately. I am guessing the text encoding may
be not as you expect.

Are you certain the text encoding is “utf-8”? Can you verify that with
whatever created the file — what text encoding does it use to write that
file?

-- 
 \  “Advertising is the price companies pay for being unoriginal.” |
  `\—Yves Béhar, _New York Times_ interview 2010-12-30 |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: deque is not a subclass of Sequence.

2016-04-07 Thread Peter Otten
Antoon Pardon wrote:

> Op 07-04-16 om 11:12 schreef Peter Otten:
>>
> from collections import deque
> from collections.abc import Sequence
> [name for name in set(dir(Sequence)) - set(dir(deque)) if not
>> name.startswith("_")]
>> ['index']
>>
>> So the index() method seems to be what is missing.
> 
> the index() method seems to be added in 3.5, so is deque
> a subclass of Sequence in 3.5?

Yes, according to the only 3.5 interpreter I have currently available:

python3.5
Python 3.5.0b2+ (3.5:9aee273bf8b7+, Jun 25 2015, 09:25:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import deque
>>> from collections.abc import Sequence
>>> isinstance(deque(), Sequence)
True


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: read a file and remove Mojibake chars

2016-04-07 Thread Peter Otten
Daiyue Weng wrote:

> Hi, when I read a file, the file string contains Mojibake chars at the
> beginning, the code is like,
> 
> file_str = open(file_path, 'r', encoding='utf-8').read()
> print(repr(open(file_path, 'r', encoding='utf-8').read()))
> 
> part of the string (been printing) containing Mojibake chars is like,
> 
>   '锘縶\n "name": "__NAME__"'
> 
> I tried to remove the non utf-8 chars using the code,
> 
> def read_config_file(fname):
>     with open(fname, "r", encoding='utf-8') as fp:
>         for line in fp:
>             line = line.strip()
>             line = line.decode('utf-8','ignore').encode("utf-8")
>
>     return fp.read()
> 
> but it doesn't work, so how to remove the Mojibakes in this case?

I'd first investigate if the file can correctly be decoded using an encoding 
other than UTF-8, but if it's really hopeless and your best bet is to ignore 
all non-ascii characters try

def read_config_file(fname):
    with open(fname, "r", encoding="ascii", errors="ignore") as f:
        return f.read()
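Before discarding characters, one specific hypothesis may be worth testing. '锘縶' appears to be what the bytes EF BB BF 7B (a UTF-8 byte-order mark followed by '{') turn into when decoded as GBK. If so (an assumption: some tool read the UTF-8 file as GBK and wrote it back out as UTF-8), the damage is reversible, and the '{' is hidden inside the second garbled character, so dropping non-ASCII characters would lose it too. A sketch with a made-up helper name:

```python
def repair_gbk_mojibake(text):
    """Undo UTF-8 text (optionally BOM-prefixed) that was mis-decoded as GBK.

    Returns the text unchanged if it does not round-trip, i.e. if this
    guess about how it was garbled does not fit.
    """
    try:
        return text.encode("gbk").decode("utf-8-sig")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the suspected damage, then repair it.
original = '\ufeff{\n "name": "__NAME__"\n}'
garbled = original.encode("utf-8").decode("gbk")
print(repr(garbled[:2]))                   # the two garbled characters
print(repr(repair_gbk_mojibake(garbled)))  # BOM gone, '{' restored
```

A successful round trip does not prove the hypothesis, so it is worth eyeballing the result before trusting it on other files.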

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: deque is not a subclass of Sequence.

2016-04-07 Thread Rolf Camps



On 2016-04-07 11:12, Peter Otten wrote:
> Antoon Pardon wrote:
>
>> Second tryal, I hope the formatting doesn't get messed up now
>>
>> Playing around with the collections and collections.abc modules in
>> python3.4 I stumbled upon the following:
>>
>> >>> from collections.abc import Sequence
>> >>> from collections import deque
>> >>> isinstance(list(), Sequence)
>> True
>> >>> isinstance(deque(), Sequence)
>> False
>>
>> This seems strange to me. As far as I understand, the
>> documentation indicates there is no reason why deque
>> shouldn't be a subclass of Sequence.
>>
>> Am I missing something or can this be considered a bug?
>
> >>> from collections import deque
> >>> from collections.abc import Sequence
> >>> [name for name in set(dir(Sequence)) - set(dir(deque)) if not name.startswith("_")]
> ['index']
>
> So the index() method seems to be what is missing.


The index method was added to the deque object in Python 3.5.

Python 3.5.1 (default, ...)
[GCC ...] on linux
>>> import collections
>>> isinstance(collections.deque(), collections.abc.Sequence)
True

--
https://mail.python.org/mailman/listinfo/python-list


join_paired_ends.py: error: option -f: file does not exist

2016-04-07 Thread Inya Ivano
Hi,

I've been having trouble with running my files:

>join_paired_ends.py -f 
>/home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R2_001.fastq.gz -r 
>/home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R1_001.fastq.gz -o 
>/home/qiime/Documents/Aleurone/Joined_1101

>join_paired_ends.py: error: option -f: file does not exist: 
>'/home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R2_001.fastq.gz'

What I'm not sure about is whether there's a problem with the script or the
virtual machine settings. I'm using the latest Oracle VM VirtualBox version,
5.0.16.

Inya

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: deque is not a subclass of Sequence.

2016-04-07 Thread Mark Lawrence via Python-list

On 07/04/2016 10:25, Antoon Pardon wrote:


the index() method seems to be added in 3.5, so is deque
a subclass of Sequence in 3.5?



Yes, this http://bugs.python.org/issue23704 refers.

Use the builtin 
https://docs.python.org/3/library/functions.html#issubclass to try it.


>>> issubclass(deque, Sequence)
True
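One detail that may explain the version difference: `Sequence` does not recognise classes by inspecting their methods. A class counts only if it inherits from the ABC or is explicitly registered with it, and as far as I can tell 3.5 registered deque once it had gained `index()`. A sketch with a made-up class:

```python
from collections import deque
from collections.abc import Sequence


class Squares:
    """Read-only sequence of n squares (hypothetical example class).

    It implements Sequence's two abstract methods, __len__ and
    __getitem__, but neither inherits from nor is registered with it.
    """

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


print(issubclass(Squares, Sequence))   # False: having the methods is not enough
Sequence.register(Squares)             # explicit "virtual subclass" registration
print(issubclass(Squares, Sequence))   # True
print(issubclass(deque, Sequence))     # True on 3.5 and later
```

Note that `register()` does not add the mixin methods (such as `index()`) to the class; it only makes `isinstance()` and `issubclass()` answer yes.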

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Python, Linux, default search places.

2016-04-07 Thread Frantisek . Fridrich
Hello.

I run a third party program that can use a system installation of Python. 
I have to modify environment variables: 
PYTHONPATH,
PATH,
LD_LIBRARY_PATH.

All these environment variables are empty at the beginning, but Python still
uses default locations to search for modules, libraries and executables.

Q1. Could anybody tell me where I could find the default search locations
that apply when the environment variables PYTHONPATH, PATH and
LD_LIBRARY_PATH are not set?

OS: SUSE Linux Enterprise Server 11. 
HW: HP DL160 Gen8 SFF CTO.
Python 2.6.
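As a starting point, a small script (names are illustrative) that prints what the interpreter actually uses. `sys.path` is the module search path, built from the script's directory, PYTHONPATH if set, and defaults derived from `sys.prefix`; `os.defpath` is the fallback used by `os.exec*p()` and `os.spawn*p()` when PATH is absent. LD_LIBRARY_PATH is not a Python setting: when it is unset, the dynamic linker uses its own configured directories (/etc/ld.so.conf and the system library directories). The script should also run on Python 2.6, though I have only reasoned about that, not tested it.

```python
import os
import sys


def search_locations():
    """Collect the search locations this interpreter will actually use."""
    return {
        "sys.path": list(sys.path),        # module search path
        "sys.prefix": sys.prefix,          # installation prefix behind the defaults
        "os.defpath": os.defpath,          # used by os.exec*p() when PATH is unset
        "PATH": os.environ.get("PATH"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    for name, value in sorted(search_locations().items()):
        print("%s: %r" % (name, value))
```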

Frantisek
-- 
https://mail.python.org/mailman/listinfo/python-list


how to convert code that uses cmp to python3

2016-04-07 Thread Antoon Pardon
I am looking at my avltree module for converting it to
python3.

One of the things that trouble me here is how python3 no
longer has cmp and how things have to be of "compatible"
type in order to be comparable.

So in python2 it wasn't a problem to have a tree with
numbers and strings as keys. In python3 that will not
be so evident.

In python2, descending the tree involved at most one
expensive comparison per node, because cmp would codify
that comparison into an integer which was then cheap to
compare with 0. In python3 I may need two expensive
comparisons per node, because there is no __cmp__ method
to do that codification.

Is there a way to work around these limitations or
should I resign myself to working within them?

-- 
Antoon Pardon
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Jon Ribbens
On 2016-04-06, Steven D'Aprano  wrote:
> On Wed, 6 Apr 2016 03:48 am, Chris Angelico wrote:
>> On Wed, Apr 6, 2016 at 3:26 AM, Jon Ribbens
>>  wrote:
>>> The received wisdom is that restricted code execution in Python is
>>> an insolubly hard problem, but it looks a bit like my 7-line example
>>> above disproves this theory, 
>
> Jon's 7-line example doesn't come even close to providing restricted code
> execution in Python. What it provides is a restricted subset of expression
> evaluation, which is *much* easier.

It's true that I was using eval(), but I don't think that actually
fundamentally changes the game. Almost exactly the same sanitisation
method can be used to make exec() code safe. ("import" for example
does not work because there is no "__import__" in the provided
builtins, but even if it did work it could be trivially disallowed by
searching for ast.Import and ast.ImportFrom nodes. "with" must be
disallowed because otherwise __exit__ can be used to get a frame
object.)
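A minimal sketch of the kind of AST screening described here, assuming the rules are exactly those mentioned in this thread (no import statements, no `with`, no names or attributes starting with `_`) plus a hand-picked `__builtins__` dict. The function names are hypothetical, and passing the check is not evidence that code is safe; much of this thread argues that getting this right is very hard.

```python
import ast

# Statement types rejected outright, per the rules discussed above.
FORBIDDEN = (ast.Import, ast.ImportFrom, ast.With, ast.AsyncWith)


def check_source(source):
    """Parse source and reject the constructs listed above.

    Illustration only: passing this check does NOT make code safe to run.
    """
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN):
            raise ValueError("forbidden statement: " + type(node).__name__)
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError("forbidden attribute: " + node.attr)
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError("forbidden name: " + node.id)
    return tree


def run_restricted(source, allowed_builtins):
    """Execute checked code with only the given builtins visible."""
    tree = check_source(source)
    namespace = {"__builtins__": dict(allowed_builtins)}
    exec(compile(tree, "<restricted>", "exec"), namespace)
    return namespace
```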

> It's barely more powerful than the ast.safe_eval function.

I think you mean ast.literal_eval(), and you're misremembering.
That function isn't even a calculator, it won't even work out
"2*2" for you. It (almost) literally just parses literals ;-)

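For comparison, a quick sketch of what `ast.literal_eval()` accepts: literal displays of strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None, but not arbitrary arithmetic such as `2*2`.

```python
import ast

print(ast.literal_eval("[1, 'two', (3.0, None), {'k': {4, 5}}]"))

try:
    ast.literal_eval("2*2")
except ValueError as exc:
    print("rejected:", exc)
```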
> [Jon again]
>>> provided you choose carefully what you 
>>> provide in your restricted __builtins__ - but people who knows more
>>> than me about Python seem to have thought about this problem for
>>> longer than I have and come up with the opposite conclusion so I'm
>>> curious what I'm missing.
>
> You're missing that they're trying to allow enough Python functionality to
> run useful scripts (not just evaluate a few arithmetic expressions), but
> without allowing the script to break out of the restricted environment and
> do things which aren't permitted.

Hmm, I'm not missing that, I even explicitly mentioned it previously.
I think you're also missing that eval() can do a very great deal more
than just "arithmetic expressions".

> For example, check out Tav's admirable work some years ago on trying to
> allow Python code to read but not write files:
>
> http://tav.espians.com/a-challenge-to-break-python-security.html

Indeed, I have read that and the follow-ups. He was again making it
hard for himself by trying to allow execution of completely arbitrary
code, and still almost every way to escape relied on "_" attributes
(or him missing the obvious point that you can't check a string is
safe by doing "if foo == 'blah'" if "foo" might be a subtype of
str with a malicious __eq__ method).
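The `__eq__` point is easy to demonstrate. A sketch (the class and helper names are made up) of a `str` subclass that defeats an equality check, and of a stricter test that insists on an exact `str` first, so no user-defined `__eq__` ever runs:

```python
class Sneaky(str):
    """A str subclass whose == always says yes."""

    def __eq__(self, other):
        return True

    __hash__ = str.__hash__   # defining __eq__ would otherwise disable hashing


def is_exactly(value, expected):
    """Compare only genuine str objects."""
    return type(value) is str and value == expected


payload = Sneaky("__import__")
print(payload == "blah")             # True: the naive check is fooled
print(is_exactly(payload, "blah"))   # False
print(is_exactly("blah", "blah"))    # True
```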

> You should also read Guido's comments on capabilities:
>
> http://neopythonic.blogspot.com.au/2009/03/capabilities-for-python.html

Thanks, that's interesting.

> As Zooko says, Guido's "best argument is that reducing usability (in terms
> of forbidding language features, especially module import) and reducing the
> usefulness of extant library code" would make the resulting interpreter too
> feeble to be useful.

Well, no. It makes it too feeble to be used as a generic programming
language. But there is a whole other class of uses for which it would
still be very useful - making very configurable or dynamic systems,
for example. I don't know, imagine github allowed you to upload
restricted-Python code that could be used as a server-side commit
hook, to take a completely random example, or you could upload code
that would generate reports or data for graphing.

> Look at what you've done: you've restricted the entire world of
> Python down to, effectively, a calculator and a few string methods.

Again, no not really. You've tuples, sets, lists, dictionaries,
lambdas, generator and list expressions, etc. And although I made my
example __builtins__ very restricted indeed, that was just because
I'm asking about the basic principle of the idea. If the idea is
ok then the builtins can be gone through one by one and added if
they're safe.

> All the obvious, and even not-so-obvious, attack tools are gone:
> eval, exec, getattr, type, __import__.

Indeed. The fundamental point is that we must not allow the attacker
to have access to any of those things, or to gain access by using any
of the tools which we have provided. I think this is not an impossible
problem.

> I think this approach is promising enough that Jon should take it to a few
> other places for comments, to try to get more eyeballs. Stackoverflow and
> Reddit's /r/python, perhaps. 

I'll post some example code on github in a bit and see what people
think.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Chris Angelico
On Thu, Apr 7, 2016 at 10:05 PM, Antoon Pardon
 wrote:
> I am looking at my avltree module for converting it to
> python3.
>
> One of the things that trouble me here is how python3 no
> longer has cmp and how things have to be of "compatible"
> type in order to be comparable.
>
> So in python2 it wasn't a problem to have a tree with
> numbers and strings as keys. In python3 that will not
> be so evident.
>
> In python2 descending the tree would only involve at
> most one expensive comparison, because using cmp would
> codify that comparison into an integer which would then
> be cheap to compare with 0. Now in python3, I may need
> to do two expensive comparisons, because there is no
> __cmp__ method, to make such a codefication.
>
> Is there a way to work around these limitations or
> should I resign myself to working within them?

First off, what does it actually *mean* to have a tree with numbers
and strings as keys? Are they ever equal? Are all integers deemed
lower than all strings? Something else?

Once you've figured out a sane definition, you can codify it one way
or another. For instance, if an integer is equal to a string
representing its digits, all you need to do is call str() on all keys.
Or if you want something like Python 2, where objects of different
types are sorted according to the names of the types, use this:

def keyify(obj):
    return (type(obj).__name__, obj)

All type names will be strings, so they're comparable. Only if the
type names are the same will the objects themselves be compared.
Alternatively, if the *only* types you handle are numbers and strings,
you could use this:

def keyify(obj):
    return (isinstance(obj, str), obj)

which lets all numbers group together, such that 1 < 2.0 < 3 as you'd
normally expect. All strings will be greater than all non-strings.

There's no __cmp__ method, but you could easily craft your own
compare() function:

def compare(x, y):
    """Return a number < 0 if x < y, 0 if x == y, or > 0 if x > y"""
    if x == y: return 0
    return -1 if keyify(x) < keyify(y) else 1
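For sorting, `functools.cmp_to_key` wraps an old-style comparison function, so either approach plugs into `sorted()` or `list.sort()`. A small check, re-declaring the numbers-before-strings `keyify()` and `compare()` so the snippet runs on its own:

```python
from functools import cmp_to_key


def keyify(obj):
    # numbers (any numeric type) sort before all strings
    return (isinstance(obj, str), obj)


def compare(x, y):
    """Return < 0 if x < y, 0 if x == y, > 0 if x > y (old cmp() style)."""
    if x == y:
        return 0
    return -1 if keyify(x) < keyify(y) else 1


mixed = [3, "b", 1, 2.0, "a"]
print(sorted(mixed, key=keyify))               # [1, 2.0, 3, 'a', 'b']
print(sorted(mixed, key=cmp_to_key(compare)))  # [1, 2.0, 3, 'a', 'b']
```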

I'm not sure how your tree is crafted and how your "cheap" and
"expensive" comparisons previously worked, but give something like
this a try. I think you'll find it adequate.
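On the worry about two expensive comparisons per level, one known workaround (a swapped-in technique, not something proposed in this thread) is to descend using only `<` and do a single equality test at the bottom. The node class and function below are hypothetical, not Antoon's avltree; keys could be wrapped with a `keyify()` function like the ones above if mixed types are needed.

```python
class Node:
    """Minimal binary-search-tree node (hypothetical, for illustration)."""
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value, left=None, right=None):
        self.key = key
        self.value = value
        self.left = left
        self.right = right


def lookup(root, key, default=None):
    """Find key using one '<' per level plus a single '==' at the end."""
    candidate = None           # deepest node seen so far with node.key <= key
    node = root
    while node is not None:
        if key < node.key:
            node = node.left
        else:
            candidate = node   # node.key <= key: remember it and go right
            node = node.right
    # candidate now holds the largest key <= key, which equals key if present
    if candidate is not None and candidate.key == key:
        return candidate.value
    return default
```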

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: numpy arrays

2016-04-07 Thread Oscar Benjamin
On 6 April 2016 at 17:26, Heli  wrote:
>
> Thanks for your replies. I have a question in regard with my previous 
> question. I have a file that contains x,y,z and a value for that coordinate 
> on each line. Here I am giving an example of the file using a numpy array 
> called f.
>
> f=np.array([[1,1,1,1],
> [1,1,2,2],
> [1,1,3,3],
> [1,2,1,4],
> [1,2,2,5],
...
> [3,2,3,24],
> [3,3,1,25],
> [3,3,2,26],
> [3,3,3,27],
> ])
>
> then after tranposing f, I get the x,y and z coordinates:
> f_tranpose=f.T
> x=np.sort(np.unique(f_tranpose[0]))
> y=np.sort(np.unique(f_tranpose[1]))
> z=np.sort(np.unique(f_tranpose[2]))

You don't actually need to transpose the matrix to get a column as a 1D array:

>>> a = np.array([[1,2,3],[4,5,6]])
>>> a
array([[1, 2, 3],
   [4, 5, 6]])
>>> a[0]
array([1, 2, 3])
>>> a[0,:]  # 1st row
array([1, 2, 3])
>>> a[:,0]  # 1st column
array([1, 4])

(Not that there's a correctness or performance difference or anything; I
just think that asking for the column is clearer than asking for the
row of the transpose.)

> Then I will create a 3D array to put the values inside. The only way I see to 
> do this is the following:
> arr_size=x.size
> val2=np.empty([3, 3,3])
>
> for sub_arr in f:
>     idx = (np.abs(x-sub_arr[0])).argmin()
>     idy = (np.abs(y-sub_arr[1])).argmin()
>     idz = (np.abs(z-sub_arr[2])).argmin()
>     val2[idx,idy,idz]=sub_arr[3]
>
> I know that in the example above I could simple reshape f_tranpose[3] to a 
> three by three by three array, but in my real example the coordinates are not 
> in order and the only way I see to do this is by looping over the whole file 
> which takes a lot of time.

So it's easy if the array is in order like your example? Why not sort
the array into order then? I think lexsort will do what you want:
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.lexsort.html
http://stackoverflow.com/questions/8153540/sort-a-numpy-array-like-a-table
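A sketch of the lexsort idea applied to rows of (x, y, z, value), assuming every (x, y, z) combination occurs exactly once. The function names are made up; the second variant swaps in `np.unique(..., return_inverse=True)` instead of sorting, which also copes with missing points and avoids the Python-level loop entirely.

```python
import numpy as np


def rows_to_grid(f):
    """Turn rows of (x, y, z, value) into a 3-D array (full grid assumed)."""
    # np.lexsort treats the LAST key as the primary one, so pass z, y, x
    # to order the rows by x, then y, then z.
    order = np.lexsort((f[:, 2], f[:, 1], f[:, 0]))
    shape = tuple(np.unique(f[:, i]).size for i in range(3))
    return f[order, 3].reshape(shape)


def rows_to_grid_sparse(f, fill=np.nan):
    """Variant that tolerates missing points; also returns the axis values."""
    axes, indices = zip(*(np.unique(f[:, i], return_inverse=True) for i in range(3)))
    grid = np.full(tuple(a.size for a in axes), fill)
    grid[indices] = f[:, 3]    # one vectorised assignment instead of a loop
    return grid, axes
```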

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Chris Angelico
On Thu, Apr 7, 2016 at 10:13 PM, Jon Ribbens
 wrote:
> It's true that I was using eval(), but I don't think that actually
> fundamentally changes the game. Almost exactly the same sanitisation
> method can be used to make exec() code safe. ("import" for example
> does not work because there is no "__import__" in the provided
> builtins, but even if it did work it could be trivially disallowed by
> searching for ast.Import and ast.ImportFrom nodes. "with" must be
> disallowed because otherwise __exit__ can be used to get a frame
> object.)

Once statements are incorporated, you have three options regarding
exception handling:

1) Disallow try/except, flying in the face of modern language design
2) Allow only a bare except clause, flying in the face of modern Python advice
3) Allow access to all the built-in exception types.

Options 1 and 2 are nastily restricted. Option 3 is likely broken, as
exception objects carry tracebacks and such. And don't forget, you can
trigger exceptions very easily:

* TypeError by adding two literals of incompatible types, eg []+{}
* ValueError by casting inappropriate strings eg int("")
* ZeroDivisionError by, yaknow, dividing by zero
* NameError and UnboundLocalError with random names
* RecursionError by infinitely recursing
* Unicode{En,De}codeError with str/bytes methods
* KeyError/IndexError by subscripting
* OverflowError with float exponentiation, eg 2.0**10000
* etc, etc, etc
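A quick way to check most of the list above (the helper name is made up): each lambda raises the named exception, which is exactly what restricted code would be able to catch and inspect. RecursionError is left out because it only exists from Python 3.5 and takes a while to trigger. Each caught exception also carries `__traceback__`, whose `tb_frame` is a live frame object, the kind of leak being alluded to.

```python
def raised_by(thunk):
    """Call thunk() and return the name of the exception it raises, if any."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None


EXAMPLES = {
    "TypeError": lambda: [] + {},
    "ValueError": lambda: int(""),
    "ZeroDivisionError": lambda: 1 / 0,
    "NameError": lambda: some_undefined_name,  # noqa: F821 (deliberately undefined)
    "UnicodeEncodeError": lambda: "\u20ac".encode("ascii"),
    "UnicodeDecodeError": lambda: b"\xff".decode("utf-8"),
    "KeyError": lambda: {}["missing"],
    "IndexError": lambda: [][0],
    "OverflowError": lambda: 2.0 ** 10000,
}

for expected, thunk in EXAMPLES.items():
    print(expected, raised_by(thunk) == expected)
```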

Are you prepared to guarantee that there's no way to leak information
out of *any* exception? If not, you can't offer exceptions, which
means you can't offer try/except other than with a bare except.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: script exits prematurely with no stderr output, but with system errors

2016-04-07 Thread Larry Martell
On Sun, Mar 20, 2016 at 4:55 PM, Larry Martell  wrote:
> On Sun, Mar 20, 2016 at 4:47 PM, Joel Goldstick
>  wrote:
>> On Sun, Mar 20, 2016 at 4:32 PM, Larry Martell 
>> wrote:
>>
>>> I have a script that I run a lot - at least 10 times every day. Usually
>>> it works fine. But sometimes it just stops running with nothing output
>>> to stdout or stderr. I've been trying to debug this for a while, and
>>> today I looked in the system logs and saw this:
>>>
>>> abrt: detected unhandled Python exception in
>>> '/home/prod_user/python/make_workitem_list.py'
>>> abrtd: Directory 'pyhook-2016-03-19-22:20:43-26461' creation detected
>>> abrt-server[3688]: Saved Python crash dump of pid 26461 to
>>> /var/spool/abrt/pyhook-2016-03-19-22:20:43-26461
>>> abrtd: Executable '/home/prod_user/python/make_workitem_list.py'
>>> doesn't belong to any package and ProcessUnpackaged is set to 'no'
>>> abrtd: 'post-create' on
>>> '/var/spool/abrt/pyhook-2016-03-19-22:20:43-26461' exited with 1
>>> abrtd: Deleting problem directory
>>> '/var/spool/abrt/pyhook-2016-03-19-22:20:43-26461'
>>> abrtd: make_workitem_list: page allocation failure. order:1, mode:0x20
>>> abrtd: Pid: 31870, comm: make_workitem_list Not tainted
>>> 2.6.32-573.12.1.el6.x86_64 #1
>>>
>>> I have never seen anything like this before. Usually, if there is an
>>> unhandled exception something is dumped to stderr. Anyone have any
>>> idea what is going on? How can I get it to not delete this crash dump
>>> it mentioned? I guess I can put a big exception handler around the
>>> entire script with a traceback.
>>>
>>> This is on Red Hat Enterprise Linux Server release 6.7 (Santiago).
>>
>> Googling I found this:
>> http://stackoverflow.com/questions/2628901/interpreting-kernel-message-page-allocation-failure-order1
>>
>> It seems that the kernel can't allocate memory is a likely cause.

I modified the program to use less memory and I am not getting the
page allocation failure any more, but I am still getting the unhandled
exceptions messages in the system logs.

> Yes, I was thinking that as well about the "page allocation failure"
> message, but it's almost like there were 2 errors, the first being the
> unhandled exception. But why would it not output something to stderr?

I was able to configure the abrt daemon to capture these unhandled
exceptions and they are just typical ValueErrors and TypeErrors (which
I will of course deal with in my program). But I wonder why these
exceptions were not just printed to stderr as other unhandled
exceptions are. What is special about these that made the OS get
involved?
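To keep your own record regardless of what abrt does, a sketch of the "big exception handler" idea using `sys.excepthook` instead of wrapping the whole script (the function and logger names are made up). It logs the full traceback to a file, then calls whichever hook was installed before, so abrt's hook, if it installed one, or the default stderr printer still sees the exception.

```python
import logging
import sys


def install_crash_logger(logfile, logger_name="crash"):
    """Log every unhandled exception's traceback to logfile.

    After logging, the previously installed hook still runs.
    """
    logger = logging.getLogger(logger_name)
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    previous_hook = sys.excepthook

    def hook(exc_type, exc_value, exc_tb):
        logger.error("unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
        previous_hook(exc_type, exc_value, exc_tb)

    sys.excepthook = hook
    return hook
```

Calling `install_crash_logger("/some/path/crash.log")` near the top of the script is enough; the path is up to you.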
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: read a file and remove Mojibake chars

2016-04-07 Thread Chris Angelico
On Thu, Apr 7, 2016 at 6:47 PM, Daiyue Weng  wrote:
> Hi, when I read a file, the file string contains Mojibake chars at the
> beginning, the code is like,
>
> file_str = open(file_path, 'r', encoding='utf-8').read()
> print(repr(open(file_path, 'r', encoding='utf-8').read()))
>
> part of the string (been printing) containing Mojibake chars is like,
>
>   '锘縶\n "name": "__NAME__"'
>
> I tried to remove the non utf-8 chars using the code,
>
> def read_config_file(fname):
>     with open(fname, "r", encoding='utf-8') as fp:
>         for line in fp:
>             line = line.strip()
>             line = line.decode('utf-8','ignore').encode("utf-8")
>
>     return fp.read()
>
> but it doesn't work, so how to remove the Mojibakes in this case?

This won't work as it currently stands. You're looping over the file,
stripping, *DE*coding (which shouldn't work - although in Python 2, it
sorta-kinda might), re-encoding, and then dropping the lines on the
floor. Then, after you've closed the file, you try to read from it. So
yeah, it doesn't work.
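For reference, a sketch of what the loop seems to have been aiming at, with those control-flow problems removed: read inside the `with` block, keep the stripped lines instead of discarding them, and skip `.decode()` because the lines are already `str`. Using the `utf-8-sig` codec (an assumption about the intent) also drops a genuine UTF-8 byte-order mark; it would not repair text that was already garbled before it was saved.

```python
def read_config_file(fname):
    """Return the file's lines, stripped of surrounding whitespace, joined by newlines."""
    # "utf-8-sig" decodes UTF-8 and removes a leading byte-order mark if present.
    with open(fname, "r", encoding="utf-8-sig") as fp:
        lines = [line.strip() for line in fp]   # already str: no .decode() needed
    return "\n".join(lines)
```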

But if you're able to read the file *at all* using your original code,
it must be a correctly-formed UTF-8 stream. The probability that
random non-ASCII bytes just happen to be UTF-8 decodable is
vanishingly low, so I suspect your data issue has nothing to do with
encodings.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: join_paired_ends.py: error: option -f: file does not exist

2016-04-07 Thread Steven D'Aprano
On Thu, 7 Apr 2016 08:07 pm, Inya Ivano wrote:

> Hi,
> 
> I've been having trouble with running my files:
> 
>>join_paired_ends.py -f
>>/home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R2_001.fastq.gz -r
>>/home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R1_001.fastq.gz -o
>>/home/qiime/Documents/Aleurone/Joined_1101
> 
>>join_paired_ends.py: error: option -f: file does not exist:
>>'/home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R2_001.fastq.gz'

Does that file actually exist? What happens if you run:


ls -l /home/qiime/Documents/Aleurone/1101-Pl1-A1_S193_L001_R2_001.fastq.gz


or the equivalent for whatever operating system you are running?

(I'm guessing you're using some sort of Linux or Unix.)


> What I'm not sure about is whether there's a problem with the script or
> the virtualmachine settings, I'm using the latest Oracle VM VirtualBox
> version 5.0.16 edition.

To start with, you should try to get a traceback showing the *actual*
error. It looks like your script join_paired_ends.py catches the exception
and replaces the useful exception with a possibly misleading generic error
message. If the script does anything like this:


try:
    open(thefile) ...

except Exception:
    print("file does not exist")
    sys.exit()

then you are making life harder for yourself by suppressing the real error
and replacing it with a generic and probably incorrect error message.
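If a script really must catch the error, a sketch of reporting the real reason instead of a guessed one (the function name `open_input` is hypothetical). `OSError` carries `filename` and `strerror`, so the message says whether the file is missing, unreadable, a directory, and so on. Simply not catching the exception at all, and letting the traceback through, is often the better choice.

```python
import sys

def open_input(path):
    """Open *path* for binary reading, reporting the real failure reason."""
    try:
        return open(path, "rb")
    except OSError as exc:
        # strerror is e.g. "No such file or directory" or "Permission denied",
        # taken from the operating system rather than assumed by the script.
        sys.exit(f"cannot open {exc.filename!r}: {exc.strerror}")
```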



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Install request

2016-04-07 Thread Oscar Benjamin
On 6 April 2016 at 05:08, Rustom Mody  wrote:
> On Wednesday, April 6, 2016 at 4:34:11 AM UTC+5:30, Steven D'Aprano wrote:
>> On Wed, 6 Apr 2016 02:52 am, Rustom Mody wrote:
>>
>> > On Tuesday, April 5, 2016 at 9:49:58 PM UTC+5:30, Oscar Benjamin wrote:
>> >> Another possibility to improve this situation would be to make a page
>> >> on the wiki that actually explains the known problems (and fixes) for
>> >> 3.5 on Windows.
>> >
>> > +10 on that one
>> > ie When a question becomes a FAQ just put it there
>>
>>
>> Would somebody who knows Windows actually do this and then post the link
>> here please?
>
> Even if some windows-knower replies to this mail, someone else can wikify and
> put up
>
> Data needed:
> Which windows version.
> What error message
> What action to do

So I'm not a Windows user but it would be good for someone who is to
put up a full step-by-step (with screenshots) explanation of running
the python.org installer on Windows and then launching Python (by
running IDLE and/or from the terminal). One of the problems we're
having with some of the questions about this is that it isn't even
clear whether someone reporting these problems is referring to the
installer or to running Python or IDLE or what because they don't
understand enough about what's going on to answer those questions:
some pictures would really help with that.

The problems off the top of my head that have come up on this list are:

Python 3.5 will not work with Windows XP: install Python 3.4 or
upgrade to a newer version of Windows (or some other OS). As of 3.5.1
I think the installer should explain the problem but I'm not sure.

Python 3.5 is compiled with VS2015 and so uses the new Windows 10 ucrt
runtime. This should mean that it works immediately on Windows 10 (or
higher?) but that for previous versions of Windows this will give an
error "api-ms-win-crt-runtime.dll not found" or something. Explanation
and downloads here:
https://support.microsoft.com/en-us/kb/2999226

The next problem that occurs is the "modify, repair, or uninstall"
one. This is a messagebox that is presented to a user possibly during
installation or when trying to run Python after installation. This
problem has been reported many times and AFAIR it's always Windows and
Python 3.5. No solution is known to this problem (most of the users
reporting it have not followed up after their initial post which may
be related to the way posters on the list reply to them) and I don't
know if it has been reported on the tracker.

There is one other, the "error 0x823408239" (I can't remember the exact
hex), which is possibly due to a corrupt download or to some problem
with temporary files. Apparently renaming some temp files folder can
fix this problem, but I'm not sure. Again, this problem seems to be
exclusive to Windows and 3.5, and I'm not sure whether it's been
reported to the tracker.

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: read a file and remove Mojibake chars

2016-04-07 Thread Random832
On Thu, Apr 7, 2016, at 04:47, Daiyue Weng wrote:
> Hi, when I read a file, the file string contains Mojibake chars at the
> beginning, the code is like,
> 
> file_str = open(file_path, 'r', encoding='utf-8').read()
> print(repr(open(file_path, 'r', encoding='utf-8').read()))
> 
> part of the string (been printing) containing Mojibake chars is like,
> 
>   '锘縶\n "name": "__NAME__"'

Based on a hunch, I tried something:

"锘縶" happens to be the GBK/GB18030 interpretation of the bytes "ef bb bf
7b", which is a UTF-8 byte order mark followed by "{".

So what happened is that someone wrote text in UTF-8 with a byte-order
marker, and someone else read this as GBK/GB18030 and wrote the
resulting characters as UTF-8. So it may be easier to simply
special-case it:

if file_str[:2] == '锘縶': file_str = '{' + file_str[2:]
elif file_str[:2] == '锘縖': file_str = '[' + file_str[2:]


In principle, the whole process could be reversed as file_str =
file_str.encode('gbk').decode('utf-8'), but that would be overkill if it
contains no other non-ASCII characters and can't contain anything at the
start except these. Plus, if there are any other non-ASCII characters in
the string, it's anyone's guess as to whether they survived the process
in a way that allows you to reverse it.
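The "reverse the whole process" route can be wrapped in a small function. A sketch, assuming the text went through exactly one round of "UTF-8 bytes decoded as GBK"; the name `repair_gbk_mojibake` is hypothetical. Decoding the recovered bytes with `utf-8-sig` also drops the byte order mark, so the leading `{` survives without special-casing. If the assumption does not hold, the codecs raise `UnicodeError` instead of silently guessing.

```python
def repair_gbk_mojibake(text):
    """Undo one round of 'UTF-8 read as GBK' damage.

    Raises UnicodeError if *text* cannot have been produced that way.
    """
    raw = text.encode("gbk")          # recover the original UTF-8 bytes
    return raw.decode("utf-8-sig")    # decode them properly, dropping a BOM
```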
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Jon Ribbens
On 2016-04-07, Chris Angelico  wrote:
> Options 1 and 2 are nastily restricted. Option 3 is likely broken, as
> exception objects carry tracebacks and such.

Everything you're saying here is assuming that we must not let the
attacker see any exception objects, but I don't understand why you're
assuming that. As far as I can see, the information that exceptions
hold that we need to prevent access to is all in "__" attributes that
we're already blocking.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Random832
On Thu, Apr 7, 2016, at 00:48, Steven D'Aprano wrote:
> Sure, but I'm just demonstrating that the unrestricted builtins are just
> one attribute lookup away. And as Chris points out, if you have (say) the os
> module, then:
> 
> magic = os.sys.modules[
> ''.join(chr(i-1) for i in
> (96,96,99,118,106,109,117,106,111,116,96,96))
> ][''.join(chr(i+17) for i in (84,101,80,91))]

I think you probably would not want to allow it access to any "real"
modules, but only proxy objects that allow either a specific set of
names (there are almost certainly functions in os that you don't want,
beyond the imported sys) or something general like "any public
[non-underscore] function/class/variable" (if the module itself has been
examined and exporting this full subset passes security standards); in
the latter case any imported modules would likewise be replaced with the
sandbox's fake module, so os.sys gives you the same thing that import
sys does (though, os in particular would be insane to give blanket
access to, but my test code works for fractions.sys)
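A minimal sketch of such a proxy module, under stated assumptions: `public_proxy` is a hypothetical name, `allowed` gives the "specific set of names" option, and leaving it as `None` gives the blanket "any public name" option. Module-valued attributes (like `os.sys` or `fractions.sys`) are replaced by proxies too, so they never hand out the real module. Because the proxy is a fresh module object, rebinding a name on it does not affect the real module. It is not sufficient on its own: mutable values such as `sys.modules` are still shared by reference.

```python
import types

def public_proxy(module, allowed=None, _memo=None):
    """Return a fresh module object exposing only public names of *module*.

    allowed: optional set of top-level names to expose; None means every
    name that does not start with an underscore.
    """
    if _memo is None:
        _memo = {}
    if id(module) in _memo:            # import cycles, e.g. os.path imports os
        return _memo[id(module)]
    fake = types.ModuleType(module.__name__)
    _memo[id(module)] = fake
    # vars() reads the module's __dict__ directly, bypassing any __getattr__.
    for name, value in list(vars(module).items()):
        if name.startswith("_"):
            continue
        if allowed is not None and name not in allowed:
            continue
        if isinstance(value, types.ModuleType):
            value = public_proxy(value, None, _memo)   # never expose real modules
        setattr(fake, name, value)     # other values are shared, not copied
    return fake
```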
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: numpy arrays

2016-04-07 Thread Heli

Thanks a lot Oscar, 

The lexsort you suggested was the way to go. 

import h5py
import numpy as np
f=np.loadtxt(inputFile,delimiter=None)
xcoord=np.sort(np.unique(f[:,0]))
ycoord=np.sort(np.unique(f[:,1]))
zcoord=np.sort(np.unique(f[:,2]))

x=f[:,0]
y=f[:,1]
z=f[:,2]
val=f[:,3]

ind = np.lexsort((val,z,y,x)) # Sort by x, then by y, then by z, then by val
sortedVal=np.array([(val[i]) for i in 
ind]).reshape((xcoord.size,ycoord.size,zcoord.size))
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Random832
On Thu, Apr 7, 2016, at 08:13, Jon Ribbens wrote:
> > All the obvious, and even not-so-obvious, attack tools are gone:
> > eval, exec, getattr, type, __import__.

We don't even need to take these away, per se.

eval and exec could be replaced with functions that perform the
evaluation with the same rules in the same sandbox.

I posted yesterday a sketch of a "type" proxy class that even allows "if
type(x) is type".

getattr could be replaced with something that checks at runtime whether
an attribute is allowed. In principle, you could even have the AST
transform replace attempted underscore accesses with getattr, which
could check to allow whitelisted underscore-attributes.
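A rough illustration of that AST-transform idea, using the standard `ast.NodeTransformer`. Every name here (`safe_getattr`, `ALLOWED_UNDERSCORE`, `compile_restricted`) is hypothetical, and this is not a sandbox by itself: the real builtins (including plain `getattr`) are still reachable, and untrusted code could rebind the `safe_getattr` name. It only shows the rewrite of `obj._attr` loads into a runtime-checked call, with assignment or deletion of underscore attributes refused outright.

```python
import ast

# Hypothetical whitelist of underscore attributes the sandbox allows.
ALLOWED_UNDERSCORE = {"__name__", "__doc__"}

def safe_getattr(obj, name, *default):
    """Runtime-checked replacement for getattr()."""
    if name.startswith("_") and name not in ALLOWED_UNDERSCORE:
        raise AttributeError("access to %r is not allowed" % name)
    return getattr(obj, name, *default)

class UnderscoreToGetattr(ast.NodeTransformer):
    """Rewrite `obj._attr` loads into `safe_getattr(obj, '_attr')`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)          # rewrite nested attributes first
        if not node.attr.startswith("_"):
            return node
        if not isinstance(node.ctx, ast.Load):
            # Assigning to or deleting underscore attributes is refused.
            raise SyntaxError("cannot assign or delete %r" % node.attr)
        call = ast.Call(
            func=ast.Name(id="safe_getattr", ctx=ast.Load()),
            args=[node.value, ast.Constant(node.attr)],
            keywords=[],
        )
        return ast.copy_location(call, node)

def compile_restricted(source):
    tree = UnderscoreToGetattr().visit(ast.parse(source, mode="exec"))
    ast.fix_missing_locations(tree)
    return compile(tree, "<sandbox>", "exec")
```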

> Indeed. The fundamental point is that we must not allow the attacker
> to have access to any of those things, or to gain access by using any
> of the tools which we have provided. I think this is not an impossible
> problem.
> 
> > I think this approach is promising enough that Jon should take it to a few
> > other places for comments, to try to get more eyeballs. Stackoverflow and
> > Reddit's /r/python, perhaps. 
> 
> I'll post some example code on github in a bit and see what people
> think.

I've thrown together some stuff, in addition to my type example from
yesterday, including a module importer.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Mark Lawrence via Python-list

On 07/04/2016 13:05, Antoon Pardon wrote:

> I am looking at my avltree module for converting it to
> python3.
>
> One of the things that trouble me here is how python3 no
> longer has cmp and how things have to be of "compatible"
> type in order to be comparable.
>
> So in python2 it wasn't a problem to have a tree with
> numbers and strings as keys. In python3 that will not
> be so evident.
>
> In python2 descending the tree would only involve at
> most one expensive comparison, because using cmp would
> codify that comparison into an integer which would then
> be cheap to compare with 0. Now in python3, I may need
> to do two expensive comparisons, because there is no
> __cmp__ method, to make such a codefication.
>
> Is there a way to work around these limitations or
> should I resign myself to working within them?

HTH https://docs.python.org/3/library/functools.html#functools.cmp_to_key
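`functools.cmp_to_key` wraps an old-style three-way comparison function so it can be passed as `key=` to `sorted()`, `min()`, `max()` and friends. A sketch for the mixed numbers-and-strings case; the "all real numbers before all strings" policy and the name `mixed_cmp` are assumptions, not something specified in the question.

```python
from functools import cmp_to_key
from numbers import Real

def mixed_cmp(a, b):
    """Old-style comparison: negative, zero or positive.

    Hypothetical policy: every real number sorts before every string;
    values of the same kind compare normally.
    """
    rank_a = 0 if isinstance(a, Real) else 1
    rank_b = 0 if isinstance(b, Real) else 1
    if rank_a != rank_b:
        return rank_a - rank_b
    return (a > b) - (a < b)   # the usual Python 3 stand-in for cmp(a, b)

print(sorted([3, "b", 1.5, "a"], key=cmp_to_key(mixed_cmp)))
```

Note that the wrapper still calls `mixed_cmp` once per comparison, and for same-kind values each call may perform two rich comparisons, so this restores the interface rather than the single-comparison cost.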

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: numpy arrays

2016-04-07 Thread Oscar Benjamin
On 7 April 2016 at 15:31, Heli  wrote:
>
> Thanks a lot Oscar,
>
> The lexsort you suggested was the way to go.

Glad to hear it.

> import h5py
> import numpy as np
> f=np.loadtxt(inputFile,delimiter=None)
> xcoord=np.sort(np.unique(f[:,0]))
> ycoord=np.sort(np.unique(f[:,1]))
> zcoord=np.sort(np.unique(f[:,2]))
>
> x=f[:,0]
> y=f[:,1]
> z=f[:,2]
> val=f[:,3]
>
> ind = np.lexsort((val,z,y,x)) # Sort by x, then by y, then by z, then by val
> sortedVal=np.array([(val[i]) for i in 
> ind]).reshape((xcoord.size,ycoord.size,zcoord.size))

One final possible improvement is that

 np.array([val[i] for i in ind])

can probably be done with fancy indexing

val[ind]

removing the only Python loop left.
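Put together, the approach in this thread looks roughly like the sketch below. It assumes the rows cover every point of a regular grid exactly once; `grid_values` is a hypothetical name. Since `val` is one-dimensional, `val[ind]` is the fancy-indexing form, and `np.unique` already returns sorted values.

```python
import numpy as np

def grid_values(points):
    """Reshape scattered (x, y, z, val) rows into a dense 3-D array.

    Assumes the rows cover every point of a regular grid exactly once.
    """
    x, y, z, val = points.T
    # np.unique returns sorted unique values; only the counts are needed here
    nx, ny, nz = (np.unique(c).size for c in (x, y, z))
    # lexsort treats the LAST key as primary: order by x, then y, then z, then val
    ind = np.lexsort((val, z, y, x))
    # fancy indexing replaces the Python-level loop over ind
    return val[ind].reshape(nx, ny, nz)
```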

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Jon Ribbens
On 2016-04-07, Chris Angelico  wrote:
> On Thu, Apr 7, 2016 at 11:45 AM, Steven D'Aprano  wrote:
>> And you would have to do something about the unfortunate matter that modules
>> have a reference to the unrestricted __builtins__:
>>
>> py> os.__builtins__['eval']
>> 
>
> This *in itself* is blocked by the rule against leading-underscore
> attribute lookup. However, if you can get the sys module, the world's
> your oyster; and any other module that imports sys will give it to
> you:
>
>>>> import os
>>>> os.sys
>
>>>> codecs.sys
>
>
> Can't monkey-patch that away, and codecs.sys.modules["builtins"] will
> give you access to the original builtins. And you can go to any number
> of levels, tracing a chain from a white-listed module to the
> unrestricted sys.modules. The only modules that would be safe to
> whitelist are those that either don't import anything significant (I'm
> pretty sure 'math' is safe), or import everything with underscores
> ("import sys as _sys").

No, actually absolutely no modules at all are safe to import directly.
This is because the untrusted code might alter them, and then the
altered code would be used by the trusted main application. Trivial
examples might include altering hashlib to always return the same
hash, 're' to always or never match, etc. If you import something
then it needs to be an individual copy of the module, with each name
referring either to an immutable object or to an individual proxy for
the real object.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 1:18 AM, Jon Ribbens
 wrote:
> No, actually absolutely no modules at all are safe to import directly.
> This is because the untrusted code might alter them, and then the
> altered code would be used by the trusted main application. Trivial
> examples might include altering hashlib to always return the same
> hash, 're' to always or never match, etc. If you import something
> then it needs to be an individual copy of the module, with each name
> referring either to an immutable object or to an individual proxy for
> the real object.

And this is why eval is way easier to secure than exec. No assignment.

When you start talking about eval as being the *easier* option, you
know things are scary...

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Peter Pearson
On Thu, 07 Apr 2016 11:37:50 +1000, Steven D'Aprano wrote:
> On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote:
>> Rustom Mody wrote:
>
>>> So here are some examples to illustrate what I am saying:
>>> 
>>> Example 1 -- Ligatures:
>>> 
>>> Python3 gets it right
>> flag = 1
>> flag
>>> 1
[snip]
>> 
>> I do not think this is correct, though.  Different Unicode code sequences,
>> after normalization, should result in different symbols.
>
> I think you are confused about normalisation. By definition, normalising
> different Unicode code sequences may result in the same symbols, since that
> is what normalisation means.
>
> Consider two distinct strings which nevertheless look identical:
>
> py> a = "\N{LATIN SMALL LETTER U}\N{COMBINING DIAERESIS}"
> py> b = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"
> py> a == b
> False
> py> print(a, b)
> ü ü
>
>
> The purpose of normalisation is to turn one into the other:
>
> py> unicodedata.normalize('NFKC', a) == b  # compose 2 code points --> 1
> True
> py> unicodedata.normalize('NFKD', b) == a  # decompose 1 code point --> 2
> True

It's all great fun until someone loses an eye.

Seriously, it's cute how neatly normalisation works when you're
watching closely and using it in the circumstances for which it was
intended, but that hardly proves that these practices won't cause much
trouble when they're used more casually and nobody's watching closely.
Considering how much energy good software engineers spend eschewing
unnecessary complexity, do we really want to embrace the prospect of
having different things look identical?  (A relevant reference point:
mixtures of spaces and tabs in Python indentation.)
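A small demonstration of the behaviour under discussion, using only the standard `unicodedata` module. It shows that NFC composes the two-code-point "ü" into one, that NFC leaves the "ﬂ" ligature alone while NFKC folds it to "fl", and that Python itself applies NFKC to identifiers while parsing, so a ligature-spelled name and the plain name refer to the same variable.

```python
import unicodedata

a = "u\N{COMBINING DIAERESIS}"                 # two code points
b = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"  # one code point
lig = "\N{LATIN SMALL LIGATURE FL}ag"

print(a == b)                                # False: different code points
print(unicodedata.normalize("NFC", a) == b)  # True: canonical composition
print(unicodedata.normalize("NFC", lig))     # NFC keeps the ligature
print(unicodedata.normalize("NFKC", lig))    # NFKC folds it to "flag"

# Identifiers are NFKC-normalised by the parser, so this assigns to "flag":
ns = {}
exec(lig + " = 1", ns)
print(ns["flag"])
```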

[snip]
> The Unicode consortium seems to disagree with you.



The Unicode consortium was certifiably insane when it went into the
typesetting business.  The pile-of-poo character was just frosting on
the cake.



(Sorry to leave you with that image.)

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list


COnvert to unicode

2016-04-07 Thread Joaquin Alzola
Hi People

I need to convert this string:

hello  there
this is a test

(also \n important)

To this Unicode:
00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a
Without the \u and space.

https://www.branah.com/unicode-converter

I seem not to be able to do that conversion.

Help to guide me will be appreciated.

BR

Joaquin
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 2:51 AM, Peter Pearson  wrote:
> The pile-of-poo character was just frosting on
> the cake.
>
> (Sorry to leave you with that image.)

No. You're not even a little bit sorry.

You're an evil, evil man. And funny.

ChrisA
who knows that its codepoint is 1F4A9 without looking it up
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Jon Ribbens
On 2016-04-07, Random832  wrote:
> On Thu, Apr 7, 2016, at 08:13, Jon Ribbens wrote:
>> > All the obvious, and even not-so-obvious, attack tools are gone:
>> > eval, exec, getattr, type, __import__.
>
> We don't even need to take these away, per se.
>
> eval and exec could be replaced with functions that perform the
> evaluation with the same rules in the same sandbox.

Ah, that's a good point.

I've put an example script here:

  https://github.com/jribbens/unsafe/blob/master/unsafe.py

When run as a script, it will execute whatever Python code you pass it
on stdin.

If anyone can break it (by which I mean escape from the sandbox,
not make it use up all the memory or go into an infinite loop,
both of which are trivial) then I would be very interested.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 3:20 AM, Jon Ribbens
 wrote:
> On 2016-04-07, Random832  wrote:
>> On Thu, Apr 7, 2016, at 08:13, Jon Ribbens wrote:
>>> > All the obvious, and even not-so-obvious, attack tools are gone:
>>> > eval, exec, getattr, type, __import__.
>>
>> We don't even need to take these away, per se.
>>
>> eval and exec could be replaced with functions that perform the
>> evaluation with the same rules in the same sandbox.
>
> Ah, that's a good point.
>
> I've put an example script here:
>
>   https://github.com/jribbens/unsafe/blob/master/unsafe.py
>
> When run as a script, it will execute whatever Python code you pass it
> on stdin.

Now we're getting to something rather interesting. Going back to your
previous post, though...

On Wed, Apr 6, 2016 at 3:26 AM, Jon Ribbens
 wrote:
> The received wisdom is that restricted code execution in Python is
> an insolubly hard problem, but it looks a bit like my 7-line example
> above disproves this theory

... the thing you were missing in your original example was a LOT of
sophistication :)

I don't currently have any exploits against your new code, but at this
point, it's grown beyond the "hey, if this was insolubly hard, how
come seven lines of code can do it?" question. This is the kind of
effort it takes to sandbox Python inside Python.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Jon Ribbens
On 2016-04-07, Chris Angelico  wrote:
> On Fri, Apr 8, 2016 at 3:20 AM, Jon Ribbens
> wrote:
>> On 2016-04-07, Random832  wrote:
>>> On Thu, Apr 7, 2016, at 08:13, Jon Ribbens wrote:
>>>> > All the obvious, and even not-so-obvious, attack tools are gone:
>>>> > eval, exec, getattr, type, __import__.
>>>
>>> We don't even need to take these away, per se.
>>>
>>> eval and exec could be replaced with functions that perform the
>>> evaluation with the same rules in the same sandbox.
>>
>> Ah, that's a good point.
>>
>> I've put an example script here:
>>
>>   https://github.com/jribbens/unsafe/blob/master/unsafe.py
>>
>> When run as a script, it will execute whatever Python code you pass it
>> on stdin.
>
> Now we're getting to something rather interesting. Going back to your
> previous post, though...
>
> On Wed, Apr 6, 2016 at 3:26 AM, Jon Ribbens
> wrote:
>> The received wisdom is that restricted code execution in Python is
>> an insolubly hard problem, but it looks a bit like my 7-line example
>> above disproves this theory
>
> ... the thing you were missing in your original example was a LOT of
> sophistication :)
>
> I don't currently have any exploits against your new code, but at this
> point, it's grown beyond the "hey, if this was insolubly hard, how
> come seven lines of code can do it?" question. This is the kind of
> effort it takes to sandbox Python inside Python.

Well, it entirely depends on how much you're trying to allow the
sandboxed code to do. Most of the stuff in that script (e.g.
_copy_module and safe versions of get/set/delattr, exec, and eval)
I don't think is really necessary for most sensible applications
of such an idea, I've just added it for completeness and to see
if it introduces any security holes that weren't there originally.

I could slim down the code again by simply removing all that extra
cruft and the principle would still be the same - it's only the
safe_compile() function that's adding anything interesting that
I haven't seen done before (and half of the lines in that function
are docstring or stuff to make nicer error messages).
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: COnvert to unicode

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 1:33 AM, Joaquin Alzola
 wrote:
> hello  there
> this is a test
>
> (also \n important)
>
> To this Unicode:
> 00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a
> Without the \u and space.

What happens if you have a non-BMP codepoint? So far, what you have is
pretty straight-forward.

>>> s = "hello  there\nthis is a test\n"
>>> "".join("%04x" % ord(x) for x in s)
'00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a'

But if you have codepoints that don't fit in four hex digits, this
will mess up your formatting. You'll need to decide how to handle
those.
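Two ways to make that decision explicit, as a sketch with hypothetical function names. Encoding to UTF-16 big-endian gives exactly four hex digits per code unit and turns a non-BMP character into a surrogate pair; formatting each code point gives one variable-length value per character, which can be padded to eight digits for a fixed-width, UTF-32-like layout.

```python
def utf16_hex(s):
    """Hex of *s* as UTF-16-BE code units; non-BMP chars become surrogate pairs."""
    return s.encode("utf-16-be").hex()

def codepoint_hex(s, width=4):
    """Hex of each code point, zero-padded to *width* digits (8 fits any)."""
    return "".join("%0*x" % (width, ord(ch)) for ch in s)

print(utf16_hex("\U0001F4A9"))         # d83ddca9 (surrogate pair)
print(codepoint_hex("\U0001F4A9"))     # 1f4a9 (five digits, breaks the layout)
print(codepoint_hex("\U0001F4A9", 8))  # 0001f4a9
```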

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Ian Kelly
On Thu, Apr 7, 2016 at 11:35 AM, Jon Ribbens
 wrote:
> Well, it entirely depends on how much you're trying to allow the
> sandboxed code to do. Most of the stuff in that script (e.g.
> _copy_module and safe versions of get/set/delattr, exec, and eval)
> I don't think is really necessary for most sensible applications
> of such an idea, I've just added it for completeness and to see
> if it introduces any security holes that weren't there originally.
>
> I could slim down the code again by simply removing all that extra
> cruft and the principle would still be the same - it's only the
> safe_compile() function that's adding anything interesting that
> I haven't seen done before (and half of the lines in that function
> are docstring or stuff to make nicer error messages).

Since you're now allowing exec I suppose you might as well allow type
also, since you could just use a class statement instead. safe_compile
prevents accessing magic methods, but it doesn't prevent defining
them. Not sure if there could be an exploit there or not.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python, Linux, default search places.

2016-04-07 Thread Karim



On 07/04/2016 13:02, [email protected] wrote:

> Hello.
>
> I run a third party program that can use a system installation of Python.
> I have to modify environment variables:
> PYTHONPATH,
> PATH,
> LD_LIBRARY_PATH.
>
> All these environment variables are empty at the beginning but Python uses
> a default or initial places to search for modules, libraries or
> executables.
>
> Q1. Could anybody tell me where I could find default search places for
> environment variables PYTHONPATH, PATH, LD_LIBRARY_PATH?
>
> OS: SUSE Linux Enterprise Server 11.
> HW: HP DL160 Gen8 SFF CTO.
> Python 2.6.
>
> Frantisek


export PYTHONPATH=${PYTHONPATH}//
export PATH=${PATH}:/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib

Mine for pyparsing:

export PYTHONPATH=~/project/pyparsing-2.0.3

Karim

--
https://mail.python.org/mailman/listinfo/python-list


Re: COnvert to unicode

2016-04-07 Thread Peter Otten
Joaquin Alzola wrote:

> Hi People
> 
> I need to convert this string:
> 
> hello  there
> this is a test
> 
> (also \n important)
> 
> To this Unicode:
> 
00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a
> Without the \u and space.
> 
> https://www.branah.com/unicode-converter
> 
> I seem not to be able to do that conversion.
> 
> Help to guide me will be appreciated.

>>> import codecs
>>> s = u"hello  there\nthis is a test\n"
>>> codecs.encode(s.encode("utf-16-be"), "hex")
'00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a'


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python, Linux, default search places.

2016-04-07 Thread Wildman via Python-list
On Thu, 07 Apr 2016 13:02:46 +0200, Frantisek.Fridrich wrote:

> Hello.
> 
> I run a third party program that can use a system installation of Python. 
> I have to modify environment variables: 
> PYTHONPATH,
> PATH,
> LD_LIBRARY_PATH.
> 
> All these environment variables are empty at the beginning but Python uses 
> a default or initial places to search for modules, libraries or 
> executables.
> 
> Q1. Could anybody tell me where I could find default search places for 
> environment variables PYTHONPATH, PATH, LD_LIBRARY_PATH?
> 
> OS: SUSE Linux Enterprise Server 11. 
> HW: HP DL160 Gen8 SFF CTO.
> Python 2.6.
> 
> Frantisek

You should be able to retrieve any environment variable
like this, if it exists:

import os
env_var = os.environ["PATH"]

Environment variables use the colon, ':', to separate
entries so you can do this if you want a list:

env_var_list = os.environ["PATH"].split(":")
 
BTW, variable names are case sensitive in Linux.  The
ones set by the system are normally all upper case.
That is not absolute in cases where variables are set
by 3rd party programs.
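A slightly more defensive variant, as a sketch with a hypothetical helper name. `os.environ.get` avoids a `KeyError` when a variable is unset, and `os.pathsep` is `':'` on Linux/Unix and `';'` on Windows. For the original question, `sys.path` shows the module search path Python actually uses, including the built-in defaults that apply even when PYTHONPATH is empty.

```python
import os
import sys

def env_path_list(name):
    """Split a PATH-style environment variable; [] if unset or empty."""
    value = os.environ.get(name, "")
    return value.split(os.pathsep) if value else []

if __name__ == "__main__":
    for var in ("PYTHONPATH", "PATH", "LD_LIBRARY_PATH"):
        print(var, env_path_list(var))
    # Python's effective module search path, defaults included:
    for entry in sys.path:
        print(entry)
```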

-- 
 GNU/Linux user #557453
Why is it all instruments seeking intelligent life
in the universe are pointed away from Earth?
  -unknown
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Joining Strings

2016-04-07 Thread Emeka
Jussi,

Thanks, it worked when parsed with json.loads. However, it needed this
decode('utf-8'):

data = json.loads(respData.decode('utf-8'))
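The decode is needed because `urlopen(...).read()` returns bytes, and on Python 3.5 and earlier `json.loads()` only accepts str (3.6 and later also accept UTF-8/16/32 bytes). A sketch with a hypothetical helper name; in real code the charset could come from the response via `resp.headers.get_content_charset("utf-8")`.

```python
import json

def parse_json_body(raw, charset="utf-8"):
    """Decode an HTTP response body (bytes) and parse it as JSON."""
    return json.loads(raw.decode(charset))
```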

On Thu, Apr 7, 2016 at 6:01 AM, Jussi Piitulainen <
[email protected]> wrote:

> Emeka writes:
>
> > Hello All,
> >
> > import urllib.request
> > import re
> >
> > url = 'https://www.everyday.com/
> >
> >
> >
> > req = urllib.request.Request(url)
> > resp = urllib.request.urlopen(req)
> > respData = resp.read()
> >
> >
> > paragraphs = re.findall(r'\[(.*?)\]',str(respData))
> > for eachP in paragraphs:
> >     print("".join(eachP.split(',')[1:-2]))
> >     print("\n")
> >
> >
> >
> > I got the below:
> > "Coke -  Yala Market Branch""NO. 113 IKU BAKR WAY YALA"""
> > But what I need is
> >
> > 'Coke -  Yala Market Branch NO. 113 IKU BAKR WAY YALA'
> >
> > How to I achieve the above?
>
> A couple of things you could do to understand your problem and work
> around it: Change your code to print(eachP). Change your "".join to
> "!".join to see where the commas were. Experiment with data of that form
> in the REPL. Sometimes it's good to print repr(datum) instead of datum,
> though not in this case.
>
> But are you trying to extract and parse paragraphs from a JSON response?
> Do not use regex for that at all. Use json.load or json.loads to parse
> it properly, and access the relevant data by indexing:
>
> x = json.loads('{"foo":[["Weather Forecast","It\'s Rain"],[]]}')
>
> x ==> {'foo': [['Weather Forecast', "It's Rain"], []]}
>
> x['foo'] ==> [['Weather Forecast', "It's Rain"], []]
>
> x['foo'][0] ==> ['Weather Forecast', "It's Rain"]
> --
> https://mail.python.org/mailman/listinfo/python-list
>



-- 
P.S. Please join our groups: [email protected]
or [email protected]. These are platforms for learning
and sharing of knowledge.
www.satajanus.com | Satajanus Nig. Ltd
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Paul Rubin
Chris Angelico  writes:
> First off, what does it actually *mean* to have a tree with numbers
> and keys as strings? Are they ever equal? Are all integers deemed
> lower than all strings? Something else?

If the AVL tree's purpose is to be an alternative lookup structure to
Python's hash-based dictionaries, then it doesn't really matter what the
ordering between values is, as long as it's deterministic.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Marko Rauhamaa
Paul Rubin :

> Chris Angelico  writes:
>> First off, what does it actually *mean* to have a tree with numbers
>> and keys as strings? Are they ever equal? Are all integers deemed
>> lower than all strings? Something else?
>
> If the AVL tree's purpose is to be an alternative lookup structure to
> Python's hash-based dictionaries, then it doesn't really matter what
> the ordering between values is, as long as it's deterministic.

I use AVL trees to implement timers. You need to be able to insert
elements in a sorted order and remove them quickly.

Guido chose a different method to implement timers for asyncio. He
decided to never remove canceled timers.
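The "never remove, just skip" strategy can be sketched with the standard `heapq` module: cancelling only marks the timer, and dead entries are discarded when they reach the top of the heap. This is a generic illustration with hypothetical names, not asyncio's actual code.

```python
import heapq
import itertools

class LazyTimerQueue:
    """Heap of (deadline, id, callback); cancellation only marks the id."""

    def __init__(self):
        self._heap = []
        self._ids = itertools.count()  # tie-breaker so callbacks are never compared
        self._cancelled = set()

    def schedule(self, deadline, callback):
        timer_id = next(self._ids)
        heapq.heappush(self._heap, (deadline, timer_id, callback))
        return timer_id

    def cancel(self, timer_id):
        # O(1): the entry stays in the heap until it reaches the top.
        self._cancelled.add(timer_id)

    def run_due(self, now):
        """Pop and run every live timer whose deadline is <= now."""
        while self._heap and self._heap[0][0] <= now:
            _, timer_id, callback = heapq.heappop(self._heap)
            if timer_id in self._cancelled:
                self._cancelled.discard(timer_id)
                continue
            callback()

    def __len__(self):
        return len(self._heap)  # includes cancelled entries not yet popped
```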


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 5:26 AM, Paul Rubin  wrote:
> Chris Angelico  writes:
>> First off, what does it actually *mean* to have a tree with numbers
>> and keys as strings? Are they ever equal? Are all integers deemed
>> lower than all strings? Something else?
>
> If the AVL tree's purpose is to be an alternative lookup structure to
> Python's hash-based dictionaries, then it doesn't really matter what the
> ordering between values is, as long as it's deterministic.

Fair enough. In that case, the option of sorting by
(type(obj).__name__, obj) will probably do the job. But if you need
to maintain relationships across types (eg 1 < 2.0 < 3), it needs to
be more sophisticated.
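A sketch of both variants as key functions (names hypothetical). Grouping purely by type name puts `2.0` ("float") before `1` ("int"), which is deterministic but breaks numeric order; grouping all `numbers.Real` values together keeps `1 < 2.0 < 3`. Either key can still raise `TypeError` if two distinct, mutually unorderable classes happen to share a name.

```python
from numbers import Real

def type_name_key(obj):
    """Group by type name, then compare within the group."""
    return (type(obj).__name__, obj)

def numeric_aware_key(obj):
    """Keep all real numbers in one group so 1 < 2.0 < 3 still holds."""
    if isinstance(obj, Real):
        return (0, "", obj)
    return (1, type(obj).__name__, obj)

data = [3, "b", 2.0, 1, "a"]
print(sorted(data, key=type_name_key))      # [2.0, 1, 3, 'a', 'b']
print(sorted(data, key=numeric_aware_key))  # [1, 2.0, 3, 'a', 'b']
```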

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Antoon Pardon
Op 07-04-16 om 14:22 schreef Chris Angelico:

...

> There's no __cmp__ method, but you could easily craft your own
> compare() function:
> 
> def compare(x, y):
>     """Return a number < 0 if x < y, or > 0 if x > y"""
>     if x == y: return 0
>     return -1 if keyify(x) < keyify(y) else 1
> 
> I'm not sure how your tree is crafted and how your "cheap" and
> "expensive" comparisons previously worked, but give something like
> this a try. I think you'll find it adequate.

That solution will mean I will have to do about 100% more comparisons
than previously.

Let's simplify for the moment and suppose all keys are tuples of
integers. Now because how trees are organised, the lower you
descend in the tree, the closer the keys are together. In the
case of tuples that means higher probability you have to traverse
the two tuples further in order to find out which is greater.

With the __cmp__ method, you only had to traverse the two tuples
once in order to find out whether they were equal or if not which
is the smaller and which is the greater.

With this method I have to traverse the two tuples almost always
twice. Once to find out if they are equal and if not a second
time to find out which is greater.
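To make the single-pass behaviour concrete, here is a sketch of a Python 2 style three-way comparison on tuples. It walks both tuples once and orders only the first pair that differs. `cmp3` is a hypothetical helper, not a standard function, and the elements are assumed to support `!=` and `<`.

```python
def cmp3(a, b):
    """Three-way compare two tuples in one left-to-right pass.

    Returns -1, 0 or 1, like Python 2's cmp() on tuples.
    """
    for x, y in zip(a, b):
        if x != y:
            # The first differing position decides the order.
            return -1 if x < y else 1
    # Every shared position is equal: the shorter tuple sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

print(cmp3((1, 2, 3), (1, 2, 4)))  # -1
print(cmp3((1, 2), (1, 2)))        # 0
print(cmp3((1, 2, 3), (1, 2)))     # 1
```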

-- 
Antoon.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Ben Finney
Antoon Pardon  writes:

> With this method I have to traverse the two tuples almost always
> twice. Once to find out if they are equal and if not a second time to
> find out which is greater.

You are essentially describing the new internal API of comparison
operators. That's pretty much unavoidable.

If you want to avoid repeating an expensive operation – the computation
of the comparison value for an object – you could add an LRU cache to
that function. See ‘functools.lru_cache’.
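Here is a minimal sketch of that suggestion. It assumes a hypothetical `keyify()` whose arguments are hashable, because `lru_cache` requires that. It reuses the shape of the `compare()` quoted earlier in the thread. The `calls` counter exists only to show that the second comparison is served from the cache.

```python
import functools

calls = 0

@functools.lru_cache(maxsize=1024)
def keyify(obj):
    """Hypothetical expensive key computation, memoised by lru_cache."""
    global calls
    calls += 1
    return (type(obj).__name__, obj)

def compare(x, y):
    """Return a number < 0 if x < y, 0 if x == y, or > 0 if x > y."""
    if x == y:
        return 0
    return -1 if keyify(x) < keyify(y) else 1

compare((1, 2), (1, 3))
compare((1, 2), (1, 3))    # both keys now come from the cache
print(calls)               # 2
print(keyify.cache_info())
```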

-- 
 \  “He that would make his own liberty secure must guard even his |
  `\ enemy from oppression.” —Thomas Paine |
_o__)  |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Paul Rubin
Marko Rauhamaa  writes:
> Guido chose a different method to implement timers for asyncio. He
> decided to never remove canceled timers.

Oh my, that might not end well.  There are other approaches that don't
need AVL trees and can remove cancelled timers, e.g. "timer wheels" as
used in Erlang and formerly (don't know about now) in the Linux kernel.
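For readers who haven't met the term, a minimal single-level timing wheel might look like the sketch below. It is a simplified illustration, not the Erlang or Linux implementation:

- time advances in integer ticks;
- each slot holds the timers due on it;
- timers further away than one rotation carry a count of rounds to skip;
- cancellation deletes the entry immediately.

All names are hypothetical.

```python
class TimerWheel:
    """Single-level hashed timing wheel; time advances in integer ticks."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [{} for _ in range(size)]  # per slot: timer_id -> [rounds, callback]
        self.where = {}     # timer_id -> slot index, so cancel() is O(1)
        self.current = 0    # slot index of the current tick
        self.next_id = 0

    def schedule(self, delay, callback):
        """Run callback `delay` ticks from now (delay >= 1); return a timer id."""
        if delay < 1:
            raise ValueError("delay must be at least one tick")
        slot = (self.current + delay) % self.size
        rounds = (delay - 1) // self.size  # full rotations to skip first
        timer_id = self.next_id
        self.next_id += 1
        self.slots[slot][timer_id] = [rounds, callback]
        self.where[timer_id] = slot
        return timer_id

    def cancel(self, timer_id):
        """Remove a pending timer at once, so nothing lingers."""
        slot = self.where.pop(timer_id, None)
        if slot is not None:
            del self.slots[slot][timer_id]

    def tick(self):
        """Advance one tick and run the timers that expire on it."""
        self.current = (self.current + 1) % self.size
        bucket = self.slots[self.current]
        due = []
        for timer_id, entry in bucket.items():
            if entry[0] == 0:
                due.append(timer_id)
            else:
                entry[0] -= 1  # not yet: wait one more full rotation
        for timer_id in due:
            entry = bucket.pop(timer_id, None)
            if entry is None:  # cancelled by an earlier callback in this tick
                continue
            del self.where[timer_id]
            entry[1]()

fired = []
wheel = TimerWheel(size=4)
wheel.schedule(2, lambda: fired.append("a"))
b = wheel.schedule(3, lambda: fired.append("b"))
wheel.schedule(6, lambda: fired.append("c"))  # wraps once around the wheel
wheel.cancel(b)
for _ in range(6):
    wheel.tick()
print(fired)  # ['a', 'c']
```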
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 6:56 AM, Antoon Pardon wrote:
>
> That solution will mean I will have to do about 100% more comparisons
> than previously.

Try it regardless. You'll probably find that performance is fine.
Don't prematurely optimize!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Ian Kelly
On Thu, Apr 7, 2016 at 1:32 PM, Marko Rauhamaa  wrote:
> Paul Rubin :
>
>> Chris Angelico  writes:
>>> First off, what does it actually *mean* to have a tree with numbers
>>> and keys as strings? Are they ever equal? Are all integers deemed
>>> lower than all strings? Something else?
>>
>> If the AVL tree's purpose is to be an alternative lookup structure to
>> Python's hash-based dictionaries, then it doesn't really matter what
>> the ordering between values is, as long as it's deterministic.
>
> I use AVL trees to implement timers. You need to be able to insert
> elements in a sorted order and remove them quickly.

Why would AVL trees implementing timers ever need non-numeric keys though?

It seems to me that if you're mixing types like this then the ordering
is likely not actually important.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Ian Kelly
On Thu, Apr 7, 2016 at 2:56 PM, Antoon Pardon wrote:
> Op 07-04-16 om 14:22 schreef Chris Angelico:
>
> ...
>
>> There's no __cmp__ method, but you could easily craft your own
>> compare() function:
>>
>> def compare(x, y):
>>     """Return a number < 0 if x < y, or > 0 if x > y"""
>>     if x == y: return 0
>>     return -1 if keyify(x) < keyify(y) else 1
>>
>> I'm not sure how your tree is crafted and how your "cheap" and
>> "expensive" comparisons previously worked, but give something like
>> this a try. I think you'll find it adequate.
>
> That solution will mean I will have to do about 100% more comparisons
> than previously.
>
> Let's simplify for the moment and suppose all keys are tuples of
> integers. Now because of how trees are organised, the lower you
> descend in the tree, the closer the keys are together. In the
> case of tuples that means a higher probability you have to traverse
> the two tuples further in order to find out which is greater.
>
> With the __cmp__ method, you only had to traverse the two tuples
> once in order to find out whether they were equal or if not which
> is the smaller and which is the greater.
>
> With this method I have to traverse the two tuples almost always
> twice. Once to find out if they are equal and if not a second
> time to find out which is greater.

You can reduce that to ~50% more if you swap the order. Check if one
tuple is less first. If it is, you have your answer, no further
comparison required. Otherwise, check the less likely case that
they're equal.
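A rough way to see both figures is to count the comparisons each ordering makes. In this sketch (all names hypothetical), `Counted` tallies every `<` and `==`, and each function is run over every ordered pair of distinct values. It counts whole-key comparisons only, not the element-by-element tuple traversal Antoon is worried about.

```python
class Counted:
    """Wraps a value and counts every < and == applied to it (illustration only)."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.count += 1
        return self.value < other.value

    def __eq__(self, other):
        Counted.count += 1
        return self.value == other.value

def compare_eq_first(x, y):
    """Chris's order: equality test first, then <."""
    if x == y:
        return 0
    return -1 if x < y else 1

def compare_lt_first(x, y):
    """Ian's order: < first, equality only when x is not less."""
    if x < y:
        return -1
    if x == y:
        return 0
    return 1

def count_comparisons(compare, values):
    """Compare every ordered pair of distinct items; return the < / == count."""
    Counted.count = 0
    items = [Counted(v) for v in values]
    for x in items:
        for y in items:
            if x is not y:
                compare(x, y)
    return Counted.count

print(count_comparisons(compare_eq_first, range(10)))  # 180: 2 per call
print(count_comparisons(compare_lt_first, range(10)))  # 135: 1.5 per call on average
```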
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Mark Lawrence via Python-list

On 07/04/2016 21:56, Antoon Pardon wrote:

> Op 07-04-16 om 14:22 schreef Chris Angelico:
>
> ...
>
>> There's no __cmp__ method, but you could easily craft your own
>> compare() function:
>>
>> def compare(x, y):
>>     """Return a number < 0 if x < y, or > 0 if x > y"""
>>     if x == y: return 0
>>     return -1 if keyify(x) < keyify(y) else 1
>>
>> I'm not sure how your tree is crafted and how your "cheap" and
>> "expensive" comparisons previously worked, but give something like
>> this a try. I think you'll find it adequate.
>
> That solution will mean I will have to do about 100% more comparisons
> than previously.
>
> Let's simplify for the moment and suppose all keys are tuples of
> integers. Now because of how trees are organised, the lower you
> descend in the tree, the closer the keys are together. In the
> case of tuples that means a higher probability you have to traverse
> the two tuples further in order to find out which is greater.
>
> With the __cmp__ method, you only had to traverse the two tuples
> once in order to find out whether they were equal or if not which
> is the smaller and which is the greater.
>
> With this method I have to traverse the two tuples almost always
> twice. Once to find out if they are equal and if not a second
> time to find out which is greater.



Have you read this https://wiki.python.org/moin/HowTo/Sorting ?
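Among other things, that page covers `functools.cmp_to_key()`. It wraps an old-style cmp function so it can be passed as `key=` to `sorted()`, `min()`, `max()` and similar. In the sketch below, `tuple_cmp` is a hypothetical stand-in for the tree's Python 2 comparison. It is written with the `(a > b) - (a < b)` idiom that replaces the removed `cmp()` built-in. The wrapper calls the cmp function once per rich comparison, so it restores the interface rather than saving work.

```python
import functools

def tuple_cmp(a, b):
    """Hypothetical Python 2 style cmp: negative, zero or positive."""
    return (a > b) - (a < b)

records = [(2, "b"), (1, "z"), (2, "a"), (1, "a")]
print(sorted(records, key=functools.cmp_to_key(tuple_cmp)))
# [(1, 'a'), (1, 'z'), (2, 'a'), (2, 'b')]
```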

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Marko Rauhamaa
Ian Kelly :

> On Thu, Apr 7, 2016 at 1:32 PM, Marko Rauhamaa  wrote:
>> I use AVL trees to implement timers. You need to be able to insert
>> elements in a sorted order and remove them quickly.
>
> Why would AVL trees implementing timers ever need non-numeric keys
> though?
>
> It seems to me that if you're mixing types like this then the ordering
> is likely not actually important.

The keys are expiry times. You could use numbers or you could use
datetime objects.

Ordering is crucial when it comes to timers.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Marko Rauhamaa
Paul Rubin :

> Marko Rauhamaa  writes:
>> Guido chose a different method to implement timers for asyncio. He
>> decided to never remove canceled timers.
>
> Oh my, that might not end well. There are other approaches that don't
> need AVL trees and can remove cancelled timers, e.g. "timer wheels" as
> used in Erlang and formerly (don't know about now) in the Linux
> kernel.

The issue is known. It has been tackled with a kind of a "garbage
collection" scheme:

   <https://bugs.python.org/issue22448>


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Ian Kelly
On Apr 7, 2016 10:22 PM, "Marko Rauhamaa"  wrote:
>
> Ian Kelly :
>
> > On Thu, Apr 7, 2016 at 1:32 PM, Marko Rauhamaa  wrote:
> >> I use AVL trees to implement timers. You need to be able to insert
> >> elements in a sorted order and remove them quickly.
> >
> > Why would AVL trees implementing timers ever need non-numeric keys
> > though?
> >
> > It seems to me that if you're mixing types like this then the ordering
> > is likely not actually important.
>
> The keys are expiry times. You could use numbers or you could use
> datetime objects.

Yes, but why would you want to use both?

> Ordering is crucial when it comes to timers.

I'm not disputing that.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Rustom Mody
On Thursday, April 7, 2016 at 10:22:18 PM UTC+5:30, Peter Pearson wrote:
> On Thu, 07 Apr 2016 11:37:50 +1000, Steven D'Aprano wrote:
> > On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote:
> >> Rustom Mody wrote:
> >
> >>> So here are some examples to illustrate what I am saying:
> >>> 
> >>> Example 1 -- Ligatures:
> >>> 
> >>> Python3 gets it right
> >> flag = 1
> >> flag
> >>> 1
> [snip]
> >> 
> >> I do not think this is correct, though.  Different Unicode code sequences,
> >> after normalization, should result in different symbols.
> >
> > I think you are confused about normalisation. By definition, normalising
> > different Unicode code sequences may result in the same symbols, since that
> > is what normalisation means.
> >
> > Consider two distinct strings which nevertheless look identical:
> >
> > py> a = "\N{LATIN SMALL LETTER U}\N{COMBINING DIAERESIS}"
> > py> b = "\N{LATIN SMALL LETTER U WITH DIAERESIS}"
> > py> a == b
> > False
> > py> print(a, b)
> > ü ü
> >
> >
> > The purpose of normalisation is to turn one into the other:
> >
> > py> unicodedata.normalize('NFKC', a) == b  # compose 2 code points --> 1
> > True
> > py> unicodedata.normalize('NFKD', b) == a  # decompose 1 code point --> 2
> > True
> 
> It's all great fun until someone loses an eye.
> 
> Seriously, it's cute how neatly normalisation works when you're
> watching closely and using it in the circumstances for which it was
> intended, but that hardly proves that these practices won't cause much
> trouble when they're used more casually and nobody's watching closely.
> Considering how much energy good software engineers spend eschewing
> unnecessary complexity, do we really want to embrace the prospect of
> having different things look identical?  (A relevant reference point:
> mixtures of spaces and tabs in Python indentation.)

That kind of sums up my position.
To be a casual user of unicode is one thing.
To support it is another -- unicode strings in python3 -- OK so far.
To mix up these two without enough thought or consideration is a third --
unicode identifiers are likely a security hole waiting to happen...

No, I am not clever/criminal enough to know how to write a text that is
visually close to
print "Hello World"
but is internally closer to
rm -rf /

For me this:
>>> Α = 1
>>> A = 2
>>> Α + 1 == A 
True
>>> 


is cure enough that I am not amused

[The only reason I brought up case distinction is that this is in the same 
direction and way worse than that]

If python had been more serious about embracing the brave new world of
unicode it should have looked in this direction:
http://blog.languager.org/2014/04/unicoded-python.html

Also here I suggest a classification of unicode, that, while not
official or even formalizable is (I believe) helpful
http://blog.languager.org/2015/03/whimsical-unicode.html

Specifically as far as I am concerned if python were to throw back say
a ligature in an identifier as a syntax error -- exactly what python2 does --
I think it would be perfectly fine and a more sane choice
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Rustom Mody
On Friday, April 8, 2016 at 10:13:16 AM UTC+5:30, Rustom Mody wrote:
> No, I am not clever/criminal enough to know how to write a text that is
> visually close to
> print "Hello World"
> but is internally closer to
> rm -rf /
> 
> For me this:
> >>> Α = 1
> >>> A = 2
> >>> Α + 1 == A 
> True
> >>> 
> 
> 
> is cure enough that I am not amused

Um... "cute" was the intention
[Or is it cuʇe ?]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody  wrote:
> No, I am not clever/criminal enough to know how to write a text that is
> visually close to
> print "Hello World"
> but is internally closer to
> rm -rf /
>
> For me this:
> >>> Α = 1
> >>> A = 2
> >>> Α + 1 == A
> True
> >>>
>
>
> is cure enough that I am not amused

To me, the above is a contrived example. And you can contrive examples
that are just as confusing while still being ASCII-only, like
swimmer/swirnmer in many fonts, or I and l, or any number of other
visually-confusing glyphs. I propose that we ban the letters 'r' and
'l' from identifiers, to ensure that people can't mess with
themselves.

> Specifically as far as I am concerned if python were to throw back say
> a ligature in an identifier as a syntax error -- exactly what python2 does --
> I think it would be perfectly fine and a more sane choice

The ligature is handled straight-forwardly: it gets decomposed into
its component letters. I'm not seeing a problem here.
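This can be checked directly. Python 3 normalises identifiers to NFKC while parsing (PEP 3131), so the 'fl' ligature folds to plain 'fl'. GREEK CAPITAL LETTER ALPHA, by contrast, has no compatibility mapping to Latin 'A' and remains a separate name. Here is a sketch using `exec()` with a throwaway namespace:

```python
import unicodedata

lig = "\ufb02ag"     # LATIN SMALL LIGATURE FL followed by "ag"
alpha = "\u0391"     # GREEK CAPITAL LETTER ALPHA

print(unicodedata.normalize("NFKC", lig))           # flag
print(unicodedata.normalize("NFKC", alpha) == "A")  # False

ns = {}
exec(lig + " = 1\n" + alpha + " = 1\nA = 2\n", ns)
print(ns["flag"], lig in ns)     # 1 False: stored under the NFKC name
print(ns[alpha] + 1 == ns["A"])  # True: Rustom's look-alike example
```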

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Marko Rauhamaa
Ian Kelly :

> On Apr 7, 2016 10:22 PM, "Marko Rauhamaa"  wrote:
>> The keys are expiry times. You could use numbers or you could use
>> datetime objects.
>
> Yes, but why would you want to use both?

I was never talking about mixing key types. I was simply reacting (out
of context) to a suggestion that AVL trees are simply dictionaries.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Joining Strings

2016-04-07 Thread Jussi Piitulainen
Emeka writes:

> Thanks it worked when parsed with json.load. However, it needed this
> decode('utf'):
>
> data = json.loads(respData.decode('utf-8'))

So it does. The response data is bytes.

There's also a way to wrap a decoding reader between the response object
and the JSON parser (json.load instead of json.loads):

response = urllib.request.urlopen(command) # a stream of bytes ...
please = codecs.getreader('UTF-8') # ... to characters

result = json.load(please(response))
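The same pattern can be tried offline by letting an `io.BytesIO` stand in for the HTTP response. That substitution is an assumption made only so the example runs without a network. Separately, from Python 3.6 on, `json.loads()` also accepts `bytes` and detects UTF-8, UTF-16 or UTF-32 itself, so the explicit decode becomes optional there.

```python
import codecs
import io
import json

# Stand-in for urllib.request.urlopen(...): any binary file-like object works.
response = io.BytesIO('{"city": "Zürich", "ids": [1, 2]}'.encode("utf-8"))

utf8_reader = codecs.getreader("utf-8")     # a StreamReader class for UTF-8
result = json.load(utf8_reader(response))   # json.load sees decoded text
print(result)  # {'city': 'Zürich', 'ids': [1, 2]}
```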
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Steven D'Aprano
On Fri, 8 Apr 2016 01:18 am, Jon Ribbens wrote:

> No, actually absolutely no modules at all are safe to import directly.
> This is because the untrusted code might alter them


Good thinking! I never even thought of that.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Untrusted code execution

2016-04-07 Thread Steven D'Aprano
On Fri, 8 Apr 2016 12:25 am, Jon Ribbens wrote:

> On 2016-04-07, Chris Angelico  wrote:
>> Options 1 and 2 are nastily restricted. Option 3 is likely broken, as
>> exception objects carry tracebacks and such.
> 
> Everything you're saying here is assuming that we must not let the
> attacker see any exception objects, but I don't understand why you're
> assuming that. As far as I can see, the information that exceptions
> hold that we need to prevent access to is all in "__" attributes that
> we're already blocking.

You might be right, but you're putting a lot of trust in one security
mechanism. If an attacker finds a way around that, you're screwed. "Defence
in depth" and "default deny" is, in my opinion, better: prevent the
untrusted user from seeing everything except those things which are proven
to be safe.



-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Steven D'Aprano
On Fri, 8 Apr 2016 02:51 am, Peter Pearson wrote:

> Seriously, it's cute how neatly normalisation works when you're
> watching closely and using it in the circumstances for which it was
> intended, but that hardly proves that these practices won't cause much
> trouble when they're used more casually and nobody's watching closely.
> Considering how much energy good software engineers spend eschewing
> unnecessary complexity, 

Maybe so, but it's not good software engineers we have to worry about, but
the other 99.9% :-)


> do we really want to embrace the prospect of 
> having different things look identical?

You mean like ASCII identifiers? I'm afraid it's about fifty years too late
to ban identifiers using O and 0, or l, I and 1, or rn and m.

Or for that matter:

a = akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqwe9fhlcjbqvcbhsiauy37wkg() + 100
b = 100 + akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqew9fhlcjbqvcbhsiauy37wkg()

How easily can you tell them apart at a glance?

The reality is that we trust our coders not to deliberately mess us about.
As the Obfuscated C and the Underhanded C contest prove, you don't need
Unicode to hide hostile code. In fact, the use of Unicode confusables in an
otherwise all-ASCII file is a dead giveaway that something fishy is going
on.

I think that, beyond normalisation, the compiler need not be too concerned
by confusables. I wouldn't *object* to the compiler raising a warning if it
detected confusable identifiers, or mixed script identifiers, but I think
that's more the job for a linter or human code review.
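A linter-style check along those lines is easy to sketch with the standard library, though only heuristically. `unicodedata` does not expose the Unicode Script property, so the hypothetical `mixed_script_names()` guesses each letter's script from the first word of its character name (LATIN, GREEK, CYRILLIC, ...). A real tool would use the Script property and the confusables data from Unicode Technical Standard #39.

```python
import io
import tokenize
import unicodedata

def scripts_of(name):
    """Rough script guess per letter: the first word of its Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in name if ch.isalpha()}

def mixed_script_names(source):
    """Yield (line, identifier, scripts) for NAME tokens that mix scripts."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            scripts = scripts_of(tok.string)
            if len(scripts) > 1:
                yield tok.start[0], tok.string, sorted(scripts)

# Second letter of the first name is CYRILLIC SMALL LETTER A.
src = "p\u0430ssword = 1\npassword = 2\n"
for hit in mixed_script_names(src):
    print(hit)  # reports line 1 only, scripts ['CYRILLIC', 'LATIN']
```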



> (A relevant reference point: 
> mixtures of spaces and tabs in Python indentation.)

Most editors have an option to display whitespace, and tabs and spaces look
different. Typically the tab is shown with an arrow, and the space by a
dot. If people *still* confuse them, the issue is easily managed by a
combination of "well don't do that" and TabError.
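For the record, Python 3 rejects indentation whose meaning depends on the tab width. In the sketch below (the helper name is hypothetical), the body's first line is indented with a tab and the second with eight spaces. They only line up if a tab counts as eight columns, so `compile()` raises TabError, a subclass of IndentationError.

```python
def raises_tab_error(source):
    """Return True if compiling `source` raises TabError."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False

mixed = "if True:\n\tx = 1\n        y = 2\n"   # tab, then eight spaces
spaces = "if True:\n    x = 1\n    y = 2\n"   # spaces only

print(raises_tab_error(mixed))   # True
print(raises_tab_error(spaces))  # False
```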


> [snip]
>> The Unicode consortium seems to disagree with you.
>
> The Unicode consortium was certifiably insane when it went into the
> typesetting business.

They are not, and never have been, in the typesetting business. Perhaps
characters are not the only things easily confused *wink*

(Although some members of the consortium may be. But the consortium itself
isn't.)


> The pile-of-poo character was just frosting on 
> the cake.

Blame the Japanese mobile phone companies for that. When you pay your
membership fee, you get to object to the addition of characters too.
(Anyone, I think, can propose a new character, but only members get to
choose which proposals are accepted.)

But really, why should we object? Is "pile-of-poo" any more silly than any
of the other dingbats, graphics characters, and other non-alphabetical
characters? Unicode is not just for "letters of the alphabet".


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Promoting Python

2016-04-07 Thread Steven D'Aprano
On Thu, 7 Apr 2016 05:19 pm, Marko Rauhamaa wrote:

> First, terminology disputes are pointless.

I agree! There's nothing I like more than sitting in front of a blazing open
fire (or even just a warm heater) on a cold winter's evening, drinking a
nice mug of piping hot terminology dispute. Sometimes I put marshmallows in
it. That makes it even more pointless.

I have to be careful around my wife though, she's life-threateningly
allergic to all dispute products. (Technically, she can eat white dispute,
so long as it is 100% cocoa-free, but she doesn't see the point.)

.
.
.
.


Having trouble understanding what the hell I'm talking about? That's what
happens when somebody in a conversation is using unexpected terminology. To
the extent that terminology disputes resolve misuses and misunderstandings
about terminology, they simplify communication and make it easier to
communicate, not harder. The very opposite of pointless.


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 4:00 PM, Steven D'Aprano  wrote:
> Or for that matter:
>
> a = akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqwe9fhlcjbqvcbhsiauy37wkg() + 100
> b = 100 + akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqew9fhlcjbqvcbhsiauy37wkg()
>
> How easily can you tell them apart at a glance?

Ouch! Can't even align them top and bottom. This is evil.

> I think that, beyond normalisation, the compiler need not be too concerned
> by confusables. I wouldn't *object* to the compiler raising a warning if it
> detected confusable identifiers, or mixed script identifiers, but I think
> that's more the job for a linter or human code review.

The compiler should treat as identical anything that an editor should
reasonably treat as identical. I'm not sure whether multiple combining
characters on a single base character are forced into some order prior
to comparison or are kept in the order they were typed, but my gut
feeling is that they should be considered identical.
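On the ordering question, normalisation already handles marks with different canonical combining classes. The normal forms sort such marks into a canonical order, so two strings that differ only in the order of, say, an acute accent and a dot below normalise to the same thing. Marks sharing a combining class keep their typed order, since swapping them can change the rendering. A quick check with the standard `unicodedata` module:

```python
import unicodedata

s1 = "a\u0301\u0323"  # a + COMBINING ACUTE ACCENT + COMBINING DOT BELOW
s2 = "a\u0323\u0301"  # the same marks, typed in the opposite order

print(s1 == s2)  # False
print(unicodedata.combining("\u0301"), unicodedata.combining("\u0323"))  # 230 220
print(unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2))  # True
print(unicodedata.normalize("NFD", s1) == s2)  # True: class 220 sorts before 230
```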

> They are not, and never have been, in the typesetting business. Perhaps
> characters are not the only things easily confused *wink*

Peter is definitely a character. So are you. QUITE a character. :)

> But really, why should we object? Is "pile-of-poo" any more silly than any
> of the other dingbats, graphics characters, and other non-alphabetical
> characters? Unicode is not just for "letters of the alphabet".

It's less silly than "ZERO WIDTH NO-BREAK SPACE", which isn't a
space at all, it's a joiner. Go figure.

(History's a wonderful thing, ain't it? So's backward compatibility
and a guarantee that names will never be changed.)
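The official character name is easy to confirm with `unicodedata`. U+FEFF is a format character (category Cf). Unicode deprecated its use as a word joiner in favour of U+2060 WORD JOINER, leaving U+FEFF mostly in its byte-order-mark role.

```python
import unicodedata

for ch in ("\ufeff", "\u2060"):
    print("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))
# U+FEFF ZERO WIDTH NO-BREAK SPACE Cf
# U+2060 WORD JOINER Cf
```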

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Terry Reedy

On 4/7/2016 3:32 PM, Marko Rauhamaa wrote:


> I use AVL trees to implement timers. You need to be able to insert
> elements in a sorted order and remove them quickly.
>
> Guido chose a different method to implement timers for asyncio. He
> decided to never remove canceled timers.


In 3.5.1, asyncio.base_events.BaseEventLoop._run_once
has this code to remove cancelled timers when they become too numerous.

    if (sched_count > _MIN_SCHEDULED_TIMER_HANDLES and
            self._timer_cancelled_count / sched_count >
            _MIN_CANCELLED_TIMER_HANDLES_FRACTION):
        # Remove delayed calls that were cancelled if their number
        # is too high
        new_scheduled = []
        for handle in self._scheduled:
            if handle._cancelled:
                handle._scheduled = False
            else:
                new_scheduled.append(handle)

        heapq.heapify(new_scheduled)
        self._scheduled = new_scheduled
        self._timer_cancelled_count = 0

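Here is the same idea in a self-contained form: a `heapq` timer queue in which cancel() only marks an entry. The heap is filtered and re-heapified once cancelled entries exceed a fraction of it. This is a simplified sketch modelled on the snippet above, not asyncio's code. The class name and both thresholds are assumptions chosen for illustration.

```python
import heapq
import itertools

class TimerQueue:
    """heapq timer queue: O(1) lazy cancel, O(n) purge when too many are cancelled."""

    MIN_SIZE_FOR_PURGE = 100      # assumed threshold, for illustration
    MAX_CANCELLED_FRACTION = 0.5  # assumed threshold, for illustration

    def __init__(self):
        self._heap = []                # entries: [when, seq, callback, active]
        self._seq = itertools.count()  # tie-breaker, so callbacks are never compared
        self._cancelled = 0            # cancelled entries still sitting in the heap

    def schedule(self, when, callback):
        entry = [when, next(self._seq), callback, True]
        heapq.heappush(self._heap, entry)
        return entry

    def cancel(self, entry):
        if entry[3]:                   # ignore timers already cancelled or run
            entry[3] = False           # O(1): only mark the entry
            self._cancelled += 1
            self._maybe_purge()

    def _maybe_purge(self):
        size = len(self._heap)
        if (size > self.MIN_SIZE_FOR_PURGE and
                self._cancelled / size > self.MAX_CANCELLED_FRACTION):
            self._heap = [e for e in self._heap if e[3]]
            heapq.heapify(self._heap)
            self._cancelled = 0

    def run_due(self, now):
        """Pop and run every live timer whose time is <= now."""
        while self._heap and self._heap[0][0] <= now:
            entry = heapq.heappop(self._heap)
            if entry[3]:
                entry[3] = False       # mark as done so a later cancel() is a no-op
                entry[2]()
            else:
                self._cancelled -= 1   # a cancelled entry is discarded lazily here

    def __len__(self):
        return len(self._heap)

q = TimerQueue()
fired = []
entries = [q.schedule(t, lambda t=t: fired.append(t)) for t in range(200)]
for entry in entries[:150]:
    q.cancel(entry)
print(len(q))  # 99: one purge ran after the 101st cancellation
q.run_due(now=1000)
print(fired == list(range(150, 200)), len(q))  # True 0
```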

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Terry Reedy

On 4/8/2016 12:22 AM, Marko Rauhamaa wrote:

> Paul Rubin :
>
>> Marko Rauhamaa  writes:
>>> Guido chose a different method to implement timers for asyncio. He
>>> decided to never remove canceled timers.

Only initially. He approved a change immediately when presented with a
concrete problem.

>> Oh my, that might not end well. There are other approaches that don't
>> need AVL trees and can remove cancelled timers, e.g. "timer wheels" as
>> used in Erlang and formerly (don't know about now) in the Linux
>> kernel.
>
> The issue is known. It has been tackled with a kind of a "garbage
> collection" scheme:
>
> <https://bugs.python.org/issue22448>

and fixed 1 1/2 years ago.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: how to convert code that uses cmp to python3

2016-04-07 Thread Marko Rauhamaa
Terry Reedy :

> On 4/8/2016 12:22 AM, Marko Rauhamaa wrote:
>> The issue is known. It has been tackled with a kind of a "garbage
>> collection" scheme:
>>
>> <https://bugs.python.org/issue22448>
>
> and fixed 1 1/2 years ago.

On the surface, the garbage collection scheme looks dubious, but maybe
it works perfectly in practice.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list