Re: python import module question

2013-07-28 Thread Peter Otten
syed khalid wrote:

> I am trying to do a "import shogun" in my python script. I can invoke
> shogun with a command line with no problem. But I cannot with a python
> import statement.
> 
>>invoking python from a command line...
> 
> Syedk@syedk-ThinkPad-T410:~/shogun-2.0.0/src/interfaces/cmdline_static$
> shogun | more libshogun (i686/v2.0.0_9c8012f_2012-09-04_09:08_164102447)
> 
> Copyright (C) 1999-2009 Fraunhofer Institute FIRST
> Copyright (C) 1999-2011 Max Planck Society
> Copyright (C) 2009-2011 Berlin Institute of Technology
> Copyright (C) 2012 Soeren Sonnenburg, Sergey Lisitsyn, Heiko Strathmann
> Written   (W) 1999-2012 Soeren Sonnenburg, Gunnar Raetsch et al.
> 
> ( configure options: "configure options --interfaces=python_static"
> compile flag s: "-fPIC -g -Wall -Wno-unused-parameter -Wformat
> -Wformat-security -Wparenthese s -Wshadow -Wno-deprecated -O9
> -fexpensive-optimizations -frerun-cse-after-loop -fcse-follow-jumps
> -finline-functions -fschedule-insns2 -fthread-jumps -fforce-a ddr
> -fstrength-reduce -funroll-loops -march=native -mtune=native -pthread"
> link flags: " -Xlinker --no-undefined" ) ( seeding random number generator
> with 3656470784 (seed size 256)) determined range for x in log(1+exp(-x))
> is:37 )
> 
> Trying to call python from a script in the same directory where I
> invoked shogun from the command line:
> syedk@syedk-ThinkPad-T410:~/shogun-2.0.0/src/interfaces/cmdline_static$ python
> Python 2.7.3 (default, Apr 10 2013, 05:46:21)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import shogun
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ImportError: No module named shogun

Poking around a bit on the project's website

http://shogun-toolbox.org/doc/en/2.0.1/interfaces.html

 it looks like you are trying to use their "modular" interface when you have 
only installed the "static" one. I'm guessing that

>>> import sg

will work.
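If it is unclear which module name a given shogun build exposes, a quick check can probe the candidates. The names below are guesses based on the interfaces page linked above; adjust them to your install:

```python
import importlib

def first_importable(names):
    """Return the first module name in `names` that imports cleanly, else None."""
    for name in names:
        try:
            importlib.import_module(name)
            return name
        except ImportError:
            pass
    return None

# Candidate names for the shogun Python bindings; which one exists
# depends on whether the static or the modular interface was built.
print(first_importable(["sg", "shogun", "modshogun"]))
```

Running this prints the first binding name that imports, or None if none of them is on sys.path.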

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: RE Module Performance

2013-07-28 Thread Antoon Pardon

On 27-07-13 20:21, [email protected] wrote:


> Quickly. sys.getsizeof() in the light of what I explained.
>
> 1) As this FSR works with multiple encodings, it has to keep
> track of the encoding. It puts it in the overhead of the str
> class (overhead = real overhead + encoding). In such
> an absurd way that a
>
> >>> sys.getsizeof('€')
> 40
>
> needs 14 bytes more than a
>
> >>> sys.getsizeof('z')
> 26
>
> You may vary the length of the str. The problem is
> still here. Not bad for a coding scheme.
>
> 2) Take a look at this. Get rid of the overhead.
>
> >>> sys.getsizeof('b'*100 + 'c')
> 126
> >>> sys.getsizeof('b'*100 + '€')
> 240
>
> What does it mean? It means that Python has to
> reencode a str every time it is necessary because
> it works with multiple codings.


So? The same effect can be seen with other datatypes.

>>> nr = 32767
>>> sys.getsizeof(nr)
14
>>> nr += 1
>>> sys.getsizeof(nr)
16




> This FSR is not even a copy of the utf-8.
>
> >>> len(('b'*100 + '€').encode('utf-8'))
> 103


Why should it be? Why should a unicode string be a copy
of its utf-8 encoding? That makes as much sense as expecting
that a number would be a copy of its string representation.



> utf-8 or any (utf) never need and never spend their time
> in reencoding.


So? That python sometimes needs to do some kind of background
processing is not a problem, whether it is garbage collection,
allocating more memory, shuffling around data blocks or reencoding a
string; that doesn't matter. If you've got a real-world example where
one of those things noticeably slows your program down or makes the
program behave faultily, then you have something that is worthy of
attention.

Until then you are merely harboring a pet peeve.

--
Antoon Pardon


Configuration to run python script on Ubuntu 12.04

2013-07-28 Thread Jaiky
I want to run a Python script which contains a simple HTML form in the Firefox
browser, but I don't know what the configuration should be on Ubuntu 12.04 to
run this script, i.e. the CGI configuration.



My code is under /var/www/cgi-bin/forms__.py:



#!/usr/bin/env python
import webapp2

form ="""
  


  """


class MainPage(webapp2.RequestHandler):
def get(self):
#self.response.headers['Content-Type'] = 'text/plain'
self.response.out.write(form)

app = webapp2.WSGIApplication([('/', MainPage)],
 debug=True)




Re: Thread is somehow interfering with a while loop called after the thread is started

2013-07-28 Thread Irmen de Jong
On 28-7-2013 4:29, [email protected] wrote:
> I have a simple scapy + nfqueue dns spoofing script that I want to turn into 
> a thread within a larger program:
> 
> http://www.bpaste.net/show/HrlfvmUBDA3rjPQdLmdp/
> 
> Below is my attempt to thread the program above. Somehow, the only way the 
> while loop actually prints "running" is if the callback function is called 
> consistently. If the callback function isn't started, the script will never 
> print "running". How can that be if the while loop is AFTER the thread was 
> started? Shouldn't the while loop and the thread operate independently?
> 
> http://bpaste.net/show/0aCxSsSW7yHcQ7EBLctI/
> 

Try adding sys.stdout.flush() after your print statements; I think you're
seeing a stdout buffering issue.
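The buffering effect itself is easy to demonstrate without threads. This sketch uses an explicit buffered writer to show that written data does not reach the underlying stream until flush() is called:

```python
import io

# Wrap a raw in-memory stream in a block buffer, like a piped stdout.
raw = io.BytesIO()
buffered = io.BufferedWriter(raw, buffer_size=4096)

buffered.write(b"running\n")
before = raw.getvalue()   # nothing has reached the underlying stream yet

buffered.flush()
after = raw.getvalue()    # flush() pushes the buffered bytes through

print(repr(before), repr(after))
```

When stdout is a pipe (as it often is for threaded scripts whose output is captured), it is block-buffered just like this, so output only appears on flush or when the buffer fills.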


Irmen



Re: Configuration to run python script on Ubuntu 12.04

2013-07-28 Thread Pierre Jaury
Jaiky  writes:

> I want to run a Python script which contains a simple HTML form in the Firefox
> browser, but I don't know what the configuration should be on Ubuntu 12.04 to
> run this script, i.e. the CGI configuration.
>
>
>
> My code is under /var/www/cgi-bin/forms__.py:
>
>
>
> #!/usr/bin/env python
> import webapp2
>
> form ="""
>   
> 
> 
>   """
>
>
> class MainPage(webapp2.RequestHandler):
> def get(self):
> #self.response.headers['Content-Type'] = 'text/plain'
> self.response.out.write(form)
>
> app = webapp2.WSGIApplication([('/', MainPage)],
>  debug=True)

In order for your app to run as CGI, you would have to call the webapp2
run() function. Otherwise, it only implements the WSGI interfaces.

Have a look at:
http://webapp-improved.appspot.com/api/webapp2.html#webapp2.WSGIApplication.run

As well as:
http://httpd.apache.org/docs/current/mod/mod_alias.html#scriptalias




FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Michael Torrie
On 07/27/2013 12:21 PM, [email protected] wrote:
> Good point. FSR, nice tool for those who wish to teach
> Unicode. It is not every day, one has such an opportunity.

I had a long e-mail composed, but decided to chop it down; it was still too
long, so I ditched a lot of the context, which jmf also seems to do.
Apologies.

1. FSR *is* UTF-32 so it is as unicode compliant as UTF-32, since UTF-32
is an official encoding.  FSR only differs from UTF-32 in that the
padding zeros are stripped off such that it is stored in the most
compact form that can handle all the characters in string, which is
always known at string creation time.  Now you can argue many things,
but to say FSR is not unicode compliant is quite a stretch!  What
unicode entities or characters cannot be stored in strings using FSR?
What sequences of bytes in FSR result in invalid Unicode entities?

2. strings in Python *never change*.  They are immutable.  The +
operator always copies strings character by character into a new string
object, even if Python had used UTF-8 internally.  If you're doing a lot
of string concatenations, perhaps you're using the wrong data type.  A
byte buffer might be better for you, where you can stuff utf-8 sequences
into it to your heart's content.
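For instance, collecting pieces and joining once, or stuffing encoded bytes into a bytearray, avoids the repeated copying. A minimal sketch, not tied to any particular workload:

```python
# Repeated `s += piece` copies the whole string each time (quadratic overall).
# Collect pieces in a list and join once instead:
parts = []
for i in range(1000):
    parts.append("x")
joined = "".join(parts)

# Or, as suggested above, stuff UTF-8 sequences into a mutable byte buffer:
buf = bytearray()
buf += "€".encode("utf-8")
buf += b" and ascii"
text = bytes(buf).decode("utf-8")

print(len(joined), text)
```

The bytearray variant never rebuilds the accumulated data on append, and only one decode happens at the end.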

3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
slicing a string would be very very slow, and that's unacceptable for
the use cases of python strings.  I'm assuming you understand big O
notation, as you talk of experience in many languages over the years.
FSR and UTF-32 both are O(1) for slicing and lookups.  UTF-8, 16 and any
variable-width encoding are always O(n).  A lot slower!
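To illustrate the O(n) cost: locating the i-th code point in UTF-8 data means scanning from the start, since each character's byte width is only known from its lead byte. A sketch of the scan, not how any real implementation stores strings:

```python
def utf8_offset(data, index):
    """Byte offset of the `index`-th code point in UTF-8 `data`: an O(n) scan."""
    offset = 0
    for _ in range(index):
        lead = data[offset]
        if lead < 0x80:        # 0xxxxxxx: 1-byte character
            offset += 1
        elif lead < 0xE0:      # 110xxxxx: 2-byte character
            offset += 2
        elif lead < 0xF0:      # 1110xxxx: 3-byte character
            offset += 3
        else:                  # 11110xxx: 4-byte character
            offset += 4
    return offset

data = "aé€x".encode("utf-8")   # character widths: 1, 2, 3, 1 bytes
print([utf8_offset(data, i) for i in range(4)])
```

With a fixed-width representation (UTF-32 or the FSR's per-string width), the same lookup is a single multiplication.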

4. Unicode is, well, unicode.  You seem to hop all over the place from
talking about code points to bytes to bits, using them all
interchangeably.  And now you seem to be claiming that a particular byte
encoding standard is by definition unicode (UTF-8).  Or at least that's
how it sounds.  And also claim FSR is not compliant with unicode
standards, which appears to me to be completely false.

Is my understanding of these things wrong?


Re: Configuration to run python script on Ubuntu 12.04

2013-07-28 Thread Jaiky
Sir, I already tried this "Alias" concept.

I did the following steps
===
Step 1:
added 
"ScriptAlias /cgi-bin/ /var/www/cgi-bin/"

 in   /etc/apache2/sites-available/default


step 2:-

added :-

def main():
app.run()

if __name__ == '__main__':
main()

in  /var/www/cgi-bin/hello_world.py



Now my Configuration   of /etc/apache2/sites-available/default in under




ServerAdmin webmaster@localhost

DocumentRoot /var/www


Options FollowSymLinks
AllowOverride None
AddHandler mod_python .py
PythonHandler mod_python.publisher | .py
AddHandler mod_python .psp .psp_
PythonHandler mod_python.psp | .psp .psp


ScriptAlias /cgi-bin/ /var/www/cgi-bin/

Options Indexes FollowSymLinks MultiViews ExecCGI
AllowOverride None
Order allow,deny
allow from all
AddHandler cgi-script cgi pl


#ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
AddHandler cgi-script cgi pl
Order allow,deny
Allow from all


ErrorLog ${APACHE_LOG_DIR}/error.log

# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel warn

CustomLog ${APACHE_LOG_DIR}/access.log combined


 Alias /doc/ "/usr/share/doc/"

Options Indexes MultiViews FollowSymLinks
AllowOverride None
Order deny,allow
Deny from all
Allow from 127.0.0.0/255.0.0.0 ::1/128




   

===

my code is under /var/www/cgi-bin/hello_world.py




import webapp2

class MainPage(webapp2.RequestHandler):
def get(self):
   self.response.headers['Content-Type'] = 'text/plain'
   self.response.out.write('Hello, webapp World!')

app = webapp2.WSGIApplication([('/', MainPage)],
 debug=True)


def main():
app.run()

if __name__ == '__main__':
main()
=


An extra thing I did:

in /etc/apache2/mods-available/ I created the file "mod_python.conf"
containing




AddHandler mod_python .py .psp
PythonHandler mod_python.publisher | .py
PythonHandler mod_python.psp | .psp





When I run localhost/cgi-bin/hello_world.py, the error I get is:


Not Found

The requested URL /cgi-bin/hello_world.py was not found on this server.
Apache/2.2.22 (Ubuntu) Server at localhost Port 80



Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 4:52 PM, Michael Torrie  wrote:
> Is my understanding of these things wrong?

No, your understanding of those matters is fine. There's just one area
you seem to be misunderstanding; you appear to think that jmf actually
cares about logical argument. I gave up on that theory a long time
ago, and now I respond for the benefit of those reading, rather than
jmf himself. I've also given up on trying to figure out what he
actually wants; the nearest I can come up with is that he's King
Gama-esque - that he just wants to complain.

ChrisA


Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Terry Reedy

On 7/28/2013 11:52 AM, Michael Torrie wrote:


> 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
> slicing a string would be very very slow,

Not necessarily so. See below.

> and that's unacceptable for
> the use cases of python strings.  I'm assuming you understand big O
> notation, as you talk of experience in many languages over the years.
> FSR and UTF-32 both are O(1) for slicing and lookups.

Slicing is at least O(m) where m is the length of the slice.

> UTF-8, 16 and any variable-width encoding are always O(n).


I posted about a week ago, in response to Chris A., a method by which 
lookup for UTF-16 can be made O(log2 k), or perhaps more accurately, 
O(1+log2(k+1)), where k is the number of non-BMP chars in the string.


This uses an auxiliary array of k ints. An auxiliary array of n ints 
would make UTF-16 lookup O(1), but then one is using more space than 
with UTF-32. Similar comments apply to UTF-8.
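A minimal sketch of that idea (my own illustration, not the code Terry posted): keep the sorted code-point indices of the non-BMP characters, then bisect to convert a character index into a UTF-16 code-unit offset in O(log2 k):

```python
import bisect

def non_bmp_positions(s):
    """Sorted code-point indices of chars outside the BMP
    (each such char needs two UTF-16 code units, a surrogate pair)."""
    return [i for i, ch in enumerate(s) if ord(ch) > 0xFFFF]

def utf16_unit_offset(aux, char_index):
    """UTF-16 code-unit offset of char_index, O(log2 k) with k = len(aux):
    every non-BMP char before char_index contributes one extra code unit."""
    return char_index + bisect.bisect_left(aux, char_index)

s = "a\U00010400b\U00010400c"   # BMP and non-BMP characters mixed
aux = non_bmp_positions(s)      # the k-int auxiliary array
print(aux, [utf16_unit_offset(aux, i) for i in range(len(s))])
```

The auxiliary array costs one int per non-BMP character, so pure-BMP strings pay nothing.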


The unicode standard says that a single string should use exactly one 
coding scheme. It does *not* say that all strings in an application must 
use the same scheme. I just rechecked a few days ago. It also does not 
say that an application cannot associate additional data with a string 
to make processing of the string easier.


--
Terry Jan Reedy



Re: [Savoynet] G&S Opera Co: Pirates of Penzance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 6:36 PM, David Patterson
 wrote:
> By the way, Chris, I think the book that Ruth brought on was probably
> supposed to be Debretts Peerage.  I couldn't see the cover clearly but it
> would have been a more logical choice in view of the circumstances.

Sure. Makes no difference what the book actually is, and I'm not
someone who knows these things (people show me caricatures and I have
no idea who they're of, largely due to not watching TV). I've edited
the post to say Debrett's Peerage, thanks for the tip.

ChrisA


Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 6:36 PM, Terry Reedy  wrote:
> I posted about a week ago, in response to Chris A., a method by which lookup
> for UTF-16 can be made O(log2 k), or perhaps more accurately,
> O(1+log2(k+1)), where k is the number of non-BMP chars in the string.
>

Which is an optimization choice that favours strings containing very
few non-BMP characters. To justify the extra complexity of out-of-band
storage, you would need to be working with almost exclusively the BMP.
That would drastically improve jmf's microbenchmarks which do exactly
that, but it would penalize strings that are almost exclusively
higher-codepoint characters. Its quality, then, would be based on a
major survey of string usage: are there enough strings with
mostly-BMP-but-a-few-SMP? Bearing in mind that pure BMP is handled
better by PEP 393, so this is only of value when there are actually
those mixed strings.

ChrisA


Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday 28 July 2013 05:53:22 UTC+2, Ian wrote:
> On Sat, Jul 27, 2013 at 12:21 PM,   wrote:
> > Back to utf. utfs are not only elements of a unique set of encoded
> > code points. They have an interesting feature. Each "utf chunk"
> > holds intrisically the character (in fact the code point) it is
> > supposed to represent. In utf-32, the obvious case, it is just
> > the code point. In utf-8, that's the first chunk which helps and
> > utf-16 is a mixed case (utf-8 / utf-32). In other words, in an
> > implementation using bytes, for any pointer position it is always
> > possible to find the corresponding encoded code point and from this
> > the corresponding character without any "programmed" information. See
> > my editor example, how to find the char under the caret? In fact,
> > a silly example, how can the caret can be positioned or moved, if
> > the underlying corresponding encoded code point can not be
> > dicerned!
>
> Yes, given a pointer location into a utf-8 or utf-16 string, it is
> easy to determine the identity of the code point at that location.
> But this is not often a useful operation, save for resynchronization
> in the case that the string data is corrupted.  The caret of an editor
> does not conceptually correspond to a pointer location, but to a
> character index.  Given a particular character index (e.g. 127504), an
> editor must be able to determine the identity and/or the memory
> location of the character at that index, and for UTF-8 and UTF-16
> without an auxiliary data structure that is a O(n) operation.
>
> > 2) Take a look at this. Get rid of the overhead.
> >
> > >>> sys.getsizeof('b'*100 + 'c')
> > 126
> > >>> sys.getsizeof('b'*100 + '€')
> > 240
> >
> > What does it mean? It means that Python has to
> > reencode a str every time it is necessary because
> > it works with multiple codings.
>
> Large strings in practical usage do not need to be resized like this
> often.  Python 3.3 has been in production use for months now, and you
> still have yet to produce any real-world application code that
> demonstrates a performance regression.  If there is no real-world
> regression, then there is no problem.
>
> > 3) Unicode compliance. We know retrospectively, latin-1,
> > is was a bad choice. Unusable for 17 European languages.
> > Believe of not. 20 years of Unicode of incubation is not
> > long enough to learn it. When discussing once with a French
> > Python core dev, one with commit access, he did not know one
> > can not use latin-1 for the French language!
>
> Probably because for many French strings, one can.  As far as I am
> aware, the only characters that are missing from Latin-1 are the Euro
> sign (an unfortunate victim of history), the ligature œ (I have no
> doubt that many users just type oe anyway), and the rare capital Ÿ
> (the minuscule version is present in Latin-1).  All French strings
> that are fortunate enough to be absent these characters can be
> represented in Latin-1 and so will have a 1-byte width in the FSR.

--

latin-1? That's not even true.

>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ü')
38
>>> sys.getsizeof('aa')
27
>>> sys.getsizeof('aü')
39
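The Latin-1 coverage claims quoted above are easy to verify directly; an illustrative check:

```python
# The three French-relevant characters Ian named as missing from Latin-1:
for ch in "€", "œ", "Ÿ":
    try:
        ch.encode("latin-1")
        result = "fits in latin-1"
    except UnicodeEncodeError:
        result = "missing from latin-1"
    print(ch, result)

# Whereas typical accented French text encodes fine:
print("déjà vu, garçon, être".encode("latin-1"))
```

All three named characters raise UnicodeEncodeError, while the ordinary accented letters round-trip through Latin-1 without trouble.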


jmf



Re: RE Module Performance

2013-07-28 Thread Joshua Landau
On 28 July 2013 09:45, Antoon Pardon  wrote:

> On 27-07-13 20:21, [email protected] wrote:
>
>> utf-8 or any (utf) never need and never spend their time
>> in reencoding.
>>
>
> So? That python sometimes needs to do some kind of background
> processing is not a problem, whether it is garbage collection,
> allocating more memory, shufling around data blocks or reencoding a
> string, that doesn't matter. If you've got a real world example where
> one of those things noticeably slows your program down or makes the
> program behave faulty then you have something that is worthy of
> attention.


Somewhat off topic, but befitting of the triviality of this thread, do I
understand correctly that you are saying garbage collection never causes
any noticeable slowdown in real-world circumstances? That's not remotely
true.


Re: RE Module Performance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau  wrote:
> On 28 July 2013 09:45, Antoon Pardon  wrote:
>>
>> On 27-07-13 20:21, [email protected] wrote:
>>>
>>> utf-8 or any (utf) never need and never spend their time
>>> in reencoding.
>>
>>
>> So? That python sometimes needs to do some kind of background
>> processing is not a problem, whether it is garbage collection,
>> allocating more memory, shufling around data blocks or reencoding a
>> string, that doesn't matter. If you've got a real world example where
>> one of those things noticeably slows your program down or makes the
>> program behave faulty then you have something that is worthy of
>> attention.
>
>
> Somewhat off topic, but befitting of the triviality of this thread, do I
> understand correctly that you are saying garbage collection never causes any
> noticeable slowdown in real-world circumstances? That's not remotely true.

If it's done properly, garbage collection shouldn't hurt the *overall*
performance of the app; most of the issues with GC timing are when one
operation gets unexpectedly delayed for a GC run (making performance
measurement hard, and such). It should certainly never cause your
program to behave faultily, though I have seen cases where the GC run
appears to cause the program to crash - something like this:

some_string = buggy_call()
...
gc()
...
print(some_string)

The buggy call mucked up the reference count, so the gc run actually
wiped the string from memory - resulting in a segfault on next usage.
But the GC wasn't at fault, the original call was. (Which, btw, was
quite a debugging search, especially since the function in question
wasn't my code.)

ChrisA


sqlite3 version lacks instr

2013-07-28 Thread Joseph L. Casale
I have some queries that utilize instr wrapped by substr but the old
version shipped in 2.7.5 doesn't have instr support.

Has anyone encountered this and utilized other existing functions
within the shipped 3.6.21 sqlite version to accomplish this?

Thanks,
jlc


RE: sqlite3 version lacks instr

2013-07-28 Thread Joseph L. Casale
> Has anyone encountered this and utilized other existing functions
> within the shipped 3.6.21 sqlite version to accomplish this?

Sorry guys, forgot about create_function...


Re: RE Module Performance

2013-07-28 Thread MRAB

On 28/07/2013 19:13, [email protected] wrote:
> On Sunday 28 July 2013 05:53:22 UTC+2, Ian wrote:
> > On Sat, Jul 27, 2013 at 12:21 PM,   wrote:
> > > Back to utf. utfs are not only elements of a unique set of encoded
> > > code points. They have an interesting feature. Each "utf chunk"
> > > holds intrisically the character (in fact the code point) it is
> > > supposed to represent. In utf-32, the obvious case, it is just
> > > the code point. In utf-8, that's the first chunk which helps and
> > > utf-16 is a mixed case (utf-8 / utf-32). In other words, in an
> > > implementation using bytes, for any pointer position it is always
> > > possible to find the corresponding encoded code point and from this
> > > the corresponding character without any "programmed" information. See
> > > my editor example, how to find the char under the caret? In fact,
> > > a silly example, how can the caret can be positioned or moved, if
> > > the underlying corresponding encoded code point can not be
> > > dicerned!
> >
> > Yes, given a pointer location into a utf-8 or utf-16 string, it is
> > easy to determine the identity of the code point at that location.
> > But this is not often a useful operation, save for resynchronization
> > in the case that the string data is corrupted.  The caret of an editor
> > does not conceptually correspond to a pointer location, but to a
> > character index.  Given a particular character index (e.g. 127504), an
> > editor must be able to determine the identity and/or the memory
> > location of the character at that index, and for UTF-8 and UTF-16
> > without an auxiliary data structure that is a O(n) operation.
> >
> > > 2) Take a look at this. Get rid of the overhead.
> > >
> > > >>> sys.getsizeof('b'*100 + 'c')
> > > 126
> > > >>> sys.getsizeof('b'*100 + '€')
> > > 240
> > >
> > > What does it mean? It means that Python has to
> > > reencode a str every time it is necessary because
> > > it works with multiple codings.
> >
> > Large strings in practical usage do not need to be resized like this
> > often.  Python 3.3 has been in production use for months now, and you
> > still have yet to produce any real-world application code that
> > demonstrates a performance regression.  If there is no real-world
> > regression, then there is no problem.
> >
> > > 3) Unicode compliance. We know retrospectively, latin-1,
> > > is was a bad choice. Unusable for 17 European languages.
> > > Believe of not. 20 years of Unicode of incubation is not
> > > long enough to learn it. When discussing once with a French
> > > Python core dev, one with commit access, he did not know one
> > > can not use latin-1 for the French language!
> >
> > Probably because for many French strings, one can.  As far as I am
> > aware, the only characters that are missing from Latin-1 are the Euro
> > sign (an unfortunate victim of history), the ligature œ (I have no
> > doubt that many users just type oe anyway), and the rare capital Ÿ
> > (the minuscule version is present in Latin-1).  All French strings
> > that are fortunate enough to be absent these characters can be
> > represented in Latin-1 and so will have a 1-byte width in the FSR.
>
> --
>
> latin-1? that's not even truth.
>
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('ü')
> 38
> >>> sys.getsizeof('aa')
> 27
> >>> sys.getsizeof('aü')
> 39



>>> sys.getsizeof('aa') - sys.getsizeof('a')
1

One byte per codepoint.

>>> sys.getsizeof('üü') - sys.getsizeof('ü')
1

Also one byte per codepoint.

>>> sys.getsizeof('ü') - sys.getsizeof('a')
12

Clearly there's more going on here.

FSR is an optimisation. You'll always be able to find some
circumstances where an optimisation makes things worse, but what
matters is the overall result.



Re: RE Module Performance

2013-07-28 Thread Terry Reedy

On 7/28/2013 2:29 PM, Chris Angelico wrote:

> On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau  wrote:
> > Somewhat off topic, but befitting of the triviality of this thread, do I
> > understand correctly that you are saying garbage collection never causes any
> > noticeable slowdown in real-world circumstances? That's not remotely true.
>
> If it's done properly, garbage collection shouldn't hurt the *overall*
> performance of the app;


There are situations, some discussed on this list, where doing gc 
'right' means turning off the cyclic garbage collector. As I remember, an 
example is creating a list of a million tuples, which otherwise triggers 
a lot of useless background bookkeeping. The cyclic gc is tuned for 
'normal' use patterns.
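A hedged sketch of that pattern: temporarily disabling the cyclic collector while building a large, cycle-free structure, and always restoring it afterwards:

```python
import gc

def build_pairs(n):
    """Build a large list of tuples; none of them participate in reference cycles,
    so the cyclic collector has nothing useful to do during construction."""
    return [(i, i + 1) for i in range(n)]

gc.disable()          # skip cyclic-GC passes triggered by mass allocation
try:
    data = build_pairs(100000)
finally:
    gc.enable()       # restore the collector no matter what

print(len(data), gc.isenabled())
```

Reference counting still reclaims non-cyclic garbage while the collector is off; only cycle detection is paused.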


--
Terry Jan Reedy



RE: sqlite3 version lacks instr

2013-07-28 Thread Peter Otten
Joseph L. Casale wrote:

>> Has anyone encountered this and utilized other existing functions
>> within the shipped 3.6.21 sqlite version to accomplish this?
> 
> Sorry guys, forgot about create_function...

Too late, I already did the demo ;)

>>> import sqlite3
>>> db = sqlite3.connect(":memory:")
>>> cs = db.cursor()
>>> cs.execute('select instr("the quick brown fox", "brown")').fetchone()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: no such function: instr
>>> def instr(a, b):
... return a.find(b) + 1 # add NULL-handling etc.
... 
>>> db.create_function("instr", 2, instr)
>>> cs.execute('select instr("the quick brown fox", "brown")').fetchone()[0]
11
>>> cs.execute('select instr("the quick brown fox", "red")').fetchone()[0]
0




embedding: how to create an "idle" handler to allow user to kill scripts?

2013-07-28 Thread David M. Cotter
In my C++ app, on the main thread I init Python, init threads, then call
PyEval_SaveThread(), since I'm not going to do any more Python on the main
thread.

Then when the user invokes a script, I launch a preemptive thread
(boost::threads), and from there I have this:

static int CB_S_Idle(void *in_thiz) {
    CT_RunScript *thiz((CT_RunScript *)in_thiz);

    return thiz->Idle();
}

int Idle()
{
    int      resultI = 0;
    OSStatus err = noErr;

    ERR(i_taskRecP->MT_UpdateData(&i_progData));

    if (err) {
        resultI = -1;
    }

    ERR(ScheduleIdleCall());
    return err;
}

int ScheduleIdleCall()
{
    int             resultI(Py_AddPendingCall(CB_S_Idle, this));
    CFAbsoluteTime  timeT(CFAbsoluteTimeGetCurrent());
    SuperString     str; str.Set(timeT, SS_Time_LOG);

    Logf("$$$ Python idle: (%d) %s\n", resultI, str.utf8Z());
    return resultI;
}

virtual OSStatus operator()(OSStatus err) {
    ScPyGILState sc;

    ERR(ScheduleIdleCall());
    ERR(PyRun_SimpleString(i_script.utf8Z()));
    return err;
}

So, my operator() gets called, and I try to schedule an Idle call, which
succeeds; then I run my script. However, CB_S_Idle() never gets called.

The MT_UpdateData() function returns an error if the user has canceled the
script.

Must I schedule a run-loop on the main thread or something to get it to be
called?


Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote:
> On 07/27/2013 12:21 PM, [email protected] wrote:
> > Good point. FSR, nice tool for those who wish to teach
> > Unicode. It is not every day, one has such an opportunity.
>
> I had a long e-mail composed, but decided to chop it down, but still too
> long.  so I ditched a lot of the context, which jmf also seems to do.
> Apologies.
>
> 1. FSR *is* UTF-32 so it is as unicode compliant as UTF-32, since UTF-32
> is an official encoding.  FSR only differs from UTF-32 in that the
> padding zeros are stripped off such that it is stored in the most
> compact form that can handle all the characters in string, which is
> always known at string creation time.  Now you can argue many things,
> but to say FSR is not unicode compliant is quite a stretch!  What
> unicode entities or characters cannot be stored in strings using FSR?
> What sequences of bytes in FSR result in invalid Unicode entities?
>
> 2. strings in Python *never change*.  They are immutable.  The +
> operator always copies strings character by character into a new string
> object, even if Python had used UTF-8 internally.  If you're doing a lot
> of string concatenations, perhaps you're using the wrong data type.  A
> byte buffer might be better for you, where you can stuff utf-8 sequences
> into it to your heart's content.
>
> 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
> slicing a string would be very very slow, and that's unacceptable for
> the use cases of python strings.  I'm assuming you understand big O
> notation, as you talk of experience in many languages over the years.
> FSR and UTF-32 both are O(1) for slicing and lookups.  UTF-8, 16 and any
> variable-width encoding are always O(n).  A lot slower!
>
> 4. Unicode is, well, unicode.  You seem to hop all over the place from
> talking about code points to bytes to bits, using them all
> interchangeably.  And now you seem to be claiming that a particular byte
> encoding standard is by definition unicode (UTF-8).  Or at least that's
> how it sounds.  And also claim FSR is not compliant with unicode
> standards, which appears to me to be completely false.
>
> Is my understanding of these things wrong?

--

Compare these (a BDFL example, where I'm using a non-ascii char):

Py 3.2 (narrow build)

>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34


Py3.3

>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42

Tell me which one seems to be more "unicode compliant"?
The goal of Unicode is to handle every char "equally".

Now, the problem: memory. Do not forget that an "FSR"-style
mechanism is *irrelevant* for a non-ascii user. As soon as one
uses a single non-ascii char, the ascii advantage is lost.
(That's why we have all these dedicated coding schemes, utfs
included.)

>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044

A small secret: the larger a repertoire of characters
is, the more bits you need.
Secret #2: you cannot escape from this.


jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: RE Module Performance

2013-07-28 Thread wxjmfauth
Le dimanche 28 juillet 2013 21:04:56 UTC+2, MRAB a écrit :
> On 28/07/2013 19:13, [email protected] wrote:
> > Le dimanche 28 juillet 2013 05:53:22 UTC+2, Ian a écrit :
> >> On Sat, Jul 27, 2013 at 12:21 PM,   wrote:
> >> > Back to utf. utfs are not only elements of a unique set of encoded
> >> > code points. They have an interesting feature. Each "utf chunk"
> >> > holds intrisically the character (in fact the code point) it is
> >> > supposed to represent. In utf-32, the obvious case, it is just
> >> > the code point. In utf-8, that's the first chunk which helps and
> >> > utf-16 is a mixed case (utf-8 / utf-32). In other words, in an
> >> > implementation using bytes, for any pointer position it is always
> >> > possible to find the corresponding encoded code point and from this
> >> > the corresponding character without any "programmed" information. See
> >> > my editor example, how to find the char under the caret? In fact,
> >> > a silly example, how can the caret can be positioned or moved, if
> >> > the underlying corresponding encoded code point can not be
> >> > dicerned!
> >>
> >> Yes, given a pointer location into a utf-8 or utf-16 string, it is
> >> easy to determine the identity of the code point at that location.
> >> But this is not often a useful operation, save for resynchronization
> >> in the case that the string data is corrupted.  The caret of an editor
> >> does not conceptually correspond to a pointer location, but to a
> >> character index.  Given a particular character index (e.g. 127504), an
> >> editor must be able to determine the identity and/or the memory
> >> location of the character at that index, and for UTF-8 and UTF-16
> >> without an auxiliary data structure that is a O(n) operation.
> >>
> >> > 2) Take a look at this. Get rid of the overhead.
> >> >
> >> > >>> sys.getsizeof('b'*100 + 'c')
> >> > 126
> >> > >>> sys.getsizeof('b'*100 + '€')
> >> > 240
> >> >
> >> > What does it mean? It means that Python has to
> >> > reencode a str every time it is necessary because
> >> > it works with multiple codings.
> >>
> >> Large strings in practical usage do not need to be resized like this
> >> often.  Python 3.3 has been in production use for months now, and you
> >> still have yet to produce any real-world application code that
> >> demonstrates a performance regression.  If there is no real-world
> >> regression, then there is no problem.
> >>
> >> > 3) Unicode compliance. We know retrospectively, latin-1,
> >> > is was a bad choice. Unusable for 17 European languages.
> >> > Believe of not. 20 years of Unicode of incubation is not
> >> > long enough to learn it. When discussing once with a French
> >> > Python core dev, one with commit access, he did not know one
> >> > can not use latin-1 for the French language!
> >>
> >> Probably because for many French strings, one can.  As far as I am
> >> aware, the only characters that are missing from Latin-1 are the Euro
> >> sign (an unfortunate victim of history), the ligature œ (I have no
> >> doubt that many users just type oe anyway), and the rare capital Ÿ
> >> (the miniscule version is present in Latin-1).  All French strings
> >> that are fortunate enough to be absent these characters can be
> >> represented in Latin-1 and so will have a 1-byte width in the FSR.
> >
> > --
> >
> > latin-1? that's not even truth.
> >
> > >>> sys.getsizeof('a')
> > 26
> > >>> sys.getsizeof('ü')
> > 38
> > >>> sys.getsizeof('aa')
> > 27
> > >>> sys.getsizeof('aü')
> > 39
>
> >>> sys.getsizeof('aa') - sys.getsizeof('a')
> 1
>
> One byte per codepoint.
>
> >>> sys.getsizeof('üü') - sys.getsizeof('ü')
> 1
>
> Also one byte per codepoint.
>
> >>> sys.getsizeof('ü') - sys.getsizeof('a')
> 12
>
> Clearly there's more going on here.
>
> FSR is an optimisation. You'll always be able to find some
> circumstances where an optimisation makes things worse, but what
> matters is the overall result.




Yes, I know: my examples are always wrong, never
real examples.

If I point to long strings, I should point to short strings.
If I point to a short string (char), it is not long enough.
Strings as dict keys? No, the problem is in Python's dict.
Performance? No, that's a memory issue.
Memory? No, it's a question of keeping performance.
I am using t

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread MRAB

On 28/07/2013 20:23, [email protected] wrote:
[snip]


Compare these (a BDFL exemple, where I'using a non-ascii char)

Py 3.2 (narrow build)


Why are you using a narrow build of Python 3.2? It doesn't treat all
codepoints equally (those outside the BMP can't be stored in one code
unit) and, therefore, it isn't "Unicode compliant"!


>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34


Py3.3


>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42

Tell me which one seems to be more "unicode compliant"?
The goal of Unicode is to handle every char "equaly".

Now, the problem: memory. Do not forget that à la "FSR"
mechanism for a non-ascii user is *irrelevant*. As
soon as one uses one single non-ascii, your ascii feature
is lost. (That why we have all these dedicated coding
schemes, utfs included).


>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044

A bit secret. The larger a repertoire of characters
is, the more bits you needs.
Secret #2. You can not escape from this.


jmf



--
http://mail.python.org/mailman/listinfo/python-list


Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Antoon Pardon

Op 28-07-13 21:23, [email protected] schreef:

Le dimanche 28 juillet 2013 17:52:47 UTC+2, Michael Torrie a écrit :

On 07/27/2013 12:21 PM, [email protected] wrote:


Good point. FSR, nice tool for those who wish to teach



Unicode. It is not every day, one has such an opportunity.




I had a long e-mail composed, but decided to chop it down, but still too

long.  so I ditched a lot of the context, which jmf also seems to do.

Apologies.



1. FSR *is* UTF-32 so it is as unicode compliant as UTF-32, since UTF-32

is an official encoding.  FSR only differs from UTF-32 in that the

padding zeros are stripped off such that it is stored in the most

compact form that can handle all the characters in string, which is

always known at string creation time.  Now you can argue many things,

but to say FSR is not unicode compliant is quite a stretch!  What

unicode entities or characters cannot be stored in strings using FSR?

What sequences of bytes in FSR result in invalid Unicode entities?
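A small sanity check of point 1 (an editorial sketch, assuming CPython 3.3+; absolute `getsizeof` values are implementation details, so only the growth per character is compared):

```python
# Sketch assuming CPython 3.3+ (the FSR): strings from every width class
# round-trip through an official UTF, and per-character growth reveals the
# narrowest width chosen at creation time.
import sys

# Round-trip through UTF-32: no code point is lost or made invalid.
for s in ['ascii', 'latin1 \xe9', 'bmp \u20ac', 'astral \U00010010']:
    assert s == s.encode('utf-32').decode('utf-32')

def char_width(ch):
    # Adding one more copy of `ch` grows the string by its storage width.
    return sys.getsizeof(ch * 2) - sys.getsizeof(ch)

assert char_width('a') == 1            # 1 byte/char (latin-1 range)
assert char_width('\u20ac') == 2       # 2 bytes/char (BMP)
assert char_width('\U00010010') == 4   # 4 bytes/char (astral plane)
```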



2. strings in Python *never change*.  They are immutable.  The +

operator always copies strings character by character into a new string

object, even if Python had used UTF-8 internally.  If you're doing a lot

of string concatenations, perhaps you're using the wrong data type.  A

byte buffer might be better for you, where you can stuff utf-8 sequences

into it to your heart's content.



3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that

slicing a string would be very very slow, and that's unacceptable for

the use cases of python strings.  I'm assuming you understand big O

notation, as you talk of experience in many languages over the years.

FSR and UTF-32 both are O(1) for slicing and lookups.  UTF-8, 16 and any

variable-width encoding are always O(n).  A lot slower!



4. Unicode is, well, unicode.  You seem to hop all over the place from

talking about code points to bytes to bits, using them all

interchangeably.  And now you seem to be claiming that a particular byte

encoding standard is by definition unicode (UTF-8).  Or at least that's

how it sounds.  And also claim FSR is not compliant with unicode

standards, which appears to me to be completely false.



Is my understanding of these things wrong?


--

Compare these (a BDFL exemple, where I'using a non-ascii char)

Py 3.2 (narrow build)


>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34


Py3.3


>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42

Tell me which one seems to be more "unicode compliant"?


Can't tell: you give no relevant information on which one can decide
this question.


The goal of Unicode is to handle every char "equaly".


Not to this kind of detail, which is looking at irrelevant
implementation details.


Now, the problem: memory. Do not forget that à la "FSR"
mechanism for a non-ascii user is *irrelevant*. As
soon as one uses one single non-ascii, your ascii feature
is lost. (That why we have all these dedicated coding
schemes, utfs included).


So? Why should that trouble me? As far as I understand,
whether I have an ascii string or not is totally irrelevant
to the application programmer. Within the application I
just process strings and let the programming environment
keep track of these details in a transparent way, unless
you start looking at things like getsizeof, which gives
you implementation details that are mostly irrelevant
in deciding whether the behaviour is compliant or not.


>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044

A bit secret. The larger a repertoire of characters
is, the more bits you needs.
Secret #2. You can not escape from this.


And totally unimportant for deciding compliance.

--
Antoon Pardon

--
http://mail.python.org/mailman/listinfo/python-list


collections.Counter surprisingly slow

2013-07-28 Thread Roy Smith
I've been doing an informal "intro to Python" lunchtime series for some 
co-workers (who are all experienced programmers, in other languages).  
This week I was going to cover list comprehensions, exceptions, and 
profiling. So, I did a little demo showing different ways to build a 
dictionary counting how many times a string appears in some input:

   test() implements a "look before you leap" python loop

   exception() uses a try/except construct in a similar python loop

   default() uses a defaultdict

   count() uses a Counter

I profiled it, to show how the profiler works.  The full code is below. 
 The input is an 8.8 Mbyte file containing about 570,000 lines (11,000 
unique strings).  Python 2.7.3 on Ubuntu Precise.

As I expected, test() is slower than exception(), which is slower than 
default().  I'm rather shocked to discover that count() is the slowest 
of all!  I expected it to be the fastest.  Or, certainly, no slower 
than default().

The full profiler dump is at the end of this message, but the gist of 
it is:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
     1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
     1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
     1    0.097    0.097    0.097    0.097 ./stations.py:36(default)

Why is count() [i.e. collections.Counter] so slow?

-
from collections import defaultdict, Counter

def main():
lines = open('stations.txt').readlines()

d1 = test(lines)
d2 = exception(lines)
d3 = default(lines)
d4 = count(lines)

print d1 == d2
print d1 == d3
print d1 == d4

def test(lines):
d = {}
for station in lines:
if station in d:
d[station] += 1
else:
d[station] = 1
return d


def exception(lines):
d = {}
for station in lines:
try:
d[station] += 1
except KeyError:
d[station] = 1
return d

def default(lines):
d = defaultdict(int)
for station in lines:
d[station] += 1
return d

def count(lines):
d = Counter(lines)
return d


if __name__ == '__main__':
import cProfile
import pstats
cProfile.run('main()', 'stations.stats')
p = pstats.Stats('stations.stats')
p.sort_stats('cumulative').print_stats()
-

 570335 function calls (570332 primitive calls) in 0.776 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.023    0.023    0.776    0.776 <string>:1(<module>)
        1    0.006    0.006    0.753    0.753 ./stations.py:5(main)
        1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
        1    0.000    0.000    0.322    0.322 /usr/lib/python2.7/collections.py:407(__init__)
        1    0.242    0.242    0.322    0.322 /usr/lib/python2.7/collections.py:470(update)
        1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
        1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
        1    0.097    0.097    0.097    0.097 ./stations.py:36(default)
   570285    0.080    0.000    0.080    0.000 {method 'get' of 'dict' objects}
        1    0.055    0.055    0.055    0.055 {method 'readlines' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:128(__instancecheck__)
      2/1    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:148(__subclasscheck__)
      3/1    0.000    0.000    0.000    0.000 {issubclass}
        4    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:58(__iter__)
        1    0.000    0.000    0.000    0.000 {open}
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:26(__exit__)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:20(__enter__)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:36(__init__)
        3    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:68(__contains__)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:52(_commit_removals)
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:81(add)
        3    0.000    0.000    0.000    0.000 {getattr}
        2    0.000    0.000    0.000    0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:16(__init__)
        2    0.000    0.000    0.000    0.000 {method 'remove' of 'set' objects}
        2    0.000    0.000    0.000    0.000 {method '__subclasses__' of 'type' obje

Re: RE Module Performance

2013-07-28 Thread Lele Gaifax
[email protected] writes:

> Suggestion. Start by solving all these "micro-benchmarks".
> all the memory cases. It a good start, no?

Since you seem the only one who has this dramatic problem with such
micro-benchmarks, that BTW have nothing to do with "unicode compliance",
I'd suggest *you* should find a better implementation and propose it to
the core devs.

An even better suggestion, with due respect, is to get a life and find
something more interesting to do, or at least better arguments :-)

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
[email protected]  | -- Fortunato Depero, 1929.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: collections.Counter surprisingly slow

2013-07-28 Thread Steven D'Aprano
On Sun, 28 Jul 2013 15:59:04 -0400, Roy Smith wrote:

[...]
> I'm rather shocked to discover that count() is the slowest
> of all!  I expected it to be the fastest.  Or, certainly, no slower than
> default().
> 
> The full profiler dump is at the end of this message, but the gist of it
> is:
> 
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
>      1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
>      1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
>      1    0.097    0.097    0.097    0.097 ./stations.py:36(default)
> 
> Why is count() [i.e. collections.Counter] so slow?

It's within a factor of 2 of test, and 3 of exception or default (give or 
take). I don't think that's surprisingly slow. In 2.7, Counter is written 
in Python, while defaultdict has an accelerated C version. I expect that 
has something to do with it.

Calling Counter ends up calling essentially this code:

for elem in iterable:
self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines). 
Calling the get method has higher overhead than dict[key], that will also 
contribute.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Steven D'Aprano
On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote:

> Do not forget that à la "FSR" mechanism for a non-ascii user is
> *irrelevant*.

You have been told repeatedly, Python's internals are *full* of ASCII-
only strings.

py> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', 
'__dir__', '__doc__', '__eq__', '__format__', '__ge__', 
'__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', 
'__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', 
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', 
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', 
'__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 
'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

There's 45 ASCII-only strings right there, in only one built-in type, out 
of dozens. There are dozens, hundreds of ASCII-only strings in Python: 
builtin functions and classes, attributes, exceptions, internal 
attributes, variable names, and so on.

You already know this, and yet you persist in repeating nonsense.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: collections.Counter surprisingly slow

2013-07-28 Thread Roy Smith
In article <[email protected]>,
 Steven D'Aprano  wrote:

> > Why is count() [i.e. collections.Counter] so slow?
> 
> It's within a factor of 2 of test, and 3 of exception or default (give or 
> take). I don't think that's surprisingly slow.

It is for a module which describes itself as "High-performance container 
datatypes" :-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: RE Module Performance

2013-07-28 Thread Joshua Landau
On 28 July 2013 19:29, Chris Angelico  wrote:

> On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau  wrote:
> > On 28 July 2013 09:45, Antoon Pardon 
> wrote:
> >>
> >> Op 27-07-13 20:21, [email protected] schreef:
> >>>
> >>> utf-8 or any (utf) never need and never spend their time
> >>> in reencoding.
> >>
> >>
> >> So? That python sometimes needs to do some kind of background
> >> processing is not a problem, whether it is garbage collection,
> >> allocating more memory, shufling around data blocks or reencoding a
> >> string, that doesn't matter. If you've got a real world example where
> >> one of those things noticeably slows your program down or makes the
> >> program behave faulty then you have something that is worthy of
> >> attention.
> >
> >
> > Somewhat off topic, but befitting of the triviality of this thread, do I
> > understand correctly that you are saying garbage collection never causes
> any
> > noticeable slowdown in real-world circumstances? That's not remotely
> true.
>
> If it's done properly, garbage collection shouldn't hurt the *overall*
> performance of the app; most of the issues with GC timing are when one
> operation gets unexpectedly delayed for a GC run (making performance
> measurement hard, and such). It should certainly never cause your
> program to behave faultily, though I have seen cases where the GC run
> appears to cause the program to crash - something like this:
>
> some_string = buggy_call()
> ...
> gc()
> ...
> print(some_string)
>
> The buggy call mucked up the reference count, so the gc run actually
> wiped the string from memory - resulting in a segfault on next usage.
> But the GC wasn't at fault, the original call was. (Which, btw, was
> quite a debugging search, especially since the function in question
> wasn't my code.)
>

GC does have sometimes severe impact in memory-constrained environments,
though. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/,
about half-way down, specifically
http://sealedabstract.com/wp-content/uploads/2013/05/Screen-Shot-2013-05-14-at-10.15.29-PM.png
.

The best verification of these graphs I could find was
https://blog.mozilla.org/nnethercote/category/garbage-collection/, although
it's not immediately clear in Chrome's and Opera's case mainly due to none
of the benchmarks pushing memory usage significantly.

I also don't quite agree with the first post (sealedabstract) because I get
by *fine* on 2GB memory, so I don't see why you can't on a phone. Maybe iOS
is just really heavy. Nonetheless, the benchmarks aren't lying.
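One hedged way to observe the effect discussed above with CPython's collector (an editorial sketch; the numbers are purely illustrative): a full collection must trace every tracked container, so the pause grows with the number of live container objects.

```python
# Illustrative sketch: time a full gc pass before and after creating many
# tracked container objects.  The second pass has far more to trace.
import gc
import time

def full_collect_seconds():
    start = time.time()
    gc.collect()
    return time.time() - start

baseline = full_collect_seconds()
data = [{'n': i} for i in range(500000)]   # half a million tracked dicts
loaded = full_collect_seconds()
print(baseline, loaded)
del data
```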
-- 
http://mail.python.org/mailman/listinfo/python-list


email 8bit encoding

2013-07-28 Thread rurpy
How, using Python-3.3's email module, do I "flatten" (I think
that's the right term) a Message object to get utf-8 encoded
body with the headers:
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
when the message payload was set to a python (unicode) string?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Savoynet] G&S Opera Co: Pirates of Penzance

2013-07-28 Thread Ethan Furman

On 07/28/2013 10:57 AM, Chris Angelico wrote:
   .
   .
   .

Okay, how did you get confused that this was a Python List question?  ;)

--
~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list


Re: [Savoynet] G&S Opera Co: Pirates of Penzance

2013-07-28 Thread Tim Chase
On 2013-07-28 16:49, Ethan Furman wrote:
> On 07/28/2013 10:57 AM, Chris Angelico wrote:
> .
> .
> .
> 
> Okay, how did you get confused that this was a Python List
> question?  ;)

Must have been this ancient thread

http://mail.python.org/pipermail/python-list/2006-September/376063.html

:-)

-tkc



-- 
http://mail.python.org/mailman/listinfo/python-list


dynamic type returning NameError:

2013-07-28 Thread Tim O'Callaghan
Hi, 

I hope that this hasn't been asked for the millionth time, so my apologies if 
it has. 

I have a base class (BaseClass - we'll call it for this example) with an http 
call that i would like to inherit into a dynamic class at runtime. We'll call 
that method in BaseClass;  'request'. 

I have a dictionary(json) of key (class name): value(method) that I would like 
to create inheriting this 'request' method from the BaseClass. So the derived 
class would look something like this

definition in json:
{"Whatever": [{"method1": "Some Default", "async": True},{"method2": "Some 
Other Default", "async": True}]}

Ideally I'd like the class def to look something like this if i were to type it 
out by hand

[excuse the indents]

class Whatever(BaseClass):
def method1(self):
stupid_data = super(Whatever, self).request("method1")
return stupid_data

 def method2(self):
stupid_data = super(Whatever, self).request("method1")
return stupid_data

Now, I've been trying to do this using the Python CLI, without success. 

So, attempting this at runtime I get a plethora of wonderful errors that I 
suspect has broken my brain. 

Here is what i've tried:

# trying with just an empty object of type BaseClass
obj = type("Object", (BaseClass,), {})

whatever = type("WhatEver", (obj,), {"method1": super(WhatEver, 
self).request("method1")})

but when i try this I get 'NameError: name 'self' is not defined'

defining these classes manually works... 

I hope that this was clear enough, apologies if it wasn't. It's late(ish), I'm 
tired and borderline frustrated :) But enough about me...

Thanks in advance. 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dynamic type returning NameError:

2013-07-28 Thread Terry Reedy

On 7/28/2013 9:38 PM, Tim O'Callaghan wrote:

Hi,

I hope that this hasn't been asked for the millionth time, so my apologies if 
it has.

I have a base class (BaseClass - we'll call it for this example) with an http 
call that i would like to inherit into a dynamic class at runtime. We'll call 
that method in BaseClass;  'request'.

I have a dictionary(json) of key (class name): value(method) that I would like 
to create inheriting this 'request' method from the BaseClass. So the derived 
class would look something like this

definition in json:
{"Whatever": [{"method1": "Some Default", "async": True},{"method2": "Some Other 
Default", "async": True}]}

Ideally I'd like the class def to look something like this if i were to type it 
out by hand

[excuse the indents]

class Whatever(BaseClass):
 def method1(self):
 stupid_data = super(Whatever, self).request("method1")
 return stupid_data

  def method2(self):
 stupid_data = super(Whatever, self).request("method1")
 return stupid_data

Now, I've been trying to do this using the python cli, with out success.

So, attempting this at runtime I get a plethora of wonderful errors that I 
suspect has broken my brain.

Here is what i've tried:

# trying with just an empty object of type BaseClass
obj = type("Object", (BaseClass,), {})

whatever = type("WhatEver", (obj,), {"method1": super(WhatEver, 
self).request("method1")})


'method1' has to be mapped to a function object.


but when i try this I get 'NameError: name 'self' is not defined'

defining these classes manually works...

I hope that this was clear enough, apologies if it wasn't. It's late(ish), I'm 
tired and borderline frustrated :) But enough about me...

Thanks in advance.




--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: dynamic type returning NameError:

2013-07-28 Thread Tim O'Callaghan
On Sunday, July 28, 2013 10:51:57 PM UTC-4, Terry Reedy wrote:
> On 7/28/2013 9:38 PM, Tim O'Callaghan wrote:
> 
> > Hi,
> 
> >
> 
> > I hope that this hasn't been asked for the millionth time, so my apologies 
> > if it has.
> 
> >
> 
> > I have a base class (BaseClass - we'll call it for this example) with an 
> > http call that i would like to inherit into a dynamic class at runtime. 
> > We'll call that method in BaseClass;  'request'.
> 
> >
> 
> > I have a dictionary(json) of key (class name): value(method) that I would 
> > like to create inheriting this 'request' method from the BaseClass. So the 
> > derived class would look something like this
> 
> >
> 
> > definition in json:
> 
> > {"Whatever": [{"method1": "Some Default", "async": True},{"method2": "Some 
> > Other Default", "async": True}]}
> 
> >
> 
> > Ideally I'd like the class def to look something like this if i were to 
> > type it out by hand
> 
> >
> 
> > [excuse the indents]
> 
> >
> 
> > class Whatever(BaseClass):
> 
> >  def method1(self):
> 
> >  stupid_data = super(Whatever, self).request("method1")
> 
> >  return stupid_data
> 
> >
> 
> >   def method2(self):
> 
> >  stupid_data = super(Whatever, self).request("method1")
> 
> >  return stupid_data
> 
> >
> 
> > Now, I've been trying to do this using the python cli, with out success.
> 
> >
> 
> > So, attempting this at runtime I get a plethora of wonderful errors that I 
> > suspect has broken my brain.
> 
> >
> 
> > Here is what i've tried:
> 
> >
> 
> > # trying with just an empty object of type BaseClass
> 
> > obj = type("Object", (BaseClass,), {})
> 
> >
> 
> > whatever = type("WhatEver", (obj,), {"method1": super(WhatEver, 
> > self).request("method1")})
> 
> 
> 
> 'method1' has to be mapped to a function object.

But isn't that what calling super is doing? Calling the function object of the 
parent class BaseClass? 

> 
> > but when i try this I get 'NameError: name 'self' is not defined'
> 
> >
> 
> > defining these classes manually works...
> 
> >
> 
> > I hope that this was clear enough, apologies if it wasn't. It's late(ish), 
> > I'm tired and borderline frustrated :) But enough about me...
> 
> >
> 
> > Thanks in advance.
> 
> >
> 
> 
> 
> 
> 
> -- 
> 
> Terry Jan Reedy

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dynamic type returning NameError:

2013-07-28 Thread Steven D'Aprano
On Sun, 28 Jul 2013 18:38:10 -0700, Tim O'Callaghan wrote:

> Hi,
> 
> I hope that this hasn't been asked for the millionth time, so my
> apologies if it has.
[...]
> I hope that this was clear enough, apologies if it wasn't.

Clear as mud. 


> It's late(ish), I'm tired and borderline frustrated :)

I see your smiley, but perhaps you would get better results by waiting 
until you can make a better post.

It *really* helps if you post actual "working" (even if "working" means 
"fails in the way I said"), *short*, *simple* code. Often you'll find 
that trying to simplify the problem gives you the insight to solve the 
problem yourself.

http://www.sscce.org/


I'm going to try to guess what you're attempting, but I may get it 
completely wrong. Sorry if I do, but hopefully you'll get some insight 
even from my misunderstandings.


> I have a base class (BaseClass - we'll call it for this example) with an
> http call that i would like to inherit into a dynamic class at runtime.
> We'll call that method in BaseClass;  'request'.

If I read this literally, you want to do this:

class BaseClass(DynamicParent):
def request(self):
...

except that DynamicParent isn't known until runtime. Am I close?

Obviously the above syntax won't work, but you can use a factory:

def make_baseclass(parent):
class BaseClass(parent):
def request(self):
...
return BaseClass

class Spam: ...

BaseClass = make_baseclass(Spam)


Or you can use the type() constructor directly:

BaseClass = type('BaseClass', (Spam,), dict_of_methods_and_stuff)


which is probably far less convenient. But all this assumes I read you 
literally, and reading on, I don't think that's what you are after.
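To illustrate the three-argument type() route with a hedged sketch (all names here are hypothetical stand-ins, not the OP's real classes): the dict handed to type() must map method names to function objects; `self` only exists when a method is later called, which is why evaluating `super(...)` inside the type() call raises NameError.

```python
# All names are hypothetical stand-ins for the OP's BaseClass and json spec.
class BaseClass(object):
    def request(self, name):
        return 'requested %s' % name   # stand-in for the real http call

# Shape of the OP's spec: class name -> list of method names.
spec = {"Whatever": ["method1", "method2"]}

def make_method(name):
    # A factory so each closure captures its own `name`; `self` is simply
    # the method's first parameter, bound at call time -- nothing needs to
    # be evaluated at class-creation time.
    def method(self):
        return self.request(name)
    method.__name__ = name
    return method

classes = {}
for clsname, methods in spec.items():
    body = dict((m, make_method(m)) for m in methods)
    classes[clsname] = type(clsname, (BaseClass,), body)

w = classes["Whatever"]()
print(w.method1())   # -> requested method1
print(w.method2())   # -> requested method2
```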


> I have a dictionary(json) of key (class name): value(method) that I
> would like to create inheriting this 'request' method from the
> BaseClass. So the derived class would look something like this
> 
> definition in json:
> {"Whatever": [{"method1": "Some Default", "async": True},{"method2":
> "Some Other Default", "async": True}]}


Pure gobbledygook to me. I don't understand what you're talking about, 
how does a derived class turn into JSON? (Could be worse, it could be 
XML.) Is BaseClass the "derived class", or are you talking about 
inheriting from BaseClass? What's "Some Default"? It looks like a string, 
and it certainly isn't a valid method name, not with a space in it.

Where did async and method2 come from? How do these things relate to 
"request" you talk about above? I think you're too close to the problem 
and don't realise that others don't sharing your knowledge of the problem.

But, moving along, if I've understood you correctly, I don't think 
inheritance is the right solution here. I think that composition or 
delegation may be better. Something like this:

class BaseClass:
def request(self):
# Delegate to a method set dynamically, on the instance.
return self.some_method()


a = BaseClass()
a.some_method = one_thing.method1

b = BaseClass()
b.some_method = another_thing.method2


Now you have instance a.request calling method1 of another object, and 
b.request calling method2 of a different object. Does that solve your 
problem, or am I on a wild-goose chase?
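
Here is that delegation sketch made self-contained, so you can run it as
is (OneThing and AnotherThing are stand-ins for whatever objects actually
supply the methods):

```python
class OneThing:
    def method1(self):
        return "one"

class AnotherThing:
    def method2(self):
        return "another"

class BaseClass:
    def request(self):
        # Delegate to a method set dynamically, on the instance.
        return self.some_method()

a = BaseClass()
a.some_method = OneThing().method1   # bound method of another object

b = BaseClass()
b.some_method = AnotherThing().method2

print(a.request())  # calls OneThing.method1
print(b.request())  # calls AnotherThing.method2
```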


> Ideally I'd like the class def to look something like this if i were to
> type it out by hand
> 
> [excuse the indents]
> 
> class Whatever(BaseClass):
>     def method1(self):
>         stupid_data = super(Whatever, self).request("method1")
>         return stupid_data
> 
>     def method2(self):
>         stupid_data = super(Whatever, self).request("method1")
>         return stupid_data


Since request is not the method you are currently in, the above is 
equivalent to:

class Whatever(BaseClass):
    def method1(self):
        return self.request("method1")
    def method2(self):
        return self.request("method2")

where "request" is defined by BaseClass, and assuming you don't override 
it in the subclass. (I assume "method1" in your code above was a typo.)
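
If the goal really is to stamp out classes like Whatever from that JSON
dictionary, a closure-based factory might look like this. (BaseClass.request
here is a stub, and I'm guessing at the JSON semantics -- class name mapping
to a list of method names.)

```python
def make_class(name, method_names, base):
    def make_method(method_name):
        def method(self):
            # method_name is captured by the closure and used at call time
            return self.request(method_name)
        method.__name__ = method_name
        return method
    namespace = {m: make_method(m) for m in method_names}
    return type(name, (base,), namespace)

class BaseClass:
    def request(self, which):
        return "requested %s" % which  # stub for the real request logic

spec = {"Whatever": ["method1", "method2"]}
for name, methods in spec.items():
    cls = make_class(name, methods, BaseClass)

obj = cls()
print(obj.method1())  # requested method1
print(obj.method2())  # requested method2
```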


> Now, I've been trying to do this using the python cli, with out success.
> 
> So, attempting this at runtime I get a plethora of wonderful errors that
> I suspect has broken my brain.
> 
> Here is what i've tried:
> 
> # trying with just an empty object of type BaseClass 
> obj = type("Object", (BaseClass,), {})

"obj" here is a class called "Object", inheriting from BaseClass. It 
overrides no methods. Why does it exist? It doesn't do anything.


> whatever = type("WhatEver", (obj,), {"method1": super(WhatEver,
> self).request("method1")})
> 
> but when i try this I get 'NameError: name 'self' is not defined'

This is what you are doing:

* look up names WhatEver and self, in the current scope (i.e. the scope 
where you are running this call to type, which is likely the global 
scope);

* pass those objects (if they exist!) to super(), right now;

* on the object returned, look up the attribute "request", right now;

* call that object with the argument "method1", right now.
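
All of that happens immediately, before the class even exists, which is why
self is not defined. The fix is to store a function in the dict rather than
the result of calling one; plain functions become methods when the class is
created. (BaseClass here is a stand-in with a stub request.)

```python
class BaseClass:
    def request(self, which):
        return "handled " + which  # stand-in implementation

# Wrong: {"method1": self.request("method1")} would *call* request now,
# before any instance exists.
# Right: store a function; it is bound to instances at attribute access.
def method1(self):
    return self.request("method1")

WhatEver = type("WhatEver", (BaseClass,), {"method1": method1})
print(WhatEver().method1())  # handled method1
```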

Re: collections.Counter surprisingly slow

2013-07-28 Thread Serhiy Storchaka

28.07.13 22:59, Roy Smith wrote:

  The input is an 8.8 Mbyte file containing about 570,000 lines (11,000
unique strings).


Repeat your tests with totally unique lines.


The full profiler dump is at the end of this message, but the gist of
it is:


The profiler affects execution time. In particular it slows down the 
Counter implementation, which uses more function calls. For real-world 
measurements use a different approach.
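
A sketch of such an approach with timeit, run outside the profiler (the
sample data here is made up -- few unique values, many repeats, roughly
matching the ratio in the original report):

```python
from collections import Counter
from timeit import timeit

words = ["spam", "ham", "eggs"] * 10000  # few unique values, many repeats

def with_counter():
    return Counter(words)

def with_dict():
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

# Sanity check: both approaches must agree before comparing speed.
assert dict(with_counter()) == with_dict()

print("Counter:", timeit(with_counter, number=100))
print("dict:   ", timeit(with_dict, number=100))
```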



Why is count() [i.e. collections.Counter] so slow?


Feel free to contribute a patch which fixes this "wart". Note that 
Counter shouldn't be slowed down on mostly unique data.



--
http://mail.python.org/mailman/listinfo/python-list


Re: [Savoynet] G&S Opera Co: Pirates of Penzance

2013-07-28 Thread Chris Angelico
On Mon, Jul 29, 2013 at 12:49 AM, Ethan Furman  wrote:
> On 07/28/2013 10:57 AM, Chris Angelico wrote:
> [...]
>
> Okay, how did you get confused that this was a Python List question?  ;)

*sigh* Because I still haven't gotten around to switching mail clients
to one that has a Reply-List feature. The post I was replying to was
on Savoynet, which is about Gilbert & Sullivan.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list