date:20130714

Re: Understanding other people's code

2013-07-14 Thread CM

> Basically the problem is I am new to the language and this was clearly 
> written by someone who at the moment is far better at it than I am!

Sure, as a beginner, yes, but also it sounds like the programmer didn't 
document it much at all, and that doesn't help you.  I bet s/he didn't always 
use very human readable names for objects/methods/classes, either, eh?

> I'm starting to get pretty worried about my lack of overall progress and so I 
> wondered if anyone out there had some tips and techniques for understanding 
> other peoples code. There has to be 10/15 different scripts with at least 10 
> functions in each file I would say.

Unless the programmer was really super spaghetti coding, I would think that 
there would be some method to the madness, and that the 10-15 scripts each have 
some specific kind of purpose.  The first thing, I'd think (and having not seen 
your codebase) would be to sketch out what those scripts do, and familiarize 
yourself with their names.  

Did the coder use this form for importing from modules?

from client_utils import *

If so, that's going to make your life much harder, because all of the names of 
the module will now be available to the script it was imported into, and yet 
they are not defined in that script.  If s/he had written:

import client_utils

Then at least you would expect lines like this in the script you're looking at:

customer_name = client_utils.GetClient()

Or, if the naming is abstruse, at very least:

cn = client_utils.GC()

It's awful, but at least then you know that GC() is a function within the 
client_utils.py script and you don't have to go searching for it.

If s/he did use "from module import *", then maybe it'd be worth it to re-do 
all the imports in the "import module" style, which will break everything, but 
then force you to go through all the errors and make the names like 
module.FunctionName() instead of just FunctionName().

Some of that depends on how big this project is, of course.
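
If the star imports are already there, one way to trace a name back to its home is to ask Python where the object was defined. This is only a minimal sketch and assumes nothing about the actual codebase: `where_defined` is a hypothetical helper, and `textwrap.dedent` merely stands in for a function pulled in from one of the project's modules.

```python
import inspect
from textwrap import dedent  # stand-in for a name imported from a project module


def where_defined(obj):
    """Return the name of the module in which obj was defined, or None."""
    module = inspect.getmodule(obj)
    return module.__name__ if module is not None else None


print(where_defined(dedent))          # name of the module that owns the function
print(inspect.getsourcefile(dedent))  # path of the file to open and read
```

Run against a function that arrived through "from module import *", this should point at the script that actually defines it, which saves searching all 10-15 files by hand.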

> Literally any idea will help, pen and paper, printing off all the code and 
> doing some sort of highlighting session - anything! 

What tools are you using to work on this code?  Do you have an IDE that has a 
"browse to" function that allows you to click on a name and see where in the 
code above it was defined?  Or does it have UML or something like that?  

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Beazley 4E P.E.R, Page29: Unicode

2013-07-14 Thread Terry Reedy


On 7/13/2013 11:09 PM, [email protected] wrote:

http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode


Is this David Beazley? (You referred to 'DB' later.)


 "directly writing a raw UTF-8 encoded string such as
'Jalape\xc3\xb1o' simply produces a nine-character string U+004A,
U+0061, U+006C, U+0061, U+0070, U+0065, U+00C3, U+00B1, U+006F, which
is probably not what you intended. This is because in UTF-8, the
multi-byte sequence \xc3\xb1 is supposed to represent the single
character U+00F1, not the two characters U+00C3 and U+00B1."

My original question was: Shouldn't this be 8 characters - not 9? He
says: \xc3\xb1 is supposed to represent the single character. However
after some interaction with fellow Pythonistas i'm even more
confused.

With reference to the above para: 1. What does he mean by "writing a
raw UTF-8 encoded string"??


As much respect as I have for DB, I think this is an impossible-to-parse, 
confused statement, fueled by the Python 2 confusion between characters 
and bytes. I suggest forgetting it and the discussion that followed. 
Bytes as bytes can carry any digital information, just as modulated sine 
waves can carry any analog information. In both cases, one can regard 
them as either purely what they are or as encoding information in some 
other form.


--
Terry Jan Reedy


Re: Beazley 4E P.E.R, Page29: Unicode

2013-07-14 Thread Joshua Landau

On 14 July 2013 04:09,   wrote:
> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode
>
> "directly writing a raw UTF-8 encoded string such as 'Jalape\xc3\xb1o' simply 
> produces a nine-character string U+004A, U+0061, U+006C, U+0061, U+0070, 
> U+0065, U+00C3, U+00B1, U+006F, which is probably not what you intended. This 
> is because in UTF-8, the multi-byte sequence \xc3\xb1 is supposed to 
> represent the single character U+00F1, not the two characters U+00C3 and 
> U+00B1."

Correct.

> My original question was: Shouldn't this be 8 characters - not 9?

No, Python tends to be right on these things.

> He says: \xc3\xb1 is supposed to represent the single character. However 
> after some interaction with fellow Pythonistas i'm even more confused.

You would be, given the way he said it.

> With reference to the above para:
> 1. What does he mean by "writing a raw UTF-8 encoded string"??

Well, that doesn't really mean much with no context like he gave it.

> In Python2, once can do 'Jalape funny-n o'. This is a 'bytes' string where 
> each glyph is 1 byte long when stored internally so each glyph is associated 
> with an integer as per charset ASCII or Latin-1. If these charsets have a 
> funny-n glyph then yay! else nay! There is no UTF-8 here!! or UTF-16!! These 
> are plain bytes (8 bits).
>
> Unicode is a really big mapping table between glyphs and integers and are 
> denoted as U or U-.

*Waits for our resident unicode experts to explain why you're actually wrong*

> UTF-8 UTF-16 are encodings to store those big integers in an efficient 
> manner. So when DB says "writing a raw UTF-8 encoded string" - well the only 
> way to do this is to use Python3 where the default string literals are stored 
> in Unicode which then will use a UTF-8 UTF-16 internally to store the bytes 
> in their respective structures; or, one could use u'Jalape' which is unicode 
> in both languages (note the leading 'u').

Correct.

> 2. So assuming this is Python 3: 'Jalape \xYY \xZZ o' (spaces for 
> readability) what DB is saying is that, the stupid-user would expect Jalapeno 
> with a squiggly-n but instead he gets is: Jalape funny1 funny2 o (spaces for 
> readability) -9 glyphs or 9 Unicode-points or 9-UTF8 characters. Correct?

I think so.

> 3. Which leaves me wondering what he means by:
> "This is because in UTF-8, the multi-byte sequence \xc3\xb1 is supposed to 
> represent the single character U+00F1, not the two characters U+00C3 and 
> U+00B1"

He's mixed some things up, AFAICT.

> Could someone take the time to read carefully and clarify what DB is saying??

Here's a simple explanation: you're both wrong (or you're both *almost* right):

As of Python 3:

>>> "\xc3\xb1"
'Ã±'
>>> b"\xc3\xb1".decode()
'ñ'

"WHAT?!" you scream, "THAT'S WRONG!" But it's not. Let me explain.

Python 3's strings want you to give each character separately (*winces
in case I'm wrong*). Python is interpreting the "\xc3" as "\N{LATIN
CAPITAL LETTER A WITH TILDE}" and "\xb1" as "\N{PLUS-MINUS SIGN}"¹.
This means that Python is given *two* characters. Python is basically
doing this:

number = int("c3", 16) # Convert from base16
chr(number) # Turn to the character from the Unicode mapping

When you give Python *raw bytes*, you are saying that this is what the
string looks like *when encoded* -- you are not giving Python Unicode,
but *encoded Unicode*. This means that when you decode it (.decode())
it is free to convert multibyte sections to their relevant characters.

To see how an *encoded string* is not the same as the string itself, see:

>>> "Jalepeño".encode("ASCII", errors="xmlcharrefreplace")
b'Jalepe&#241;o'

Those *represent* the same thing, but the first (according to Python)
*is* the thing, the second needs to be *decoded*.

Now, bringing this back to the original:

>>> "\xc3\xb1".encode()
b'\xc3\x83\xc2\xb1'

You can see that the *encoded* bytes represent the *two* characters;
the string you see above is *not the encoded one*. The encoding is
*internal to Python*.
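
The difference between the two literals can be checked directly. This is a small sketch for Python 3 only; the variable names are purely illustrative.

```python
import unicodedata

text = "\xc3\xb1"   # str literal: each \xNN escape is one code point
raw = b"\xc3\xb1"   # bytes literal: two bytes of already-encoded data

# Names of the two characters held by the str
names = [unicodedata.name(ch) for ch in text]

# Each \xNN escape in a str is equivalent to chr(int("NN", 16))
same_as_chr = text == chr(int("c3", 16)) + chr(int("b1", 16))

decoded = raw.decode("utf-8")      # the two bytes collapse into one character
reencoded = text.encode("utf-8")   # the two characters need four bytes

print(names, same_as_chr, len(decoded), reencoded)
```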

I hope that helps; good luck.

¹ Note that I find the "\N{...}" form much easier to read, and recommend it.

Re: GeoIP2 for retrieving city and region ?

2013-07-14 Thread Νικόλας


On 14/7/2013 8:24 AM, Chris Angelico wrote:

On Sun, Jul 14, 2013 at 3:18 PM,  wrote:

Can we get the location derived from lat/long coordinates?


Yes, assuming you get accurate latitude and longitude, so you're back
to square 1.

ChrisA




Dear Freelance,

Thank you for your interest in MaxMind Web Services. We have set up a 
demo account which includes the following web service(s):


GeoIP City Demo (1000 lookups available)
Usage:
http://geoip.maxmind.com/b?l=YOUR_LICENSE_KEY&i=24.24.24.24
Example scripts may be found at: http://dev.maxmind.com/geoip/web-services

GeoIP City with ISP and Organization Demo (1000 lookups available)
Usage:
http://geoip.maxmind.com/f?l=YOUR_LICENSE_KEY&i=24.24.24.24
Example scripts may be found at: http://dev.maxmind.com/geoip/web-services

Let's see if that will be of any help.
Please try it too; you can request a demo trial of MaxMind's Geo web services.

--
What is now proved was at first only imagined!

Re: Beazley 4E P.E.R, Page29: Unicode

2013-07-14 Thread vek . m1234

thank you (both of you) I follow now :) 

Re: Beazley 4E P.E.R, Page29: Unicode

2013-07-14 Thread Steven D'Aprano

On Sat, 13 Jul 2013 20:09:31 -0700, vek.m1234 wrote:

> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode
> 
> "directly writing a raw UTF-8 encoded string such as 'Jalape\xc3\xb1o'
> simply produces a nine-character string U+004A, U+0061, U+006C, U+0061,
> U+0070, U+0065, U+00C3, U+00B1, U+006F, which is probably not what you
> intended. This is because in UTF-8, the multi-byte sequence \xc3\xb1 is
> supposed to represent the single character U+00F1, not the two
> characters U+00C3 and U+00B1."

This demonstrates confusion about the fundamental concepts, while still 
accidentally getting the basic facts right. No wonder it is 
confusing you, it confuses me too! :-)

Encoding does not generate a character string, it generates bytes. So the 
person you are quoting is causing confusion when he talks about an 
"encoded string"; he should either make it clear he means a string of 
bytes, or not mention the word string at all. Either of these would work:

... a UTF-8 encoded byte-string b'Jalape\xc3\xb1o'

... UTF-8 encoded bytes b'Jalape\xc3\xb1o'

For older versions of Python (2.5 or older), unfortunately the b'' 
notation does not work, and you have to leave out the b.

Even better would be if Python did not conflate ASCII characters with 
bytes, and forced you to write byte strings like this:

... a UTF-8 encoded byte-string b'\x4a\x61\x6c\x61\x70\x65\xc3\xb1\x6f'

thus keeping the distinction between ASCII characters and bytes clear. 
But that would break backwards compatibility *way* too much, and so 
Python continues to conflate ASCII characters with bytes, even in Python 
3. But I digress.

The important thing here is that bytes b'Jalape\xc3\xb1o' consists of 
nine hexadecimal values, as shown above. Seven of them represent the 
ASCII characters Jalape and o and two of them are not ASCII. Their 
meaning depends on what encoding you are using.

(To be precise, even the meaning of the other seven bytes depends on the 
encoding. Fortunately, or unfortunately as the case may be, *most* but 
not all encodings use the same hex values for ASCII characters as ASCII 
itself does, so I will stop mentioning this and just pretend that 
character J always equals hex byte 4A. But now you know the truth.)

Since we're using the UTF-8 encoding, the two bytes \xc3\xb1 represent 
the character ñ, also known as LATIN SMALL LETTER N WITH TILDE. In other 
encodings, those two bytes will represent something different.

So, I presume that the original person's *intention* was to get a Unicode 
text string 'Jalapeño'. If they were wise in the ways of Unicode, they 
would write one of these:

'Jalape\N{LATIN SMALL LETTER N WITH TILDE}o'
'Jalape\u00F1o'
'Jalape\U000000F1o'  # \U takes exactly eight hex digits
'Jalape\xF1o'  # hex
'Jalape\361o'  # octal

and be happy. (In Python 2, they would need to prefix all of these with 
u, to use Unicode strings instead of byte strings.)

But alas they have been misled by those who propagate myths, 
misunderstandings and misapprehensions about Unicode all over the 
Internet, and so they looked up ñ somewhere, discovered that it has the 
double-byte hex value c3b1 in UTF-8, and thought they could write this:

'Jalape\xc3\xb1o'

This does not do what they think it does. It creates a *text string*, a 
Unicode string, with NINE characters:

J a l a p e Ã ± o

Why? Because character Ã has ordinal value 195, which is c3 in hex, hence 
\xc3 is the character Ã; likewise \xb1 is the character ± which has 
ordinal value 177 (b1 in hex). And so they have discovered the wickedness 
that is mojibake.

http://en.wikipedia.org/wiki/Mojibake 

Instead, if they had started with a *byte-string*, and explicitly decoded 
it as UTF-8, they would have been fine:

# I manually encoded 'Jalapeño' to get the bytes below.
# (Named "data" rather than "bytes" to avoid shadowing the builtin.)
data = b'Jalape\xc3\xb1o'
print(data.decode('utf-8'))
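
To make the nine-versus-eight count concrete, here is a short sketch for Python 3. The variable names are illustrative only, and the Latin-1 round trip at the end is a common repair for this particular kind of mojibake rather than something the book suggests.

```python
intended = "Jalape\N{LATIN SMALL LETTER N WITH TILDE}o"   # what was wanted
mojibake = "Jalape\xc3\xb1o"                               # what was typed

# All of these spell the same eight-character string
equivalents = ["Jalape\u00F1o", "Jalape\U000000F1o", "Jalape\xF1o", "Jalape\361o"]

# Latin-1 maps code points 0-255 straight back to bytes 0-255, so encoding the
# bad string as Latin-1 recovers the original UTF-8 bytes, which then decode.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(len(intended), len(mojibake), repaired == intended)
```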

> My original question was: Shouldn't this be 8 characters - not 9? He
> says: \xc3\xb1 is supposed to represent the single character. However
> after some interaction with fellow Pythonistas i'm even more confused.

Depends on the context. \xc3\xb1 could mean the Unicode string 
'\xc3\xb1' (in Python 2, written u'\xc3\xb1') or it could mean the byte-
string b'\xc3\xb1' (in Python 2.5 or older, written without the b).

As a string, \xc3\xb1 means two characters, with ordinal values 0xC3 (or 
decimal 195) and 0xB1 (or decimal 177), namely 'Ã' and '±'.

As bytes, \xc3\xb1 represent two bytes (well, duh), which could mean 
nearly anything:

- the 16-bit Big Endian integer 50097

- the 16-bit Little Endian integer 45507

- a 4x4 black and white bitmap

- the character '簽' (CJK UNIFIED IDEOGRAPH-7C3D) in Big5 encoded bytes

- '뇃' (HANGUL SYLLABLE NWAES) in UTF-16 (Little Endian) encoded bytes

- 'ñ' in UTF-8 encoded bytes

- the two characters 'Ã±' in Latin-1 encoded bytes

- '√±' in MacRoman encoded bytes

- 'Γ±' in ISO-8859-7 encoded bytes

and so forth. Without knowing the context, there is no way of telling 
what those two bytes represent, or whe
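
Several of the interpretations listed above can be checked in Python 3. This is a sketch only: the codec names are the ones the standard library registers, and Big5 is left out because I have not verified that entry.

```python
import struct

raw = b"\xc3\xb1"

# The same two bytes read as unsigned 16-bit integers
big_endian = struct.unpack(">H", raw)[0]
little_endian = struct.unpack("<H", raw)[0]

# The same two bytes decoded under several text encodings
codecs_to_try = ("utf-8", "latin-1", "mac_roman", "iso8859_7",
                 "utf-16-le", "utf-16-be")
decodings = {codec: raw.decode(codec) for codec in codecs_to_try}

print(big_endian, little_endian)
for codec, text in decodings.items():
    print("%-10s %r" % (codec, text))
```

Note that the Hangul syllable NWAES (U+B1C3) comes out of the little-endian UTF-16 decoding; the big-endian one gives U+C3B1.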

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread wxjmfauth

On Saturday, 13 July 2013 21:02:24 UTC+2, Dave Angel wrote:
> On 07/13/2013 10:37 AM, [email protected] wrote:
>
> Fortunately for us, Python (in version 3.3 and later) and Pike did it
> right.  Some day the others may decide to do similarly.

---
Possible, but I doubt it.
For a very simple reason, the latin-1 block: considered
and accepted today as being a Unicode design mistake.

jmf


Re: Beazley 4E P.E.R, Page29: Unicode

2013-07-14 Thread vek . m1234

Hello Steven, a 'thank you' sounds insufficient and largely disproportionate to 
the time and energy you spent in drafting a thoroughly comprehensive answer 
to my question. I've cross posted both answers to stackoverflow (with some 
minor formatting changes). I'll try to do something nice on your account.

Re: [Python-ideas] float('∞')=float('inf')

2013-07-14 Thread Serhiy Storchaka


14.07.13 06:09, Chris Angelico wrote:

Incidents like this are a definite push, but my D&D campaign is
demanding my attention right now, so I haven't made the move.


Are you role-playing Chaos Mage [1]?

[1] http://www.dandwiki.com/wiki/Chaos_Mage_(3.5e_Class)


Re: [Python-ideas] float('∞')=float('inf')

2013-07-14 Thread Chris Angelico

On Sun, Jul 14, 2013 at 8:23 PM, Serhiy Storchaka  wrote:
> 14.07.13 06:09, Chris Angelico wrote:
>
>> Incidents like this are a definite push, but my D&D campaign is
>> demanding my attention right now, so I haven't made the move.
>
>
> Are you role-playing Chaos Mage [1]?
>
> [1] http://www.dandwiki.com/wiki/Chaos_Mage_(3.5e_Class)

I should probably try that some time. Though in our D&D parties, I
tend to land the role of Dungeon Master by default... still looking
for people to DM some more campaigns.

ChrisA

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread Steven D'Aprano

On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:

> For a very simple reason, the latin-1 block: considered and accepted
> today as being a Unicode design mistake.

Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational 
Character Set", which goes back to 1983. ISO-8859-1 was first published 
in 1985, and was in use on Commodore computers the same year.

The concept of Unicode wasn't even started until 1987, and the first 
draft wasn't published until the end of 1990. Unicode wasn't considered 
ready for production use until 1991, six years after Latin-1 was already 
in use in people's computers.

-- 
Steven

How to internationalize a python script? (i18n)

2013-07-14 Thread gialloporpora


Hello,
I am trying to internationalize a script. First I tried with a 
little script to understand how it works, but unfortunately it doesn't.


I followed the instructions on this page:
http://docs.python.org/2/library/i18n.html

I created my script, marked the strings with the _() function, and 
installed it in the main namespace, but when I try to load the locale 
file it says that the locale is unavailable:


IOError: [Errno 2] No translation file found for domain: 'helloi18n'

C:\dropbox\Public\helloi18n>

I have created the pot file with pygettext, localized it with poedit and 
compiled the related .mo file.


As reported in this page:
http://docs.python.org/2/library/gettext.html
«Bind the domain to the locale directory localedir. More concretely, 
gettext will look for binary .mo files for the given domain using the 
path (on Unix): localedir/language/LC_MESSAGES/domain.mo, where 
languages is searched for in the environment variables LANGUAGE, LC_ALL, 
LC_MESSAGES, and LANG respectively.»


I have put my .mo file in locale\it\LC_MESSAGES naming it helloi18n.mo

here are all my files:
https://dl.dropboxusercontent.com/u/4400966/helloi18n.tar.gz

This is the code of the helloi18n.py file:


# Simple script to use internationalization (i18n)
import gettext
import os


LOCALE_DIR = os.path.join(os.path.abspath('.'), 'locale')
print LOCALE_DIR
print "---"
a=gettext.find('helloi18n', LOCALE_DIR, 'it')
print a
gettext.install('helloi18n', localedir=LOCALE_DIR, unicode=1)
gettext.textdomain ('helloi18n')
gettext.translation('helloi18n', LOCALE_DIR, 'it')


if __name__ == '__main__':
    print _('Hello world!')
    print (_('My first localized python script'))




Could somebody help me?


Sandro



Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread wxjmfauth

On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote:
> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
>
> > For a very simple reason, the latin-1 block: considered and accepted
> > today as being a Unicode design mistake.
>
> Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational
> Character Set", which goes back to 1983. ISO-8859-1 was first published
> in 1985, and was in use on Commodore computers the same year.
>
> The concept of Unicode wasn't even started until 1987, and the first
> draft wasn't published until the end of 1990. Unicode wasn't considered
> ready for production use until 1991, six years after Latin-1 was already
> in use in people's computers.
>
> --
> Steven

--

"Unicode" (in fact iso-14xxx) was not created in one
night (Deus ex machina).

What counts today is this:

>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
>>> timeit.repeat("a = 'hundreœ'; 'x' in a")
[0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
>>> 
>>> 
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('œ')
40
>>> sys.version
'3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'

jmf



Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)

2013-07-14 Thread Chris Angelico

On Sun, Jul 14, 2013 at 11:44 PM,   wrote:
> On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote:
>> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
>>
>> > For a very simple reason, the latin-1 block: considered and accepted
>> > today as being a Unicode design mistake.
>>
>> Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational
>> Character Set", which goes back to 1983. ISO-8859-1 was first published
>> in 1985, and was in use on Commodore computers the same year.
>>
>> The concept of Unicode wasn't even started until 1987, and the first
>> draft wasn't published until the end of 1990. Unicode wasn't considered
>> ready for production use until 1991, six years after Latin-1 was already
>> in use in people's computers.
>>
>> --
>> Steven
>
> --
>
> "Unicode" (in fact iso-14xxx) was not created in one
> night (Deus ex machina).
>
> What counts today is this:
>
> >>> timeit.repeat("a = 'hundred'; 'x' in a")
> [0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
> >>> timeit.repeat("a = 'hundreœ'; 'x' in a")
> [0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
>
> >>> sys.getsizeof('d')
> 26
> >>> sys.getsizeof('œ')
> 40
> >>> sys.version
> '3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'

jmf has raised an interesting point. Some string membership operations
do seem oddly slow.

# Get ourselves a longish ASCII string with no duplicates - escape
# apostrophe and backslash for code later on
>>> import timeit
>>> asciichars=''.join(chr(i) for i in range(32,128)).replace("\\",r"\\").replace("'",r"\'")
>>> haystack=[
...     ("ASCII",asciichars+"\u0001"),
...     ("BMP",asciichars+"\u1234"),
...     ("SMP",asciichars+"\U00012345"),
... ]
>>> needle=[
...     ("ASCII","\u0002"),
...     ("BMP","\u1235"),
...     ("SMP","\U00012346"),
... ]
>>> useset=[
...     ("",""),
...     (", as set","; a=set(a)"),
... ]
>>> for time,desc in sorted(
...         (min(timeit.repeat("'%s' in a"%n,("a='%s'"%h)+s)),"%s in %s%s"%(nd,hd,sd))
...         for nd,n in needle for hd,h in haystack for sd,s in useset):
...     print("%.10f %s"%(time,desc))
...

0.1765129367 ASCII in ASCII, as set
0.1767096097 BMP in SMP, as set
0.1778647845 ASCII in BMP, as set
0.1785266004 BMP in BMP, as set
0.1789093307 SMP in SMP, as set
0.1790431465 SMP in BMP, as set
0.1796504863 BMP in ASCII, as set
0.1803854959 SMP in ASCII, as set
0.1810674262 ASCII in SMP, as set
0.1817367850 SMP in BMP
0.1884555160 SMP in ASCII
0.2132371572 BMP in ASCII
0.3137454621 ASCII in ASCII
0.4472624314 BMP in BMP
0.6672795006 SMP in SMP
0.7493052888 ASCII in BMP
0.9261783271 ASCII in SMP
0.9865787412 BMP in SMP

(In separate testing I ascertained that it makes little difference
whether the character is absent from the string or is the last
character in it. Presumably the figures would be lower if the
character is at the start of the string, but this is not germane to
this discussion.)

Set membership is faster than string membership, though marginally on
something this short. If the needle is wider than the haystack, it
obviously can't be present, so a false return comes back at the speed
of a set check. Otherwise, an actual search must be done. Searching
for characters in strings of the same width gets slower as the strings
get larger in memory (unsurprising). What I'm seeing of the top-end
results, though, is that the search for a narrower string in a wider
one is quite significantly slower.
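
For readers unfamiliar with the "width" being discussed, here is a rough sketch. It assumes the ASCII/BMP/SMP categories correspond to the 1, 2 and 4 bytes per character that CPython 3.3+ uses internally (PEP 393); `storage_width` and `needle_may_be_present` are hypothetical helpers written for illustration, not anything CPython exposes.

```python
def storage_width(text):
    """Bytes per character CPython 3.3+ is expected to use for text (PEP 393)."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1      # ASCII / Latin-1 range
    if highest < 0x10000:
        return 2      # rest of the Basic Multilingual Plane
    return 4          # supplementary planes


def needle_may_be_present(needle, haystack):
    """A needle wider than the haystack cannot occur in it."""
    return storage_width(needle) <= storage_width(haystack)


print(storage_width("hundred"), storage_width("hundre\u0153"),
      storage_width("\U00012345"))
print(needle_may_be_present("\u1235", "plain ascii"))
```

The slow cases in the table are the ones where the needle is narrower than the haystack, so this shortcut does not apply and a real search has to run.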

I don't know of an actual proven use-case for this, but it seems
likely to happen (eg you take user input and want to know if there are
any HTML-sensitive characters in it, so you check ('<' in string or
'&' in string), for instance). The question is, is it worth
constructing an "expanded string" at the haystack's width prior to
doing the search?

ChrisA

Re: How to internationalize a python script? (i18n)

2013-07-14 Thread gialloporpora


In reply to gialloporpora's message:


gettext.translation('helloi18n', LOCALE_DIR, 'it')


OK, with a little help from a friend, I have found the issue: the 
language code must be passed as a list, not as a string.
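
For anyone hitting the same error, the difference can be reproduced without a real catalogue. This is a sketch for Python 3 (the original script is Python 2), using a throwaway directory and an empty placeholder .mo file; only the domain name 'helloi18n' comes from the post.

```python
import gettext
import os
import tempfile

# Build a throwaway locale tree: <localedir>/it/LC_MESSAGES/helloi18n.mo
localedir = tempfile.mkdtemp()
mo_dir = os.path.join(localedir, "it", "LC_MESSAGES")
os.makedirs(mo_dir)
mo_path = os.path.join(mo_dir, "helloi18n.mo")
open(mo_path, "wb").close()  # empty placeholder; find() only checks that it exists

# A list of language codes is searched code by code.
found_with_list = gettext.find("helloi18n", localedir, languages=["it"])

# A bare string is iterated character by character ("i", then "t"),
# so the "it" directory is never tried.
found_with_str = gettext.find("helloi18n", localedir, languages="it")

# With fallback=True a missing catalogue yields NullTranslations
# instead of raising an error.
translator = gettext.translation("helloi18n", localedir,
                                 languages=["xx"], fallback=True)
message = translator.gettext("Hello world!")

print(found_with_list, found_with_str, message)
```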


Sorry.
Sandro





--
*Thunderbird: how to avoid the vicious circle of “Re: R:” in mail 
subjects* - http://bit.ly/19yMSsZ


List comp help

2013-07-14 Thread Joseph L. Casale

I have a dict of lists. I need to create a list of 2-tuples, where each tuple 
pairs a key from the dict with one of that key's list items.

my_dict = {
    'key_a': ['val_a', 'val_b'],
    'key_b': ['val_c'],
    'key_c': []
}
[(k, x) for k, v in my_dict.items() for x in v]

This works, but I need to test for an empty v like the last key, and create one 
tuple ('key_c', None).
Does anyone know the trick to reorganize this to accept the test for an empty v and 
add the else?

Thanks!
jlc

Re: List comp help

2013-07-14 Thread Chris Angelico

On Mon, Jul 15, 2013 at 3:10 AM, Joseph L. Casale
 wrote:
> I have a dict of lists. I need to create a list of 2 tuples, where each tuple 
> is a key from
> the dict with one of the keys list items.
>
> my_dict = {
> 'key_a': ['val_a', 'val_b'],
> 'key_b': ['val_c'],
> 'key_c': []
> }
> [(k, x) for k, v in my_dict.items() for x in v]
>
> This works, but I need to test for an empty v like the last key, and create 
> one tuple ('key_c', None).
> Anyone know the trick to reorganize this to accept the test for an empty v 
> and add the else?

Yeah, it's remarkably easy too! Try this:

[(k, x) for k, v in my_dict.items() for x in v or [None]]

An empty list counts as false, so the 'or' will then take the second
option, and iterate over the one-item list with None in it.
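
Spelled out with the dict from the original question, as a runnable sketch (the name `pairs` is just for illustration):

```python
my_dict = {
    'key_a': ['val_a', 'val_b'],
    'key_b': ['val_c'],
    'key_c': [],
}

# "v or [None]" evaluates to v when the list is non-empty,
# and to the one-item list [None] when v is empty.
pairs = [(k, x) for k, v in my_dict.items() for x in v or [None]]

for pair in pairs:
    print(pair)
```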

Have fun!

ChrisA

Re: Editor Ergonomics [was: Important features for editors]

2013-07-14 Thread Giorgos Tzampanakis

On 2013-07-12, Steven D'Aprano wrote:

> On Thu, 11 Jul 2013 09:45:33 -0400, Roy Smith wrote:
>
>> In article <[email protected]>,
>>  [email protected] wrote:
>> 
>>> On Wednesday, July 10, 2013 2:17:12 PM UTC+10, Xue Fuqiao wrote:
>>> 
>>> > * It is especially handy for selecting and deleting text.
>>> 
>>> When coding I never use a mouse to select text regions or to delete
>>> text.
>>> 
>>> These operations I do using just the keyboard.
>> 
>> For good typists, there is high overhead to getting your hands oriented
>> on the keyboard (that's why the F and J keys have little bumps).  So,
>> any time you move your hand from the keyboard to the mouse, you pay a
>> price.
>> 
>> The worst thing is to constantly be alternating between mouse actions
>> and keyboard actions.  You spend all your time getting your fingers
>> hands re-oriented.  That's slow.
>
> Big deal. I am utterly unconvinced that raw typing speed is even close to 
> a bottleneck when programming. Data entry and transcribing from (say) 
> dictated text, yes. Coding, not unless you are a one-fingered hunt-and-
> peek typist. The bottleneck is not typing speed but thinking speed: 
> thinking about program design and APIs, thinking about data structures 
> and algorithms, debugging, etc.

Typing time is definitely a small portion of coding time. However,
since I learned touch typing I have found that I can work more hours
without getting tired. It used to be that the repetitive up-down motion of
the head was quickly leading to headaches and general tiredness.

-- 
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/ 

RE: List comp help

2013-07-14 Thread Joseph L. Casale

> Yeah, it's remarkably easy too! Try this:
>
> [(k, x) for k, v in my_dict.items() for x in v or [None]]
>
> An empty list counts as false, so the 'or' will then take the second option, 
> and iterate over the one-item list with None in it.

Right, I overlooked that!

Much appreciated,
jlc

Re: RE Module Performance

2013-07-14 Thread 88888 Dihedral

On Saturday, July 13, 2013 1:37:46 PM UTC+8, Steven D'Aprano wrote:
> On Fri, 12 Jul 2013 13:58:29 -0400, Devyn Collier Johnson wrote:
>
> > I plan to spend some time optimizing the re.py module for Unix systems.
> > I would love to amp up my programs that use that module.
>
> In my experience, often the best way to optimize a regex is to not use it
> at all.
>
> [steve@ando ~]$ python -m timeit -s "import re" \
> > -s "data = 'a'*100+'b'" \
> > "if re.search('b', data): pass"
> 10 loops, best of 3: 2.77 usec per loop
>
> [steve@ando ~]$ python -m timeit -s "data = 'a'*100+'b'" \
> > "if 'b' in data: pass"
> 100 loops, best of 3: 0.219 usec per loop
>
> In Python, we often use plain string operations instead of regex-based
> solutions for basic tasks. Regexes are a 10lb sledge hammer. Don't use
> them for cracking peanuts.
>
> --
> Steven

OK, let's talk about indexed search algorithms for 
a character stream or string, which can be buffered and
indexed randomly for RW operations but is faster in sequential 
block RW operations after some pre-processing.

This was solved a long time ago in the suffix array or 
suffix tree work and summarized in the famous BWT paper in 199X.

Do we want volunteers to speed up 
search operations in the string module in Python?

Re: List comp help

2013-07-14 Thread rurpy

On 07/14/2013 11:16 AM, Chris Angelico wrote:
> On Mon, Jul 15, 2013 at 3:10 AM, Joseph L. Casale
>  wrote:
>> I have a dict of lists. I need to create a list of 2 tuples, where each 
>> tuple is a key from
>> the dict with one of the keys list items.
>>
>> my_dict = {
>> 'key_a': ['val_a', 'val_b'],
>> 'key_b': ['val_c'],
>> 'key_c': []
>> }
>> [(k, x) for k, v in my_dict.items() for x in v]
>>
>> This works, but I need to test for an empty v like the last key, and create 
>> one tuple ('key_c', None).
>> Anyone know the trick to reorganize this to accept the test for an empty v 
>> and add the else?
> 
> Yeah, it's remarkably easy too! Try this:
> 
> [(k, x) for k, v in my_dict.items() for x in v or [None]]
> 
> An empty list counts as false, so the 'or' will then take the second
> option, and iterate over the one-item list with None in it.

Or more simply:

  [(k, v or None) for k, v in my_dict.items()]

This assumes that all the values in my_dict are lists, and not
other false values like 0, which would also be replaced by None. 

Re: List comp help

2013-07-14 Thread rurpy

On Sunday, July 14, 2013 12:32:34 PM UTC-6, [email protected] wrote:
> Or more simply:
>   [(k, v or None) for k, v in my_dict.items()]

Too simply :-(  Didn't read the op carefully enough.  Sorry.
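
To show why it falls short of what the op asked for, here is a quick sketch (the name `too_simple` is illustrative): the shorter form yields one tuple per key holding the whole list, not one tuple per list item.

```python
my_dict = {
    'key_a': ['val_a', 'val_b'],
    'key_b': ['val_c'],
    'key_c': [],
}

# One tuple per key: the value list is kept whole, or replaced by None if empty.
too_simple = [(k, v or None) for k, v in my_dict.items()]

for pair in too_simple:
    print(pair)
```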

Re: GeoIP2 for retrieving city and region ?

2013-07-14 Thread Tim Chase

On 2013-07-13 16:57, Michael Torrie wrote:
> On 07/13/2013 12:23 PM, Νικόλας wrote:
> > Do you know a way of implementing anyone of these methods to a
> > script?
> 
> Yes.  Modern browsers all support a location API in the browser for
> javascript. 

And the good browsers give the user the option to disclose this
information or not (and, as I mentioned elsewhere on this thread,
even lie about where you are such as with the Geolocater plugin[1] for
FF).

Some of us value the modicum of privacy that we receive by not being
locatable by IP address.

-tkc

[1]
https://addons.mozilla.org/en-us/firefox/addon/geolocater/
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Ideal way to separate GUI and logic?

2013-07-14 Thread fronagzen

Thanks for all the responses!

So as a general idea, I should at the very least separate the GUI from the 
program logic by defining the logic as a function, correct? And the next level 
of separation is to define the logic as a class in one or more separate files, 
and then import it to the file with the GUI, correct?

My next question is, to what degree should I 'slice' my logic into functions? 
How small or how large should one function be, as a rule of thumb?
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Ideal way to separate GUI and logic?

2013-07-14 Thread Joel Goldstick

On Sun, Jul 14, 2013 at 8:25 PM,  wrote:

> Thanks for all the responses!
>
> So as a general idea, I should at the very least separate the GUI from the
> program logic by defining the logic as a function, correct? And the next
> level of separation is to define the logic as a class in one or more
> separate files, and then import it to the file with the GUI, correct?
>
> My next question is, to what degree should I 'slice' my logic into
> functions? How small or how large should one function be, as a rule of
> thumb?

Others may differ.  I think you should just write the code.  In actually
doing that you will learn the pitfalls of how you have divided up your
logic.  Writing code isn't all theory.  It takes practice, and since the
days of The Mythical Man-Month, it has been well understood that you always
end up throwing away the first system anyway.  It has to be built to truly
understand what you think you want to create, but in the learning, you
realize that it's easier and better to start more or less from scratch
rather than try to fix the first concept.

-- 
Joel Goldstick
http://joelgoldstick.com
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Ideal way to separate GUI and logic?

2013-07-14 Thread Roy Smith

In article ,
 Joel Goldstick  wrote:

> Writing code isn't all theory.  It takes practice, and since the days 
> of The Mythical Man-Month, it has been well understood that you 
> always end up throwing away the first system anyway.

If I may paraphrase Brooks, "Plan to throw the first one away, because 
it's going to suck.  Then, the next one you write to replace it will 
also suck because it's going to suffer from Second System Effect" :-)

BTW, anybody who enjoyed The Mythical Man-Month should also read Ed 
Yourdon's Death March.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Ideal way to separate GUI and logic?

2013-07-14 Thread Steven D'Aprano

On Sun, 14 Jul 2013 17:25:32 -0700, fronagzen wrote:

> My next question is, to what degree should I 'slice' my logic into
> functions? How small or how large should one function be, as a rule of
> thumb?

I aim to keep my functions preferably below a dozen lines (excluding the 
doc string), and definitely below a page.

But more important than size is functionality. Every function should do 
*one thing*. If that thing can be divided into two or more "sub-things" 
then they should be factored out into separate functions, which I then 
call. Possibly private, internal only functions.
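
A minimal sketch of what that can look like, assuming a toy currency-conversion task. Every name here (parse_amount, convert, format_result, convert_for_display) is hypothetical; the point is only that each function does one thing and none of them knows anything about the GUI.

```python
def parse_amount(text):
    """Turn user input such as ' 12.50 ' into a float."""
    return float(text.strip())


def convert(amount, rate):
    """Apply an exchange rate to an amount."""
    return amount * rate


def format_result(amount, converted, currency):
    """Build the string that the user will see."""
    return "%.2f -> %.2f %s" % (amount, converted, currency)


def convert_for_display(text, rate, currency):
    """The one entry point the GUI calls; it only glues the steps together."""
    amount = parse_amount(text)
    return format_result(amount, convert(amount, rate), currency)
```

A button handler in the GUI file would then do little more than read the text field, call convert_for_display(), and put the returned string in a label. The logic can be tested without opening a window.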

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list

what thread-synch mech to use for clean exit from a thread

2013-07-14 Thread Gildor Oronar

A currency exchange thread updates the exchange rate once a minute. If the 
thread fails to update the currency rate for 5 hours, it should inform 
main() for a clean exit. This has to be done gracefully, because main() 
could be doing something delicate.


I, a newbie, read about all the thread sync tools, and wasn't sure which one 
to use. In fact I am not sure if there is a need for thread sync, because 
there is no race condition. I thought of this naive way:


class CurrencyExchange():
    def __init__(in_case_callback):
        this.callback = in_case_callback
    def __run__():
        while time.time() - self.rate_timestamp < 5*3600:
            ... # update exchange rate
            if success:
                self.rate_timestamp == time.time()
            time.sleep(60)
        this.callback() # rate not updated 5 hours, a crisis

def main():
    def callback()
        Go_On = False

    agio = CurrencyExchange(in_case = callback)
    agio.start()

    Go_On = True
    while Go_On:
        do_something_delicate(rate_supplied_by=agio)

As you can see, if there is no update of the currency rate for 5 hours, the 
CurrencyExchange object calls the callback, which prevents main() from 
doing the next delicate_thing, but does not interrupt the current 
delicate_thing.


This seems OK, but doesn't look pythonic -- replacing callback() with a 
lambda doesn't help much, it still looks naive. What is the professional 
way in this case?


Thanks in advance!
--
http://mail.python.org/mailman/listinfo/python-list

Re: what thread-synch mech to use for clean exit from a thread

2013-07-14 Thread zhangweiwu

On Monday, July 15, 2013 10:27:45 AM UTC+8, Gildor Oronar wrote:
> What is the professional way in this case?

Hi. I am not a professional either, but I think a professional does this:

import threading
import time

class CurrencyExchange(threading.Thread):
    def __init__(self):
        super(CurrencyExchange, self).__init__()
        self.rate_timestamp = time.time()
    def run(self):
        while time.time() - self.rate_timestamp < 5*3600:
            ... # update exchange rate, setting success
            if success:
                self.rate_timestamp = time.time()
            time.sleep(60)

def main():
    agio = CurrencyExchange()
    agio.start()
    while agio.is_alive():
        do_something_delicate(rate_supplied_by=agio)

Notice that even if agio is no longer alive, it can still supply the exchange 
rate for the last delicate_thing, only that it no longer updates punctually. 
This is semantically wrong, and I think it is the fault of Python: how can 
something dead execute its method? In the API, thread.is_alive() should be 
renamed to thread.is_active_and_on_its_own()


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: what thread-synch mech to use for clean exit from a thread

2013-07-14 Thread Steven D'Aprano

On Mon, 15 Jul 2013 10:27:45 +0800, Gildor Oronar wrote:

> A currency exchange thread updates exchange rate once a minute. If the
> thread faield to update currency rate for 5 hours, it should inform
> main() for a clean exit. This has to be done gracefully, because main()
> could be doing something delicate.
> 
> I, a newbie, read all the thread sync tool, and wasn't sure which one to
> use. In fact I am not sure if there is a need of thread sync, because
> there is no racing cond. I thought of this naive way:
> 
> class CurrencyExchange():
> def __init__(in_case_callback):
>this.callback = in_case_callback

You need to declare the instance parameter, which is conventionally 
called "self" not "this". Also, your class needs to inherit from Thread, 
and critically it MUST call the superclass __init__.

So:

class CurrencyExchange(threading.Thread):
    def __init__(self, in_case_callback):
        super(CurrencyExchange, self).__init__()
        self.callback = in_case_callback

But I'm not sure that a callback is the right approach here. See below.

> def __run__():

Likewise, you need a "self" parameter. Note also that the method which 
Thread.start() invokes is called run(), not __run__().

>while time.time() - self.rate_timestamp < 5*3600:
>   ... # update exchange rate
>   if success:
>  self.rate_timestamp == time.time()
>   time.sleep(60)
>this.callback() # rate not updated 5 hours, a crisis

I think that a cleaner way is to just set a flag on the thread instance. 
Initialise it with:

self.updates_seen = True

in the __init__ method, and then add this after the while loop:

self.updates_seen = False

> def main():
> def callback()
>Go_On = False

I don't believe this callback will work, because it will simply create a 
local variable called "Go_On", not change the non-local variable.

In Python 3, you can use the nonlocal keyword to get what you want, but I 
think a better approach is with a flag on the thread.
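
A small sketch of the difference, for Python 3 only. The function names are made up for the demonstration and nothing here involves threads.

```python
def main_without_nonlocal():
    go_on = True

    def callback():
        go_on = False  # binds a new local inside callback(); the outer name is untouched

    callback()
    return go_on


def main_with_nonlocal():
    go_on = True

    def callback():
        nonlocal go_on  # rebind the variable of the enclosing function
        go_on = False

    callback()
    return go_on


print(main_without_nonlocal(), main_with_nonlocal())
```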

> agio = CurrencyExchange(in_case = callback) 
> agio.start()
> 
> Go_On = True
> while Go_On:
>do_something_delicate(rate_supplied_by=agio)

Change to:

while agio.updates_seen:
    do_something_delicate...
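
Putting the pieces together, here is a rough sketch of the flag approach under a few assumptions of mine: fetch_rate is a hypothetical callable that returns a rate or None on failure, the timeout and interval are parameters so the class can be tried with short values, and the thread is a daemon so that it does not keep the program alive. It also uses "=" rather than "==" when storing the timestamp.

```python
import threading
import time


class CurrencyExchange(threading.Thread):
    def __init__(self, fetch_rate, timeout=5 * 3600, interval=60):
        super().__init__(daemon=True)
        self.fetch_rate = fetch_rate          # hypothetical: returns a rate, or None on failure
        self.timeout = timeout                # seconds without an update before the flag drops
        self.interval = interval              # seconds between update attempts
        self.rate = None
        self.rate_timestamp = time.time()
        self.updates_seen = True

    def run(self):
        # Keep trying forever; only the flag tells main() whether the rate is fresh.
        while True:
            rate = self.fetch_rate()
            if rate is not None:
                self.rate = rate
                self.rate_timestamp = time.time()   # assignment, not comparison
            age = time.time() - self.rate_timestamp
            self.updates_seen = age < self.timeout
            time.sleep(self.interval)
```

main() then only reads agio.updates_seen between delicate operations, so an operation that is already running is never interrupted.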

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: what thread-synch mech to use for clean exit from a thread

2013-07-14 Thread Steven D'Aprano

Oh, I forgot another comment...

On Mon, 15 Jul 2013 03:04:14 +, Steven D'Aprano wrote:

> On Mon, 15 Jul 2013 10:27:45 +0800, Gildor Oronar wrote:

>>while time.time() - self.rate_timestamp < 5*3600:
>>   ... # update exchange rate
>>   if success:
>>  self.rate_timestamp == time.time()
>>   time.sleep(60)
>>this.callback() # rate not updated 5 hours, a crisis
> 
> I think that a cleaner way is to just set a flag on the thread instance.
> Initiate it with:
> 
> self.updates_seen = True
> 
> in the __init__ method, and then add this after the while loop:
> 
> self.updates_seen = False

Sorry, I forgot to mention... I assume that the intention is that if the 
thread hasn't seen any updates for five hours, it should set the flag, 
and then *keep going*. Perhaps the rate will start updating again later.

If the intention is to actually close the thread, then there's no reason 
for an extra flag. Just exit the run() method normally, the thread will 
die, and you can check the thread's status with the is_alive() method.
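
A sketch of that second variant, again with made-up names (fetch_rate, run_while_alive) and with the timeout and interval exposed as parameters so it can be tried quickly:

```python
import threading
import time


class CurrencyExchange(threading.Thread):
    def __init__(self, fetch_rate, timeout=5 * 3600, interval=60):
        super().__init__()
        self.fetch_rate = fetch_rate      # hypothetical: returns a rate, or None on failure
        self.timeout = timeout
        self.interval = interval
        self.rate = None
        self.rate_timestamp = time.time()

    def run(self):
        while time.time() - self.rate_timestamp < self.timeout:
            rate = self.fetch_rate()
            if rate is not None:
                self.rate = rate
                self.rate_timestamp = time.time()
            time.sleep(self.interval)
        # Falling off the end of run() is all it takes to end the thread.


def run_while_alive(agio, do_something_delicate):
    """Call do_something_delicate(agio) until the updater thread has ended."""
    agio.start()
    rounds = 0
    while agio.is_alive():
        do_something_delicate(agio)
        rounds += 1
    return rounds
```

The check happens only between calls, so whatever delicate operation is in progress when the thread ends still runs to completion.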

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)

2013-07-14 Thread Terry Reedy

On 7/14/2013 10:56 AM, Chris Angelico wrote:

> On Sun, Jul 14, 2013 at 11:44 PM,   wrote:
>> timeit.repeat("a = 'hundred'; 'x' in a")
>> [0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
>> timeit.repeat("a = 'hundreœ'; 'x' in a")
>> [0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
>> sys.version
>> '3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'

An issue about finding strings in strings was opened last September and, 
as reported on this list, fixes were applied around last March. As I 
remember, some but not all of the optimizations were applied to 3.3. 
Perhaps some were applied too late for 3.3.1 (3.3.2 is 3.3.1 with some 
emergency patches to correct regressions).

Python 3.4.0a2:
>>> import timeit
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.17396483610667152, 0.16277956641670813, 0.1627937074749941]
>>> timeit.repeat("a = 'hundreo'; 'x' in a")
[0.18441108179403187, 0.16277311071618783, 0.16270517215355085]

The difference is gone, again, as previously reported.
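
To repeat the comparison on whatever interpreter is at hand, something along these lines should do. The helper name time_membership and its parameters are invented for this sketch. Unlike the one-liners above, it passes the strings in through timeit's globals argument (available since Python 3.5) instead of assigning inside the timed statement, so the absolute numbers are not comparable with the ones quoted.

```python
import sys
import timeit


def time_membership(haystack, needle="x", number=100000, repeat=3):
    """Return `repeat` timings, each for `number` runs of `needle in haystack`."""
    return timeit.repeat(
        "needle in haystack",
        globals={"needle": needle, "haystack": haystack},
        number=number,
        repeat=repeat,
    )


print(sys.version)
print(time_membership("hundred"))
print(time_membership("hundre\u0153"))  # same length, but one non-Latin-1 character
```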

> jmf has raised an interesting point. Some string membership operations
> do seem oddly slow.

He raised it a year ago and action was taken.

> # Get ourselves a longish ASCII string with no duplicates - escape
> # apostrophe and backslash for code later on
> asciichars=''.join(chr(i) for i in range(32,128)).replace("\\",r"\\").replace("'",r"\'")
> haystack=[
>     ("ASCII",asciichars+"\u0001"),
>     ("BMP",asciichars+"\u1234"),
>     ("SMP",asciichars+"\U00012345"),
> ]
> needle=[
>     ("ASCII","\u0002"),
>     ("BMP","\u1235"),
>     ("SMP","\U00012346"),
> ]
> useset=[
>     ("",""),
>     (", as set","; a=set(a)"),
> ]
> for time,desc in sorted((min(timeit.repeat("'%s' in a"%n,("a='%s'"%h)+s)),"%s in %s%s"%(nd,hd,sd)) for nd,n in needle for hd,h in haystack for sd,s in useset):
>     print("%.10f %s"%(time,desc))
>
> 0.1765129367 ASCII in ASCII, as set
> 0.1767096097 BMP in SMP, as set
> 0.1778647845 ASCII in BMP, as set
> 0.1785266004 BMP in BMP, as set
> 0.1789093307 SMP in SMP, as set
> 0.1790431465 SMP in BMP, as set
> 0.1796504863 BMP in ASCII, as set
> 0.1803854959 SMP in ASCII, as set
> 0.1810674262 ASCII in SMP, as set

Much of this time is overhead; 'pass' would not run too much faster.

> 0.1817367850 SMP in BMP
> 0.1884555160 SMP in ASCII
> 0.2132371572 BMP in ASCII

For these, 3.3 does no searching because it knows from the internal char 
kind that the answer is No without looking.

> 0.3137454621 ASCII in ASCII
> 0.4472624314 BMP in BMP
> 0.6672795006 SMP in SMP
> 0.7493052888 ASCII in BMP
> 0.9261783271 ASCII in SMP
> 0.9865787412 BMP in SMP
>
> ...
>
> Set membership is faster than string membership, though marginally on
> something this short. If the needle is wider than the haystack, it
> obviously can't be present, so a false return comes back at the speed
> of a set check.

Jim ignores these cases where 3.3+ uses the information about the max 
codepoint to do the operation much faster than in 3.2.

> Otherwise, an actual search must be done. Searching
> for characters in strings of the same width gets slower as the strings
> get larger in memory (unsurprising). What I'm seeing of the top-end
> results, though, is that the search for a narrower string in a wider
> one is quite significantly slower.

50% longer is not bad, even

> I don't know of an actual proven use-case for this, but it seems
> likely to happen (eg you take user input and want to know if there are
> any HTML-sensitive characters in it, so you check ('<' in string or
> '&' in string), for instance).

In my editing of code, I nearly always search for words or long names.

> The question is, is it worth constructing an "expanded string" at the
> haystack's width prior to doing the search?

I would not make any assumptions about what Python does or does not do 
without checking the code. All I know is that Python uses a modified 
version of one of the pre-process and skip-forward algorithms 
(Boyer-Moore?, Knuth-Pratt?, I forget). These are designed to work 
efficiently with needles longer than 1 char, and indeed may work better 
with longer needles. Searching for a single char in n chars is O(n). 
Searching for a len m needle is potentially O(m*n), and the point of the 
fancy algorithms is to make all searches as close to O(n) as possible.
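
To make "pre-process and skip forward" concrete, here is a Horspool-style search, a simplified member of the Boyer-Moore family picked for brevity. It is an illustration of the idea only and not a copy of what CPython does; horspool_find is a made-up name.

```python
def horspool_find(haystack, needle):
    """Return the lowest index of needle in haystack, or -1 (like str.find)."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Pre-process: for every char of the needle except the last, record how far
    # the window may slide when that char is aligned with the window's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Skip forward: chars that never occur in the needle allow a full jump of m.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


print(horspool_find("hello world", "world"))
```

With a long needle and a haystack whose characters rarely occur in it, most windows are skipped in jumps of m, which is why these algorithms can do better with longer needles.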

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list

Re: Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)

2013-07-14 Thread Chris Angelico

On Mon, Jul 15, 2013 at 2:18 PM, Terry Reedy  wrote:
> On 7/14/2013 10:56 AM, Chris Angelico wrote:
> An issue about finding strings in strings was opened last September and, as
> reported on this list, fixes were applied around last March. As I remember,
> some but not all of the optimizations were applied to 3.3. Perhaps some were
> applied too late for 3.3.1 (3.3.2 is 3.3.1 with some emergency patches to
> correct regressions).

D'oh. I knew there was something raised and solved regarding that, but
I forgot to go check a 3.4 alpha to see if it exhibited the same.
Whoops. My bad. Sorry!

> Python 3.4.0a2:
> >>> import timeit
> >>> timeit.repeat("a = 'hundred'; 'x' in a")
> [0.17396483610667152, 0.16277956641670813, 0.1627937074749941]
> >>> timeit.repeat("a = 'hundreo'; 'x' in a")
> [0.18441108179403187, 0.16277311071618783, 0.16270517215355085]
>
> The difference is gone, again, as previously reported.

Yep, that looks exactly like I would have hoped it would.

>> 0.1765129367 ASCII in ASCII, as set
>
> Much of this time is overhead; 'pass' would not run too much faster.
>
>> 0.1817367850 SMP in BMP
>> 0.1884555160 SMP in ASCII
>> 0.2132371572 BMP in ASCII
>
> For these, 3.3 does no searching because it knows from the internal char
> kind that the answer is No without looking.

Yeah, I mainly included those results so I could say to jmf "Look, FSR
allows some string membership operations to be, I kid you not, as fast
as set operations!".

>> 0.3137454621 ASCII in ASCII
>> 0.4472624314 BMP in BMP
>> 0.6672795006 SMP in SMP
>> 0.7493052888 ASCII in BMP
>> 0.9261783271 ASCII in SMP
>> 0.9865787412 BMP in SMP
>
>> Otherwise, an actual search must be done. Searching
>> for characters in strings of the same width gets slower as the strings
>> get larger in memory (unsurprising). What I'm seeing of the top-end
>> results, though, is that the search for a narrower string in a wider
>> one is quite significantly slower.
>
> 50% longer is not bad, even

Hard to give an estimate; my first tests were the ASCII in ASCII and
ASCII in BMP, which then looked more like 2:1 time. However, rescaling
the needle to BMP makes it more like the 50% you're quoting, so yes,
it's not as much as I thought.

In any case, the most important thing to note is: 3.4 has already
fixed this, ergo jmf should shut up about it. And here I thought I
could credit him with a second actually-useful report...

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: List comp help

2013-07-14 Thread Terry Reedy


On 7/14/2013 1:10 PM, Joseph L. Casale wrote:

> I have a dict of lists. I need to create a list of 2 tuples, where each tuple
> is a key from the dict with one of the keys list items.
>
> my_dict = {
>     'key_a': ['val_a', 'val_b'],
>     'key_b': ['val_c'],
>     'key_c': []
> }
> [(k, x) for k, v in my_dict.items() for x in v]

The order of the tuples is not deterministic unless you sort, so if 
everything is hashable, a set may be better.

> This works, but I need to test for an empty v like the last key, and create
> one tuple ('key_c', None).
> Anyone know the trick to reorganize this to accept the test for an empty v
> and add the else?

When posting code, it is a good idea to include the expected or desired 
answer in code as well as text.

pairs = {(k, x) for k, v in my_dict.items() for x in v or [None]}
assert pairs == {('key_a', 'val_a'), ('key_a', 'val_b'),
                 ('key_b', 'val_c'), ('key_c', None)}
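
If a list in a fixed order is wanted rather than a set, sorting is one option. The sketch below assumes the values are strings or None; the key function is my own addition, there because None cannot be compared with str in Python 3.

```python
my_dict = {
    'key_b': ['val_c'],
    'key_c': [],
    'key_a': ['val_b', 'val_a'],
}

pairs = {(k, x) for k, v in my_dict.items() for x in v or [None]}

# Sort by key first; within a key put None before real values, then the value itself.
ordered = sorted(pairs, key=lambda p: (p[0], p[1] is not None, p[1] or ""))
print(ordered)
```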


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
