Re: Understanding other people's code
> Basically the problem is I am new to the language and this was clearly
> written by someone who at the moment is far better at it than I am!
Sure, as a beginner, yes, but it also sounds like the programmer didn't
document it much at all, and that doesn't help you. I bet s/he didn't always
use very human-readable names for objects/methods/classes, either, eh?
> I'm starting to get pretty worried about my lack of overall progress and so I
> wondered if anyone out there had some tips and techniques for understanding
> other peoples code. There has to be 10/15 different scripts with at least 10
> functions in each file I would say.
Unless the programmer was really super spaghetti coding, I would think that
there would be some method to the madness, and that the 10-15 scripts each
have some specific kind of purpose. The first thing, I'd think (not having
seen your codebase), would be to sketch out what those scripts do, and
familiarize yourself with their names.
Did the coder use this form for importing from modules?
    from client_utils import *
If so, that's going to make your life much harder, because all of the names
of the module will now be available to the script it was imported into, and
yet they are not defined in that script. If s/he had written:
    import client_utils
Then at least you would expect lines like this in the script you're looking at:
    customer_name = client_utils.GetClient()
Or, if the naming is abstruse, at the very least:
    cn = client_utils.GC()
It's awful, but at least then you know that GC() is a function within the
client_utils.py script and you don't have to go searching for it.
If s/he did use "from module import *", then maybe it'd be worth it to redo
all the imports in the "import module" style, which will break everything,
but then force you to go through all the errors and change the names to
module.FunctionName() instead of just FunctionName(). Some of that depends on
how big this project is, of course.
> Literally any idea will help, pen and paper, printing off all the code and
> doing some sort of highlighting session - anything!
What tools are you using to work on this code? Do you have an IDE that has a
"browse to" function that allows you to click on a name and see where in the
code above it was defined? Or does it have UML or something like that?
--
http://mail.python.org/mailman/listinfo/python-list
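If no IDE is at hand, the standard inspect module can answer the "where was this name defined?" question even for names that arrived through a star import. The following is only a sketch: where_defined is a hypothetical helper name, and textwrap.dedent merely stands in for a function from the unknown codebase.

```python
import inspect
from textwrap import dedent  # stand-in for a name that arrived via "from module import *"


def where_defined(obj):
    """Return (module name, source file, first line number) for a function or class."""
    source_file = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return obj.__module__, source_file, first_line


module_name, source_file, first_line = where_defined(dedent)
print(module_name, source_file, first_line)
```

Calling the helper on each unfamiliar name gives the file to open and the line to jump to, which may be enough to sketch out what each script is responsible for.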
Re: Beazley 4E P.E.R, Page29: Unicode
On 7/13/2013 11:09 PM, [email protected] wrote:
> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode
Is this David Beazley? (You referred to 'DB' later.)
> "directly writing a raw UTF-8 encoded string such as 'Jalape\xc3\xb1o' simply
> produces a nine-character string U+004A, U+0061, U+006C, U+0061, U+0070,
> U+0065, U+00C3, U+00B1, U+006F, which is probably not what you intended. This
> is because in UTF-8, the multi-byte sequence \xc3\xb1 is supposed to
> represent the single character U+00F1, not the two characters U+00C3 and
> U+00B1."
>
> My original question was: Shouldn't this be 8 characters - not 9?
> He says: \xc3\xb1 is supposed to represent the single character. However
> after some interaction with fellow Pythonistas i'm even more confused.
>
> With reference to the above para:
> 1. What does he mean by "writing a raw UTF-8 encoded string"??
As much respect as I have for DB, I think this is an impossible-to-parse,
confused statement, fueled by the Python 2 confusion between characters and
bytes. I suggest forgetting it and the discussion that followed.
Bytes as bytes can carry any digital information, just as modulated sine
waves can carry any analog information. In both cases, one can regard them as
either purely what they are or as encoding information in some other form.
--
Terry Jan Reedy
Re: Beazley 4E P.E.R, Page29: Unicode
On 14 July 2013 04:09, wrote:
> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode
>
> "directly writing a raw UTF-8 encoded string such as 'Jalape\xc3\xb1o' simply
> produces a nine-character string U+004A, U+0061, U+006C, U+0061, U+0070,
> U+0065, U+00C3, U+00B1, U+006F, which is probably not what you intended. This
> is because in UTF-8, the multi-byte sequence \xc3\xb1 is supposed to
> represent the single character U+00F1, not the two characters U+00C3 and
> U+00B1."
Correct.
> My original question was: Shouldn't this be 8 characters - not 9?
No, Python tends to be right on these things.
> He says: \xc3\xb1 is supposed to represent the single character. However
> after some interaction with fellow Pythonistas i'm even more confused.
You would be, given the way he said it.
> With reference to the above para:
> 1. What does he mean by "writing a raw UTF-8 encoded string"??
Well, that doesn't really mean much with no context like he gave it.
> In Python2, one can do 'Jalape funny-n o'. This is a 'bytes' string where
> each glyph is 1 byte long when stored internally so each glyph is associated
> with an integer as per charset ASCII or Latin-1. If these charsets have a
> funny-n glyph then yay! else nay! There is no UTF-8 here!! or UTF-16!! These
> are plain bytes (8 bits).
>
> Unicode is a really big mapping table between glyphs and integers and are
> denoted as U or U-.
*Waits for our resident unicode experts to explain why you're actually wrong*
> UTF-8 UTF-16 are encodings to store those big integers in an efficient
> manner. So when DB says "writing a raw UTF-8 encoded string" - well the only
> way to do this is to use Python3 where the default string literals are stored
> in Unicode which then will use a UTF-8 UTF-16 internally to store the bytes
> in their respective structures; or, one could use u'Jalape' which is unicode
> in both languages (note the leading 'u').
Correct.
> 2. So assuming this is Python 3: 'Jalape \xYY \xZZ o' (spaces for
> readability) what DB is saying is that, the stupid-user would expect Jalapeno
> with a squiggly-n but instead he gets is: Jalape funny1 funny2 o (spaces for
> readability) -9 glyphs or 9 Unicode-points or 9-UTF8 characters. Correct?
I think so.
> 3. Which leaves me wondering what he means by:
> "This is because in UTF-8, the multi-byte sequence \xc3\xb1 is supposed to
> represent the single character U+00F1, not the two characters U+00C3 and
> U+00B1"
He's mixed some things up, AFAICT.
> Could someone take the time to read carefully and clarify what DB is saying??
Here's a simple explanation: you're both wrong (or you're both *almost* right):
As of Python 3:
>>> "\xc3\xb1"
'Ã±'
>>> b"\xc3\xb1".decode()
'ñ'
"WHAT?!" you scream, "THAT'S WRONG!" But it's not. Let me explain.
Python 3's strings want you to give each character separately (*winces
in case I'm wrong*). Python is interpreting the "\xc3" as "\N{LATIN
CAPITAL LETTER A WITH TILDE}" and "\xb1" as "\N{PLUS-MINUS SIGN}"¹.
This means that Python is given *two* characters. Python is basically
doing this:
number = int("c3", 16) # Convert from base16
chr(number) # Turn to the character from the Unicode mapping
When you give Python *raw bytes*, you are saying that this is what the
string looks like *when encoded* -- you are not giving Python Unicode,
but *encoded Unicode*. This means that when you decode it (.decode())
it is free to convert multibyte sections to their relevant characters.
To see how an *encoded string* is not the same as the string itself, see:
>>> "Jalepeño".encode("ASCII", errors="xmlcharrefreplace")
b'Jalepe&#241;o'
Those *represent* the same thing, but the first (according to Python)
*is* the thing, the second needs to be *decoded*.
Now, bringing this back to the original:
>>> "\xc3\xb1".encode()
b'\xc3\x83\xc2\xb1'
You can see that the *encoded* bytes represent the *two* characters;
the string you see above is *not the encoded one*. The encoding is
*internal to Python*.
I hope that helps; good luck.
¹ Note that I find the "\N{...}" form much easier to read, and recommend it.
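The distinction drawn above can be checked directly. This is a small sketch for Python 3; the variable names are illustrative only.

```python
text = 'Jalape\xc3\xb1o'   # a str: each \xNN escape is one code point
data = b'Jalape\xc3\xb1o'  # bytes: the UTF-8 encoding of the intended word

print(len(text))                           # 9
print([hex(ord(c)) for c in text[6:8]])    # ['0xc3', '0xb1']

decoded = data.decode('utf-8')
print(len(decoded))                        # 8
print(decoded == 'Jalape\N{LATIN SMALL LETTER N WITH TILDE}o')  # True

# Repairing the nine-character string: Latin-1 maps code points 0-255
# straight back to bytes 0-255, which can then be decoded as UTF-8.
repaired = text.encode('latin-1').decode('utf-8')
print(repaired == decoded)                 # True
```

The Latin-1 round trip only works because every character in the damaged string is below U+0100; it is a repair trick, not a substitute for decoding the bytes correctly in the first place.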
Re: GeoIP2 for retrieving city and region ?
On 14/7/2013 8:24 AM, Chris Angelico wrote:
> On Sun, Jul 14, 2013 at 3:18 PM, wrote:
>> Can we get the location derived from lat/long coordinates?
> Yes, assuming you get accurate latitude and longitude, so you're back
> to square 1.
>
> ChrisA
Dear Freelance,
Thank you for your interest in MaxMind Web Services. We have set up a demo
account which includes the following web service(s):
GeoIP City Demo (1000 lookups available)
Usage: http://geoip.maxmind.com/b?l=YOUR_LICENSE_KEY&i=24.24.24.24
Example scripts may be found at: http://dev.maxmind.com/geoip/web-services
GeoIP City with ISP and Organization Demo (1000 lookups available)
Usage: http://geoip.maxmind.com/f?l=YOUR_LICENSE_KEY&i=24.24.24.24
Example scripts may be found at: http://dev.maxmind.com/geoip/web-services
Let's see if that would be of any help. Please try it too; you can request a
demo trial of MaxMind's Geo web services.
--
What is now proved was at first only imagined!
Re: Beazley 4E P.E.R, Page29: Unicode
Thank you (both of you), I follow now :)
Re: Beazley 4E P.E.R, Page29: Unicode
On Sat, 13 Jul 2013 20:09:31 -0700, vek.m1234 wrote:
> http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode
>
> "directly writing a raw UTF-8 encoded string such as 'Jalape\xc3\xb1o'
> simply produces a nine-character string U+004A, U+0061, U+006C, U+0061,
> U+0070, U+0065, U+00C3, U+00B1, U+006F, which is probably not what you
> intended. This is because in UTF-8, the multi-byte sequence \xc3\xb1 is
> supposed to represent the single character U+00F1, not the two
> characters U+00C3 and U+00B1."
This demonstrates confusion about the fundamental concepts, while still
accidentally getting the basic facts right. No wonder it is
confusing you, it confuses me too! :-)
Encoding does not generate a character string, it generates bytes. So the
person you are quoting is causing confusion when he talks about an
"encoded string"; he should either make it clear he means a string of
bytes, or not mention the word string at all. Either of these would work:
... a UTF-8 encoded byte-string b'Jalape\xc3\xb1o'
... UTF-8 encoded bytes b'Jalape\xc3\xb1o'
For older versions of Python (2.5 or older), unfortunately the b''
notation does not work, and you have to leave out the b.
Even better would be if Python did not conflate ASCII characters with
bytes, and forced you to write byte strings like this:
... a UTF-8 encoded byte-string b'\x4a\x61\x6c\x61\x70\x65\xc3\xb1\x6f'
thus keeping the distinction between ASCII characters and bytes clear.
But that would break backwards compatibility *way* too much, and so
Python continues to conflate ASCII characters with bytes, even in Python
3. But I digress.
The important thing here is that bytes b'Jalape\xc3\xb1o' consists of
nine hexadecimal values, as shown above. Seven of them represent the
ASCII characters Jalape and o and two of them are not ASCII. Their
meaning depends on what encoding you are using.
(To be precise, even the meaning of the other seven bytes depends on the
encoding. Fortunately, or unfortunately as the case may be, *most* but
not all encodings use the same hex values for ASCII characters as ASCII
itself does, so I will stop mentioning this and just pretend that
character J always equals hex byte 4A. But now you know the truth.)
Since we're using the UTF-8 encoding, the two bytes \xc3\xb1 represent
the character ñ, also known as LATIN SMALL LETTER N WITH TILDE. In other
encodings, those two bytes will represent something different.
So, I presume that the original person's *intention* was to get a Unicode
text string 'Jalapeño'. If they were wise in the ways of Unicode, they
would write one of these:
'Jalape\N{LATIN SMALL LETTER N WITH TILDE}o'
'Jalape\u00F1o'
'Jalape\U000000F1o'
'Jalape\xF1o' # hex
'Jalape\361o' # octal
and be happy. (In Python 2, they would need to prefix all of these with
u, to use Unicode strings instead of byte strings.)
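As a quick check for Python 3, the spellings above all denote the same eight-character string. Note that \u takes exactly four hex digits and \U exactly eight; the list name below is illustrative only.

```python
candidates = [
    'Jalape\N{LATIN SMALL LETTER N WITH TILDE}o',
    'Jalape\u00F1o',      # \u takes exactly four hex digits
    'Jalape\U000000F1o',  # \U takes exactly eight hex digits
    'Jalape\xF1o',        # hex escape, two digits
    'Jalape\361o',        # octal escape (0o361 == 0xF1 == 241)
]

print(all(s == candidates[0] for s in candidates))  # True
print(len(candidates[0]))                           # 8
```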
But alas they have been misled by those who propagate myths,
misunderstandings and misapprehensions about Unicode all over the
Internet, and so they looked up ñ somewhere, discovered that it has the
double-byte hex value c3b1 in UTF-8, and thought they could write this:
'Jalape\xc3\xb1o'
This does not do what they think it does. It creates a *text string*, a
Unicode string, with NINE characters:
J a l a p e Ã ± o
Why? Because character Ã has ordinal value 195, which is c3 in hex, hence
\xc3 is the character Ã; likewise \xb1 is the character ± which has
ordinal value 177 (b1 in hex). And so they have discovered the wickedness
that is mojibake.
http://en.wikipedia.org/wiki/Mojibake
Instead, if they had started with a *byte-string*, and explicitly decoded
it as UTF-8, they would have been fine:
# I manually encoded 'Jalapeño' to get the bytes below:
data = b'Jalape\xc3\xb1o'  # named "data" so the built-in bytes type is not shadowed
print(data.decode('utf-8'))
> My original question was: Shouldn't this be 8 characters - not 9? He
> says: \xc3\xb1 is supposed to represent the single character. However
> after some interaction with fellow Pythonistas i'm even more confused.
Depends on the context. \xc3\xb1 could mean the Unicode string
'\xc3\xb1' (in Python 2, written u'\xc3\xb1') or it could mean the byte-
string b'\xc3\xb1' (in Python 2.5 or older, written without the b).
As a string, \xc3\xb1 means two characters, with ordinal values 0xC3 (or
decimal 195) and 0xB1 (or decimal 177), namely 'Ã' and '±'.
As bytes, \xc3\xb1 represent two bytes (well, duh), which could mean
nearly anything:
- the 16-bit Big Endian integer 50097
- the 16-bit Little Endian integer 45507
- a 4x4 black and white bitmap
- the character '簽' (CJK UNIFIED IDEOGRAPH-7C3D) in Big5 encoded bytes
- '뇃' (HANGUL SYLLABLE NWAES) in UTF-16 (Little Endian) encoded bytes
- 'ñ' in UTF-8 encoded bytes
- the two characters 'Ã±' in Latin-1 encoded bytes
- '√±' in MacRoman encoded bytes
- 'Γ±' in ISO-8859-7 encoded bytes
and so forth. Without knowing the context, there is no way of telling
what those two bytes represent, or whe
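Several of the readings listed above can be reproduced in Python 3 with the codecs that ship in the standard library. This is a sketch only: the codec names are Python's own spellings, and the dictionary name is illustrative.

```python
raw = b'\xc3\xb1'

# The same two bytes read as integers
print(int.from_bytes(raw, 'big'))     # 50097
print(int.from_bytes(raw, 'little'))  # 45507

# The same two bytes read as text under different encodings
decoded = {}
for codec in ('utf-8', 'latin-1', 'mac_roman', 'iso8859_7', 'utf-16-le'):
    decoded[codec] = raw.decode(codec)
    print(codec, len(decoded[codec]), [hex(ord(c)) for c in decoded[codec]])
```

Printing code points rather than the characters themselves keeps the output readable even on a terminal that cannot display all of the scripts involved.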
Re: hex dump w/ or w/out utf-8 chars
On Saturday, 13 July 2013 21:02:24 UTC+2, Dave Angel wrote:
> On 07/13/2013 10:37 AM, [email protected] wrote:
> >
> > Fortunately for us, Python (in version 3.3 and later) and Pike did it
> > right. Some day the others may decide to do similarly.
> >
---
Possible, but I doubt it. For a very simple reason, the latin-1 block:
considered and accepted today as being a Unicode design mistake.
jmf
Re: Beazley 4E P.E.R, Page29: Unicode
Hello Steven, a 'thank you' sounds insufficient and largely disproportionate
to the time and energy you spent in drafting a thoroughly comprehensive
answer to my question. I've cross posted both answers to stackoverflow (with
some minor formatting changes). I'll try to do something nice on your account.
Re: [Python-ideas] float('∞')=float('inf')
14.07.13 06:09, Chris Angelico wrote:
> Incidents like this are a definite push, but my D&D campaign is
> demanding my attention right now, so I haven't made the move.
Are you role-playing Chaos Mage [1]?
[1] http://www.dandwiki.com/wiki/Chaos_Mage_(3.5e_Class)
Re: [Python-ideas] float('∞')=float('inf')
On Sun, Jul 14, 2013 at 8:23 PM, Serhiy Storchaka wrote:
> 14.07.13 06:09, Chris Angelico wrote:
>
>> Incidents like this are a definite push, but my D&D campaign is
>> demanding my attention right now, so I haven't made the move.
>
> Are you role-playing Chaos Mage [1]?
>
> [1] http://www.dandwiki.com/wiki/Chaos_Mage_(3.5e_Class)
I should probably try that some time. Though in our D&D parties, I tend
to land the role of Dungeon Master by default... still looking for
people to DM some more campaigns.
ChrisA
Re: hex dump w/ or w/out utf-8 chars
On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
> For a very simple reason, the latin-1 block: considered and accepted
> today as beeing a Unicode design mistake.
Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational
Character Set", which goes back to 1983. ISO-8859-1 was first published
in 1985, and was in use on Commodore computers the same year.
The concept of Unicode wasn't even started until 1987, and the first
draft wasn't published until the end of 1990. Unicode wasn't considered
ready for production use until 1991, six years after Latin-1 was already
in use in people's computers.
--
Steven
How to internationalize a python script? (i18n)
Hello,
I am trying to internationalize a script. First I tried with a
little script to understand how it works, but unfortunately it doesn't.
I have followed the instructions on this page:
http://docs.python.org/2/library/i18n.html
I have created my script, marked strings with the _() function,
installed it in the main namespace, but when I try to load the locale
file it says that the locale is unavailable:
IOError: [Errno 2] No translation file found for domain: 'helloi18n'
C:\dropbox\Public\helloi18n>
I have created the pot file with pygettext, localized it with poedit and
compiled the related .mo file.
As reported in this page:
http://docs.python.org/2/library/gettext.html
«Bind the domain to the locale directory localedir. More concretely,
gettext will look for binary .mo files for the given domain using the
path (on Unix): localedir/language/LC_MESSAGES/domain.mo, where
languages is searched for in the environment variables LANGUAGE, LC_ALL,
LC_MESSAGES, and LANG respectively.»
I have put my .mo file in locale\it\LC_MESSAGES naming it helloi18n.mo
here are all my files:
https://dl.dropboxusercontent.com/u/4400966/helloi18n.tar.gz
This is the code of the helloi18n.py file:
# Simple script to use internationalization (i18n)
import gettext
import os
LOCALE_DIR = os.path.join(os.path.abspath('.'), 'locale')
print LOCALE_DIR
print "---"
a=gettext.find('helloi18n', LOCALE_DIR, 'it')
print a
gettext.install('helloi18n', localedir=LOCALE_DIR, unicode=1)
gettext.textdomain ('helloi18n')
gettext.translation('helloi18n', LOCALE_DIR, 'it')
if __name__ == '__main__':
    print _('Hello world!')
    print (_('My first localized python script'))
Could somebody help me?
Sandro
Re: hex dump w/ or w/out utf-8 chars
On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote:
> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
>
> > For a very simple reason, the latin-1 block: considered and accepted
> > today as beeing a Unicode design mistake.
>
> Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational
> Character Set", which goes back to 1983. ISO-8859-1 was first published
> in 1985, and was in use on Commodore computers the same year.
>
> The concept of Unicode wasn't even started until 1987, and the first
> draft wasn't published until the end of 1990. Unicode wasn't considered
> ready for production use until 1991, six years after Latin-1 was already
> in use in people's computers.
>
> --
> Steven
--
"Unicode" (in fact iso-14xxx) was not created in one
night (Deus ex machina).
What counts today is this:
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
>>> timeit.repeat("a = 'hundreœ'; 'x' in a")
[0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
>>>
>>>
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('œ')
40
>>> sys.version
'3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'
jmf
Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)
On Sun, Jul 14, 2013 at 11:44 PM, wrote:
> On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote:
>> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
>>
>> > For a very simple reason, the latin-1 block: considered and accepted
>> > today as beeing a Unicode design mistake.
>>
>> Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational
>> Character Set", which goes back to 1983. ISO-8859-1 was first published
>> in 1985, and was in use on Commodore computers the same year.
>>
>> The concept of Unicode wasn't even started until 1987, and the first
>> draft wasn't published until the end of 1990. Unicode wasn't considered
>> ready for production use until 1991, six years after Latin-1 was already
>> in use in people's computers.
>>
>> --
>> Steven
>
> "Unicode" (in fact iso-14xxx) was not created in one
> night (Deus ex machina).
>
> What counts today is this:
>
> >>> timeit.repeat("a = 'hundred'; 'x' in a")
> [0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
> >>> timeit.repeat("a = 'hundreœ'; 'x' in a")
> [0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
> >>> sys.getsizeof('d')
> 26
> >>> sys.getsizeof('œ')
> 40
> >>> sys.version
> '3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit
> (Intel)]'
jmf has raised an interesting point. Some string membership operations
do seem oddly slow.
# Get ourselves a longish ASCII string with no duplicates - escape
# apostrophe and backslash for code later on
>>> import timeit
>>> asciichars=''.join(chr(i) for i in range(32,128)).replace("\\",r"\\").replace("'",r"\'")
>>> haystack=[
("ASCII",asciichars+"\u0001"),
("BMP",asciichars+"\u1234"),
("SMP",asciichars+"\U00012345"),
]
>>> needle=[
("ASCII","\u0002"),
("BMP","\u1235"),
("SMP","\U00012346"),
]
>>> useset=[
("",""),
(", as set","; a=set(a)"),
]
>>> for time,desc in sorted(
    (min(timeit.repeat("'%s' in a"%n,("a='%s'"%h)+s)),"%s in %s%s"%(nd,hd,sd))
    for nd,n in needle for hd,h in haystack for sd,s in useset):
    print("%.10f %s"%(time,desc))
0.1765129367 ASCII in ASCII, as set
0.1767096097 BMP in SMP, as set
0.1778647845 ASCII in BMP, as set
0.1785266004 BMP in BMP, as set
0.1789093307 SMP in SMP, as set
0.1790431465 SMP in BMP, as set
0.1796504863 BMP in ASCII, as set
0.1803854959 SMP in ASCII, as set
0.1810674262 ASCII in SMP, as set
0.1817367850 SMP in BMP
0.1884555160 SMP in ASCII
0.2132371572 BMP in ASCII
0.3137454621 ASCII in ASCII
0.4472624314 BMP in BMP
0.6672795006 SMP in SMP
0.7493052888 ASCII in BMP
0.9261783271 ASCII in SMP
0.9865787412 BMP in SMP
(In separate testing I ascertained that it makes little difference
whether the character is absent from the string or is the last
character in it. Presumably the figures would be lower if the
character is at the start of the string, but this is not germane to
this discussion.)
Set membership is faster than string membership, though marginally on
something this short. If the needle is wider than the haystack, it
obviously can't be present, so a false return comes back at the speed
of a set check. Otherwise, an actual search must be done. Searching
for characters in strings of the same width gets slower as the strings
get larger in memory (unsurprising). What I'm seeing of the top-end
results, though, is that the search for a narrower string in a wider
one is quite significantly slower.
I don't know of an actual proven use-case for this, but it seems
likely to happen (eg you take user input and want to know if there are
any HTML-sensitive characters in it, so you check ('<' in string or
'&' in string), for instance). The question is, is it worth
constructing an "expanded string" at the haystack's width prior to
doing the search?
ChrisA
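For anyone who wants to repeat the measurement, here is a reduced sketch of the same experiment. It assumes CPython 3.3 or later, where PEP 393 stores a string with 1, 2 or 4 bytes per character depending on its widest character. The labels and variable names are made up for illustration, and the timings will differ between machines and Python versions, so no particular result is implied.

```python
import sys
import timeit

# One haystack per internal width: 1 byte per character (Latin-1 range),
# 2 bytes (rest of the BMP), 4 bytes (beyond the BMP).
base = "abcdefghijklmnopqrstuvwxyz" * 4
haystacks = {
    "1-byte": base + "\u0001",
    "2-byte": base + "\u1234",
    "4-byte": base + "\U00012345",
}
needle = "\u0002"  # a narrow needle that is absent from every haystack

results = {}
for label, haystack in haystacks.items():
    # Best of three runs of 100000 membership tests each
    seconds = min(timeit.repeat(lambda: needle in haystack, number=100000, repeat=3))
    results[label] = seconds
    print("%-7s size=%4d bytes  %.4f s" % (label, sys.getsizeof(haystack), seconds))
```

The sizes printed alongside the timings show the widening of the haystack directly, which makes it easier to relate any slowdown to the storage width rather than to the string's length.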
Re: How to internationalize a python script? (i18n)
In reply to the message from gialloporpora:
gettext.translation('helloi18n', LOCALE_DIR, 'it')
Ok, I have, with a little help of my friend, found the issue. The
language code must be passed as a list not as a string.
Sorry.
Sandro
--
*Thunderbird: how to avoid the “Re: R:” vicious circle in mail
subjects* - http://bit.ly/19yMSsZ
List comp help
I have a dict of lists. I need to create a list of 2 tuples, where each tuple
is a key from
the dict with one of the keys list items.
my_dict = {
'key_a': ['val_a', 'val_b'],
'key_b': ['val_c'],
'key_c': []
}
[(k, x) for k, v in my_dict.items() for x in v]
This works, but I need to test for an empty v like the last key, and create one
tuple ('key_c', None).
Anyone know the trick to reorganize this to accept the test for an empty v and
add the else?
Thanks!
jlc
Re: List comp help
On Mon, Jul 15, 2013 at 3:10 AM, Joseph L. Casale
wrote:
> I have a dict of lists. I need to create a list of 2 tuples, where each tuple
> is a key from
> the dict with one of the keys list items.
>
> my_dict = {
> 'key_a': ['val_a', 'val_b'],
> 'key_b': ['val_c'],
> 'key_c': []
> }
> [(k, x) for k, v in my_dict.items() for x in v]
>
> This works, but I need to test for an empty v like the last key, and create
> one tuple ('key_c', None).
> Anyone know the trick to reorganize this to accept the test for an empty v
> and add the else?
Yeah, it's remarkably easy too! Try this:
[(k, x) for k, v in my_dict.items() for x in v or [None]]
An empty list counts as false, so the 'or' will then take the second
option, and iterate over the one-item list with None in it.
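Run against the dict from the question, the comprehension gives the following. The key order shown assumes Python 3.7 or later, where dicts keep insertion order.

```python
my_dict = {
    'key_a': ['val_a', 'val_b'],
    'key_b': ['val_c'],
    'key_c': [],
}

# "v or [None]" substitutes a one-item list when v is empty (and therefore false).
pairs = [(k, x) for k, v in my_dict.items() for x in v or [None]]
print(pairs)
# [('key_a', 'val_a'), ('key_a', 'val_b'), ('key_b', 'val_c'), ('key_c', None)]
```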
Have fun!
ChrisA
Re: Editor Ergonomics [was: Important features for editors]
On 2013-07-12, Steven D'Aprano wrote:
> On Thu, 11 Jul 2013 09:45:33 -0400, Roy Smith wrote:
>
>> In article <[email protected]>,
>> [email protected] wrote:
>>
>>> On Wednesday, July 10, 2013 2:17:12 PM UTC+10, Xue Fuqiao wrote:
>>>
>>> > * It is especially handy for selecting and deleting text.
>>>
>>> When coding I never use a mouse to select text regions or to delete
>>> text.
>>>
>>> These operations I do using just the keyboard.
>>
>> For good typists, there is high overhead to getting your hands oriented
>> on the keyboard (that's why the F and J keys have little bumps). So,
>> any time you move your hand from the keyboard to the mouse, you pay a
>> price.
>>
>> The worst thing is to constantly be alternating between mouse actions
>> and keyboard actions. You spend all your time getting your fingers
>> hands re-oriented. That's slow.
>
> Big deal. I am utterly unconvinced that raw typing speed is even close to
> a bottleneck when programming. Data entry and transcribing from (say)
> dictated text, yes. Coding, not unless you are a one-fingered
> hunt-and-peek typist. The bottleneck is not typing speed but thinking speed:
> thinking about program design and APIs, thinking about data structures
> and algorithms, debugging, etc.
Typing time is definitely a small portion of coding time. However, since I
learned touch typing I have found that I can work more hours without getting
tired. It used to be that the repetitive up-down motion of the head was
quickly leading to headaches and general tiredness.
--
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/
RE: List comp help
> Yeah, it's remarkably easy too! Try this:
>
> [(k, x) for k, v in my_dict.items() for x in v or [None]]
>
> An empty list counts as false, so the 'or' will then take the second option,
> and iterate over the one-item list with None in it.
Right, I overlooked that!
Much appreciated,
jlc
Re: RE Module Performance
On Saturday, July 13, 2013 1:37:46 PM UTC+8, Steven D'Aprano wrote:
> On Fri, 12 Jul 2013 13:58:29 -0400, Devyn Collier Johnson wrote:
>
> > I plan to spend some time optimizing the re.py module for Unix systems.
> > I would love to amp up my programs that use that module.
>
> In my experience, often the best way to optimize a regex is to not use it
> at all.
>
> [steve@ando ~]$ python -m timeit -s "import re" \
> > -s "data = 'a'*100+'b'" \
> > "if re.search('b', data): pass"
> 10 loops, best of 3: 2.77 usec per loop
>
> [steve@ando ~]$ python -m timeit -s "data = 'a'*100+'b'" \
> > "if 'b' in data: pass"
> 100 loops, best of 3: 0.219 usec per loop
>
> In Python, we often use plain string operations instead of regex-based
> solutions for basic tasks. Regexes are a 10lb sledge hammer. Don't use
> them for cracking peanuts.
>
> --
> Steven
OK, let's talk about the indexed search algorithms for
a character stream or string which can be buffered and
indexed randomly for RW operations but is faster in sequential
block RW operations after some pre-processing.
This was solved a long time ago in the suffix array or
suffix tree part and summarized in the famous BWT paper in 199X.
Do we want volunteers to speed up
search operations in the string module in Python?
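To make the suffix-array idea concrete, here is a deliberately naive sketch: the index is built once as pre-processing, and each lookup is then a binary search over the sorted suffixes. The function names are hypothetical, the construction shown is far from the efficient algorithms in the literature, and none of this reflects how Python's own substring search is implemented.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order.

    Naive construction: simple, but slow and memory-hungry for long texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text, suffix_array, pattern):
    """Binary-search the sorted suffixes for one that starts with pattern."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(suffix_array) and text.startswith(pattern, suffix_array[lo])


data = 'a' * 100 + 'b'
index = build_suffix_array(data)     # pre-processing, done once
print(contains(data, index, 'b'))    # True
print(contains(data, index, 'ab'))   # True
print(contains(data, index, 'ba'))   # False
```

The pre-processing only pays off when the same text is searched many times; for a one-off test like the quoted timeit example, the plain "in" operator remains the obvious choice.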
Re: List comp help
On 07/14/2013 11:16 AM, Chris Angelico wrote:
> On Mon, Jul 15, 2013 at 3:10 AM, Joseph L. Casale
> wrote:
>> I have a dict of lists. I need to create a list of 2 tuples, where each
>> tuple is a key from
>> the dict with one of the keys list items.
>>
>> my_dict = {
>> 'key_a': ['val_a', 'val_b'],
>> 'key_b': ['val_c'],
>> 'key_c': []
>> }
>> [(k, x) for k, v in my_dict.items() for x in v]
>>
>> This works, but I need to test for an empty v like the last key, and create
>> one tuple ('key_c', None).
>> Anyone know the trick to reorganize this to accept the test for an empty v
>> and add the else?
>
> Yeah, it's remarkably easy too! Try this:
>
> [(k, x) for k, v in my_dict.items() for x in v or [None]]
>
> An empty list counts as false, so the 'or' will then take the second
> option, and iterate over the one-item list with None in it.
Or more simply:
[(k, v or None) for k, v in my_dict.items()]
This assumes that all the values in my_dict are lists, and not
other false values like 0, which would also be replaced by None.
Re: List comp help
On Sunday, July 14, 2013 12:32:34 PM UTC-6, [email protected] wrote:
> Or more simply:
> [(k, v or None) for k, v in my_dict.items()]
Too simply :-( Didn't read the op carefully enough. Sorry.
Re: GeoIP2 for retrieving city and region ?
On 2013-07-13 16:57, Michael Torrie wrote:
> On 07/13/2013 12:23 PM, Νικόλας wrote:
> > Do you know a way of implementing anyone of these methods to a
> > script?
>
> Yes. Modern browsers all support a location API in the browser for
> javascript.
And the good browsers give the user the option to disclose this information
or not (and, as I mentioned elsewhere on this thread, even lie about where
you are such as with the Geolocater plugin[1] for FF). Some of us value the
modicum of privacy that we receive by not being locatable by IP address.
-tkc
[1] https://addons.mozilla.org/en-us/firefox/addon/geolocater/
Re: Ideal way to separate GUI and logic?
Thanks for all the responses!
So as a general idea, I should at the very least separate the GUI from the
program logic by defining the logic as a function, correct? And the next
level of separation is to define the logic as a class in one or more
separate files, and then import it to the file with the GUI, correct?
My next question is, to what degree should I 'slice' my logic into
functions? How small or how large should one function be, as a rule of
thumb?
Re: Ideal way to separate GUI and logic?
On Sun, Jul 14, 2013 at 8:25 PM, wrote:
> Thanks for all the responses!
>
> So as a general idea, I should at the very least separate the GUI from the
> program logic by defining the logic as a function, correct? And the next
> level of separation is to define the logic as a class in one or more
> separate files, and then import it to the file with the GUI, correct?
>
> My next question is, to what degree should I 'slice' my logic into
> functions? How small or how large should one function be, as a rule of
> thumb?

Others may differ. I think you should just write the code. In actually doing that you will learn the pitfalls of how you have divided up your logic.

Writing code isn't all theory. It takes practice, and since the days of The Mythical Man-Month, it has been well understood that you always end up throwing away the first system anyway. It has to be built to truly understand what you think you want to create, but in the learning, you realize that it's easier and better to start more or less from scratch rather than try to fix the first concept.

--
Joel Goldstick
http://joelgoldstick.com
Re: Ideal way to separate GUI and logic?
In article , Joel Goldstick wrote:
> Writing code isn't all theory. It takes practice, and since the days
> of The Mythical Man-Month, it has been well understood that you
> always end up throwing away the first system anyway.

If I may paraphrase Brooks, "Plan to throw the first one away, because it's going to suck. Then, the next one you write to replace it will also suck because it's going to suffer from Second System Effect" :-)

BTW, anybody who enjoyed The Mythical Man-Month should also read Ed Yourdon's Death March.
Re: Ideal way to separate GUI and logic?
On Sun, 14 Jul 2013 17:25:32 -0700, fronagzen wrote:
> My next question is, to what degree should I 'slice' my logic into
> functions? How small or how large should one function be, as a rule of
> thumb?

I aim to keep my functions preferably below a dozen lines (excluding the doc string), and definitely below a page.

But more important than size is functionality. Every function should do *one thing*. If that thing can be divided into two or more "sub-things" then they should be factored out into separate functions, which I then call. Possibly private, internal only functions.

--
Steven
what thread-synch mech to use for clean exit from a thread
A currency exchange thread updates exchange rate once a minute. If the thread failed to update currency rate for 5 hours, it should inform main() for a clean exit. This has to be done gracefully, because main() could be doing something delicate.

I, a newbie, read about all the thread sync tools, and wasn't sure which one to use. In fact I am not sure if there is a need of thread sync, because there is no race condition. I thought of this naive way:

class CurrencyExchange():
    def __init__(in_case_callback):
        this.callback = in_case_callback
    def __run__():
        while time.time() - self.rate_timestamp < 5*3600:
            ... # update exchange rate
            if success:
                self.rate_timestamp == time.time()
            time.sleep(60)
        this.callback() # rate not updated 5 hours, a crisis

def main():
    def callback()
        Go_On = False
    agio = CurrencyExchange(in_case = callback)
    agio.start()

    Go_On = True
    while Go_On:
        do_something_delicate(rate_supplied_by=agio)

As you can see, if there is no update of currency rate for 5 hours, the CurrencyExchange object calls the callback, which prevents main() from doing the next delicate_thing, but does not interrupt the current delicate_thing.

This seems OK, but doesn't look pythonic -- replacing callback() with a lambda doesn't help much, it still looks naive. What is the professional way in this case?

Thanks in advance!
Re: what thread-synch mech to use for clean exit from a thread
On Monday, July 15, 2013 10:27:45 AM UTC+8, Gildor Oronar wrote:
> What is the professional way in this case?

Hi. I am not a professional either, but I think a professional does this:

class CurrencyExchange():
    def __init__(in_case_callback):
        this.callback = in_case_callback
    def __run__():
        while time.time() - self.rate_timestamp < 5*3600:
            ... # update exchange rate
            if success:
                self.rate_timestamp == time.time()
            time.sleep(60)

def main():
    agio = CurrencyExchange(in_case = callback)
    agio.start()
    while agio.is_alive():
        do_something_delicate(rate_supplied_by=agio)

Notice that even if agio is no longer alive, it can still supply the exchange rate for the last delicate_thing, only it no longer updates punctually. This is semantically wrong, and I think it is the fault of python: how can something dead execute its method? In the API, thread.is_alive() should be renamed to thread.is_activate_and_on_his_own()
Re: what thread-synch mech to use for clean exit from a thread
On Mon, 15 Jul 2013 10:27:45 +0800, Gildor Oronar wrote:

> A currency exchange thread updates exchange rate once a minute. If the
> thread failed to update currency rate for 5 hours, it should inform
> main() for a clean exit. This has to be done gracefully, because main()
> could be doing something delicate.
>
> I, a newbie, read about all the thread sync tools, and wasn't sure which
> one to use. In fact I am not sure if there is a need of thread sync,
> because there is no race condition. I thought of this naive way:
>
> class CurrencyExchange():
>     def __init__(in_case_callback):
>         this.callback = in_case_callback

You need to declare the instance parameter, which is conventionally called "self" not "this". Also, your class needs to inherit from Thread, and critically it MUST call the superclass __init__.

So:

class CurrencyExchange(threading.Thread):
    def __init__(self, in_case_callback):
        super(CurrencyExchange, self).__init__()
        self.callback = in_case_callback

But I'm not sure that a callback is the right approach here. See below.

>     def __run__():

Likewise, you need a "self" parameter.

>         while time.time() - self.rate_timestamp < 5*3600:
>             ... # update exchange rate
>             if success:
>                 self.rate_timestamp == time.time()
>             time.sleep(60)
>         this.callback() # rate not updated 5 hours, a crisis

I think that a cleaner way is to just set a flag on the thread instance. Initiate it with:

    self.updates_seen = True

in the __init__ method, and then add this after the while loop:

    self.updates_seen = False

> def main():
>     def callback()
>         Go_On = False

I don't believe this callback will work, because it will simply create a local variable called "Go_On", not change the non-local variable.

In Python 3, you can use the nonlocal keyword to get what you want, but I think a better approach is with a flag on the thread.
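The scoping issue with the callback can be seen in isolation. The following is a minimal sketch with made-up function names (they are not part of the original code): assigning to a name inside a nested function creates a new local, whereas nonlocal (Python 3 only) rebinds the variable of the enclosing function.

```python
def run_with_plain_assignment():
    go_on = True

    def callback():
        go_on = False  # creates a new local name; the outer go_on is untouched

    callback()
    return go_on


def run_with_nonlocal():
    go_on = True

    def callback():
        nonlocal go_on  # rebind the enclosing function's variable instead
        go_on = False

    callback()
    return go_on
```

Calling the first function still returns True after the callback has run; only the second one returns False.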
>     agio = CurrencyExchange(in_case = callback)
>     agio.start()
>
>     Go_On = True
>     while Go_On:
>         do_something_delicate(rate_supplied_by=agio)

Change to:

    while agio.updates_seen:
        do_something_delicate...

--
Steven
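Putting the flag suggestion together, a runnable version might look roughly like the sketch below. Several things here are assumptions made for illustration and are not in the original post: the fetch_rate callable (returns a rate, or None on failure), the poll_interval and max_age parameters, and the stop() method. Note also that threading.Thread calls a method named run(), not __run__(), and that storing the timestamp needs "=" rather than "==".

```python
import threading
import time


class CurrencyExchange(threading.Thread):
    """Polls a rate source and clears updates_seen when the rate goes stale."""

    def __init__(self, fetch_rate, poll_interval=60, max_age=5 * 3600):
        super().__init__(daemon=True)
        self.fetch_rate = fetch_rate        # hypothetical callable: rate or None
        self.poll_interval = poll_interval  # seconds between attempts
        self.max_age = max_age              # seconds before the rate counts as stale
        self.rate = None
        self.rate_timestamp = time.time()
        self.updates_seen = True
        self._stop_requested = threading.Event()

    def run(self):
        while not self._stop_requested.is_set():
            rate = self.fetch_rate()
            now = time.time()
            if rate is not None:
                self.rate = rate
                self.rate_timestamp = now   # assignment, not comparison
            # The flag only reports freshness; the thread keeps polling either way.
            self.updates_seen = (now - self.rate_timestamp) < self.max_age
            self._stop_requested.wait(self.poll_interval)

    def stop(self):
        """Ask the polling loop to finish after the current wait."""
        self._stop_requested.set()
```

main() would then start the thread and loop on "while agio.updates_seen:", so a delicate operation already in progress is never interrupted; the loop simply does not begin another one.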
Re: what thread-synch mech to use for clean exit from a thread
Oh, I forgot another comment...

On Mon, 15 Jul 2013 03:04:14 +, Steven D'Aprano wrote:

> On Mon, 15 Jul 2013 10:27:45 +0800, Gildor Oronar wrote:
>>         while time.time() - self.rate_timestamp < 5*3600:
>>             ... # update exchange rate
>>             if success:
>>                 self.rate_timestamp == time.time()
>>             time.sleep(60)
>>         this.callback() # rate not updated 5 hours, a crisis
>
> I think that a cleaner way is to just set a flag on the thread instance.
> Initiate it with:
>
>     self.updates_seen = True
>
> in the __init__ method, and then add this after the while loop:
>
>     self.updates_seen = False

Sorry, I forgot to mention...

I assume that the intention is that if the thread hasn't seen any updates for five hours, it should set the flag, and then *keep going*. Perhaps the rate will start updating again later.

If the intention is to actually close the thread, then there's no reason for an extra flag. Just exit the run() method normally, the thread will die, and you can check the thread's status with the is_alive() method.

--
Steven
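For the second variant, where the thread really is meant to end, a sketch under the same assumptions as before (the fetch_rate callable, the poll_interval and max_age parameters, and the main_loop helper are all hypothetical names chosen for the example) could look like this:

```python
import threading
import time


class ExpiringExchange(threading.Thread):
    """Polls a rate source and lets the thread end once the rate is stale."""

    def __init__(self, fetch_rate, poll_interval=60, max_age=5 * 3600):
        super().__init__(daemon=True)
        self.fetch_rate = fetch_rate        # hypothetical callable: rate or None
        self.poll_interval = poll_interval
        self.max_age = max_age
        self.rate = None
        self.rate_timestamp = time.time()

    def run(self):
        while time.time() - self.rate_timestamp < self.max_age:
            rate = self.fetch_rate()
            if rate is not None:
                self.rate = rate
                self.rate_timestamp = time.time()
            time.sleep(self.poll_interval)
        # Falling off the end of run() ends the thread; is_alive() becomes False.


def main_loop(exchange, do_something_delicate):
    """Run delicate steps until the exchange thread has ended."""
    exchange.start()
    steps = 0
    while exchange.is_alive():
        do_something_delicate(exchange.rate)
        steps += 1
    return steps
```

The thread object remains usable after the thread has ended, so exchange.rate can still be read; it just stops being refreshed.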
Re: Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)
On 7/14/2013 10:56 AM, Chris Angelico wrote:
On Sun, Jul 14, 2013 at 11:44 PM, wrote:
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
>>> timeit.repeat("a = 'hundreœ'; 'x' in a")
[0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
>>> sys.version
'3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'
An issue about finding strings in strings was opened last September and,
as reported on this list, fixes were applied about last March. As I
remember, some but not all of the optimizations were applied to 3.3.
Perhaps some were applied too late for 3.3.1 (3.3.2 is 3.3.1 with some
emergency patches to correct regressions).
Python 3.4.0a2:
>>> import timeit
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.17396483610667152, 0.16277956641670813, 0.1627937074749941]
>>> timeit.repeat("a = 'hundreo'; 'x' in a")
[0.18441108179403187, 0.16277311071618783, 0.16270517215355085]
The difference is gone, again, as previously reported.
jmf has raised an interesting point. Some string membership operations
do seem oddly slow.
He raised it a year ago and action was taken.
import timeit

# Get ourselves a longish ASCII string with no duplicates - escape
# apostrophe and backslash for code later on
asciichars=''.join(chr(i) for i in range(32,128)).replace("\\",r"\\").replace("'",r"\'")
haystack=[
    ("ASCII",asciichars+"\u0001"),
    ("BMP",asciichars+"\u1234"),
    ("SMP",asciichars+"\U00012345"),
]
needle=[
    ("ASCII","\u0002"),
    ("BMP","\u1235"),
    ("SMP","\U00012346"),
]
useset=[
    ("",""),
    (", as set","; a=set(a)"),
]
# Time every needle/haystack/set combination and print fastest first
for time,desc in sorted(
        (min(timeit.repeat("'%s' in a"%n,("a='%s'"%h)+s)),"%s in %s%s"%(nd,hd,sd))
        for nd,n in needle for hd,h in haystack for sd,s in useset):
    print("%.10f %s"%(time,desc))
0.1765129367 ASCII in ASCII, as set
0.1767096097 BMP in SMP, as set
0.1778647845 ASCII in BMP, as set
0.1785266004 BMP in BMP, as set
0.1789093307 SMP in SMP, as set
0.1790431465 SMP in BMP, as set
0.1796504863 BMP in ASCII, as set
0.1803854959 SMP in ASCII, as set
0.1810674262 ASCII in SMP, as set
Much of this time is overhead; 'pass' would not run too much faster.
0.1817367850 SMP in BMP
0.1884555160 SMP in ASCII
0.2132371572 BMP in ASCII
For these, 3.3 does no searching because it knows from the internal char
kind that the answer is No without looking.
0.3137454621 ASCII in ASCII
0.4472624314 BMP in BMP
0.6672795006 SMP in SMP
0.7493052888 ASCII in BMP
0.9261783271 ASCII in SMP
0.9865787412 BMP in SMP
...
Set membership is faster than string membership, though marginally on
something this short. If the needle is wider than the haystack, it
obviously can't be present, so a false return comes back at the speed
of a set check.
Jim ignores these cases where 3.3+ uses the information about the max
codepoint to do the operation much faster than in 3.2.
Otherwise, an actual search must be done. Searching
for characters in strings of the same width gets slower as the strings
get larger in memory (unsurprising). What I'm seeing of the top-end
results, though, is that the search for a narrower string in a wider
one is quite significantly slower.
50% longer is not bad, even
I don't know of an actual proven use-case for this, but it seems
likely to happen (eg you take user input and want to know if there are
any HTML-sensitive characters in it, so you check ('<' in string or
'&' in string), for instance).
In my editing of code, I nearly always search for words or long names.
The question is, is it worth
constructing an "expanded string" at the haystack's width prior to
doing the search?
I would not make any assumptions about what Python does or does not do
without checking the code. All I know is that Python uses a modified
version of one of the pre-process and skip-forward algorithms
(Boyer-Moore?, Knuth-Pratt?, I forget). These are designed to work
efficiently with needles longer than 1 char, and indeed may work better
with longer needles. Searching for a single char in n chars is O(n).
Searching for a len m needle is potentially O(m*n) and the point of the
fancy algorithms is to make all searches as close to O(n) as possible.
--
Terry Jan Reedy
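The "internal char kind" mentioned above can be observed indirectly. The sketch below is specific to CPython 3.3 and later, where a string is stored with 1, 2 or 4 bytes per character depending on its widest code point. The exact numbers printed by sys.getsizeof vary between versions and platforms, so only their ordering is meaningful; the variable names are mine.

```python
import sys

n = 1000
ascii_text = "a" * n            # fits in 1 byte per character
bmp_text = "\u1234" * n         # needs 2 bytes per character
smp_text = "\U00012345" * n     # needs 4 bytes per character

for label, text in [("ASCII", ascii_text), ("BMP", bmp_text), ("SMP", smp_text)]:
    print(label, sys.getsizeof(text))

# A needle wider than the haystack's storage width cannot occur in it,
# so the answer is necessarily False.
print("\u1234" in ascii_text)
```

That is why the "SMP in BMP", "SMP in ASCII" and "BMP in ASCII" timings sit next to the set lookups rather than next to the real searches.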
Re: Timing of string membership (was Re: hex dump w/ or w/out utf-8 chars)
On Mon, Jul 15, 2013 at 2:18 PM, Terry Reedy wrote:
> On 7/14/2013 10:56 AM, Chris Angelico wrote:
> An issue about finding strings in strings was opened last September and, as
> reported on this list, fixes were applied about last March. As I remember,
> some but not all of the optimizations were applied to 3.3. Perhaps some were
> applied too late for 3.3.1 (3.3.2 is 3.3.1 with some emergency patches to
> correct regressions).
D'oh. I knew there was something raised and solved regarding that, but
I forgot to go check a 3.4 alpha to see if it exhibited the same.
Whoops. My bad. Sorry!
> Python 3.4.0a2:
> >>> import timeit
> >>> timeit.repeat("a = 'hundred'; 'x' in a")
> [0.17396483610667152, 0.16277956641670813, 0.1627937074749941]
> >>> timeit.repeat("a = 'hundreo'; 'x' in a")
> [0.18441108179403187, 0.16277311071618783, 0.16270517215355085]
>
> The difference is gone, again, as previously reported.
Yep, that looks exactly like I would have hoped it would.
>> 0.1765129367 ASCII in ASCII, as set
>
> Much of this time is overhead; 'pass' would not run too much faster.
>
>> 0.1817367850 SMP in BMP
>> 0.1884555160 SMP in ASCII
>> 0.2132371572 BMP in ASCII
>
> For these, 3.3 does no searching because it knows from the internal char
> kind that the answer is No without looking.
Yeah, I mainly included those results so I could say to jmf "Look, FSR
allows some string membership operations to be, I kid you not, as fast
as set operations!".
>> 0.3137454621 ASCII in ASCII
>> 0.4472624314 BMP in BMP
>> 0.6672795006 SMP in SMP
>> 0.7493052888 ASCII in BMP
>> 0.9261783271 ASCII in SMP
>> 0.9865787412 BMP in SMP
>
>> Otherwise, an actual search must be done. Searching
>> for characters in strings of the same width gets slower as the strings
>> get larger in memory (unsurprising). What I'm seeing of the top-end
>> results, though, is that the search for a narrower string in a wider
>> one is quite significantly slower.
>
> 50% longer is not bad, even
Hard to give an estimate; my first tests were the ASCII in ASCII and
ASCII in BMP, which then looked more like 2:1 time. However, rescaling
the needle to BMP makes it more like the 50% you're quoting, so yes,
it's not as much as I thought.
In any case, the most important thing to note is: 3.4 has already
fixed this, ergo jmf should shut up about it. And here I thought I
could credit him with a second actually-useful report...
ChrisA
Re: List comp help
On 7/14/2013 1:10 PM, Joseph L. Casale wrote:
I have a dict of lists. I need to create a list of 2-tuples, where each tuple
is a key from the dict with one of the key's list items.
my_dict = {
'key_a': ['val_a', 'val_b'],
'key_b': ['val_c'],
'key_c': []
}
[(k, x) for k, v in my_dict.items() for x in v]
The order of the tuples is not deterministic unless you sort, so if
everything is hashable, a set may be better.
This works, but I need to test for an empty v like the last key, and create one
tuple ('key_c', None).
Anyone know the trick to reorganize this to accept the test for an empty v and
add the else?
When posting code, it is a good idea to include the expected or desired
answer in code as well as text.
pairs = {(k, x) for k, v in my_dict.items() for x in v or [None]}
assert pairs == {('key_a', 'val_a'), ('key_a', 'val_b'),
('key_b', 'val_c'), ('key_c', None)}
--
Terry Jan Reedy
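If a list is wanted rather than a set, the same "v or [None]" trick works in a list comprehension. This is only a sketch: the resulting order follows the dict's iteration order, which is insertion order on Python 3.7 and later but was not guaranteed on the versions current when this thread was written.

```python
my_dict = {
    'key_a': ['val_a', 'val_b'],
    'key_b': ['val_c'],
    'key_c': [],
}

# An empty list is falsy, so "v or [None]" substitutes a one-item list,
# which yields exactly one (key, None) tuple for that key.
pairs = [(k, x) for k, v in my_dict.items() for x in (v or [None])]
print(pairs)
```

One caveat: a key whose list legitimately contains a single None produces the same tuple as a key with an empty list.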
