[Tutor] map vs. list comprehension
I read somewhere that the function 'map' might one day be deprecated in favor of list comprehensions. But I can't see a way to do this in a list comprehension:

    >>> map(pow, [2, 2, 2, 2], [1, 2, 3, 4])
    [2, 4, 8, 16]

Is there a way?

Cheers,
Mike
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] map vs. list comprehension
On Feb 14, 2006, at 3:46 PM, Andre Roberge wrote:

> [2**i for i in [1, 2, 3, 4]]

Ah yes, I'm sorry, I was thinking of the most general case, where the arguments are two arbitrary lists. My example was too specific. Is there a way to do something like the following in a list comprehension?

    map(pow, [1, 7, 6, 2], [3, 8, 2, 5])

Mike
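One way to answer the question, sketched here for Python 3 (an assumption; the thread itself uses Python 2): pair the two lists up with the built-in zip and unpack each pair inside the comprehension. The names bases, exponents and powers are made up for the illustration.

```python
bases = [1, 7, 6, 2]
exponents = [3, 8, 2, 5]

# zip pairs the lists position by position: (1, 3), (7, 8), (6, 2), (2, 5).
# Each pair is unpacked into b and e, then passed to pow.
powers = [pow(b, e) for b, e in zip(bases, exponents)]

print(powers)
```

Note that zip stops at the shorter list, so the two lists should have the same length for this to match the map call.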
[Tutor] Unexpected behavior of +=
I just discovered the following behavior, but can't find any documentation about it:

    >>> list = []
    >>> list = list + 'abc'
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: can only concatenate list (not "str") to list

but:

    >>> list = []
    >>> list += 'abc'
    >>> list
    ['a', 'b', 'c']

Is this a special characteristic that has been added to the augmented assignment operator +=; or is it an automatic consequence of += assignment being performed 'in place'? (Tho I can't see how it could be...)

It just seems very un-Pythonesque to be able to successfully concatenate objects of different types like this. And it seems very inconsistent with standard assignment. Indeed, the Python Reference Manual, section 6.3.1 states:

"With the exception of assigning to tuples and multiple targets in a single statement, the assignment done by augmented assignment statements is handled the same way as normal assignments. Similarly, with the exception of the possible in-place behavior, the binary operation performed by augmented assignment is the same as the normal binary operations."

...which is patently not the case here. I was scandalized lol!
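A short sketch of what seems to be going on, written for Python 3 (an assumption; the session above is Python 2). For lists, += is handled by the in-place method list.__iadd__, which behaves like list.extend and accepts any iterable, while + is handled by list.__add__, which insists on another list. The names items and alias are made up for the illustration.

```python
items = []
alias = items              # a second name bound to the same list object

items += 'abc'             # in-place: extends the list with each character
print(items)
print(alias is items)      # still the same object, so alias sees the change

try:
    items = items + 'abc'  # plain +: list.__add__ only accepts another list
except TypeError as exc:
    print('TypeError:', exc)

extended = []
extended.extend('abc')     # the explicit spelling of what += does here
print(extended == items)
```

So the difference is between two separate methods on list, not a rule built into the += statement itself.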
[Tutor] 'in-place' methods
I think I understand this sorting-a-list 'in place' stuff, and things of that kind (reversing for example); but I am finding it very difficult to get used to, since sorting a list doesn't return the sorted list as a value, but simply does the work as a side effect.

The place where it really trips me up is when I want to do something like pass a sorted list as a value in a program, say. Here is a trivial example. Say I want to define a function to always print lists in sorted form. I can't do it as follows, tho it's what I want to do intuitively:

    def sort_print(L):
        print L.sort()

This is on a strict analogy to writing a function that prints strings in upper case only:

    def upper_print(s):
        print s.upper()

(The fact that the second works and the first doesn't really does bug me as a newbie.)

Anyway, first question: is the fact that the first doesn't work purely a function of the fact that lists are mutable? (At least I could kind of understand that, that methods work differently for objects of different types). Or is it possible to have methods like .upper() that return the result of performing the operation even for lists?

Second question. Do I really have to write the sort_print() function like this:

    def sort_print(L):
        L.sort()
        print L

i.e. first perform the operation in-place, then pass the variable? Is this the idiomatic way of doing it?

Third question: is this in-place behavior of methods in effect restricted to lists, or are there other places I should be on the lookout for it?

Cheers,
Mike
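For the second question, a minimal sketch in Python 3 syntax (an assumption; the post uses Python 2 print statements). The built-in sorted() returns a new sorted list and leaves its argument alone, which gives the "returns a value" style asked about. The names data and in_place_result are made up for the illustration.

```python
def sort_print(items):
    """Print a sorted copy of items without reordering the original."""
    print(sorted(items))

data = [3, 1, 2]
sort_print(data)                 # prints the sorted copy
print(data)                      # original order is untouched

# list.sort() works in place and returns None, which is why
# printing L.sort() shows None instead of the list.
in_place_result = data.sort()
print(in_place_result)
print(data)                      # now the list itself has been reordered
```

The built-in reversed() plays a similar role for list.reverse(), although it returns an iterator rather than a list.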
[Tutor] Unicode and regexes
Does Python support the Unicode-flavored class-specifications in regular expressions, e.g. \p{L} ? It doesn't work in the following code, any ideas?

    #! /usr/local/bin/python
    """
    usage: ./uni_read.py file
    """
    import codecs
    import re
    import sys

    text = codecs.open(sys.argv[1], mode='r', encoding='utf-8').read()
    unicode_property_pattern = re.compile(r"\p{L}")
    dot_pattern = re.compile(".")
    letters = unicode_property_pattern.findall(text)
    characters = dot_pattern.findall(text)
    print 'var letters =', letters
    print 'var characters = ', characters

The input file, encoded in utf-8, is

    abc αβγ

The output is:

    var letters = []
    var characters = [u'a', u'b', u'c', u' ', u'\u03b1', u'\u03b2', u'\u03b3']
Re: [Tutor] Unicode and regexes
Thanks Kent, for breaking the bad news. I'm not angry, just terribly, terribly disappointed. :)

"From http://www.unicode.org/unicode/reports/tr18/ I see that \p{L} is intended to select Unicode letters, and it is part of a large number of selectors based on Unicode character properties."

Yeah, that's the main cite, and yeah, a large, large number. The only sane way to use regexes with Unicode. Also see Friedl's 'Mastering Regular Expressions' Chapter 3: or actually, if you are a Python only person, don't: it will make you weep.

"Python doesn't support this syntax. It has limited support for Unicode character properties [...]".

Umm Earth to Python-guys, you *have heard* of Unicode, right? Call me crazy, but in this day and age, I assume a scripting language with regex support will implement standard Unicode conventions, unless there is a compelling reason not to. Very odd.

Back to Perl. Right now. Just kidding. Not. Sheesh.

This is a big flaw in Python, IMHO. I never saw it coming.
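For what it's worth, a possible workaround that stays in the standard library, sketched for Python 3 (an assumption; the script in this thread is Python 2): the unicodedata module exposes each character's general category, and the categories starting with 'L' are the ones \p{L} is meant to select. The helper name unicode_letters is hypothetical. The third-party regex package does accept \p{L} directly, if installing a package is an option.

```python
import unicodedata

def unicode_letters(text):
    """Return the characters of text whose general category is a letter (L*)."""
    return [ch for ch in text if unicodedata.category(ch).startswith('L')]

# Same content as the input file in the thread: abc, a space, alpha beta gamma.
sample = 'abc \u03b1\u03b2\u03b3'
letters = unicode_letters(sample)
print('var letters =', letters)
```

This filters character by character rather than matching a pattern, so it only stands in for the simple findall use shown in the script, not for \p{L} inside a larger regular expression.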
[Tutor] Bigrams and nested dictionaries
I'm playing with the whole idea of creating bigram (digram?) frequencies for text analysis and cryptographic and entropy analysis etc (this is as much an exercise in learning Python and programming as anything else, I realise everything has already been done somewhere somehow :) Though I *am* aiming to run this over unicoded phonetic representations of natural languages, which doesn't seem to be that common.

I implemented a single character count using a dictionary, and it worked really well:

    Character_Count = {}
    for character in text:
        Character_Count[character] = Character_Count.get(character, 0) + 1

And so for bigrams I was thinking of creating a data-structure that was a nested dictionary, where the key was the character in position 1, and the value was a sub-dictionary that gave a count of each character in position 2. So the dictionary might begin something like:

    {'a': {'a':1, 'b':8, 'c':10,...}, 'b' : {'a':23, 'b':0, 'c': 1,...},..., 'z' : {'a':11, 'b':0, 'c':0,...}}

The count of the character in position one could be retrieved by summing over the characters in position 2, so the bigram and single-character counts can be done in one pass.

I don't want anyone to tell me how to do this :) I'd just like to get a feel for:

(i) is this idea of a nested dictionary a good/efficient/tractable data-structure?

(ii) is there a straightforward path to the goal of constructing this thing in a single pass over the input string (yes or no will suffice for the moment!), or is there a can of worms lurking in my future.

I'd rather have less information than more, but there is something about grabbing characters two-at-a-time, but moving forward one-at-a-time, and sticking the count down a level inside the dictionary that's just a little baffling at the moment. On the other hand it's like moving a two-character window across the text and recording information as you go, which seems like it should be a good thing to do computationally.

I like being baffled, I'd just kinda like to know if this is a good problem to work on to gain enlightenment, or a bad one and I should think about a totally different path with a less jazzy data structure.
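Without giving away how to build the structure, here is a small Python 3 sketch (an assumption; the thread uses Python 2) of the summing step described above, run on a hand-written fragment of the example dictionary. The names bigram_counts and first_counts are made up for the illustration.

```python
# A hand-written fragment shaped like the nested dictionary in the post:
# outer key = character in position 1, inner key = character in position 2.
bigram_counts = {
    'a': {'a': 1, 'b': 8, 'c': 10},
    'b': {'a': 23, 'b': 0, 'c': 1},
}

# Summing each inner dictionary gives how often the outer character
# appeared in position 1 of a bigram.
first_counts = {first: sum(seconds.values())
                for first, seconds in bigram_counts.items()}

print(first_counts)
```

One caveat: the very last character of a text never sits in position 1 of a bigram, so these sums come out one short for that character compared with a plain single-character count.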
Re: [Tutor] Bigrams and nested dictionaries
Well I ran into an interesting glitch already. For a dictionary D, I can pull out a nested value using this syntax:

    >>> D['b']['a']
    23

and I can assign to this dictionary using

    >>> D['d'] = {'a':7, 'b':0}

but I can't assign like this:

    >>> D['d']['c'] = 1
    TypeError: object does not support item assignment.

hmmm.
Re: [Tutor] Bigrams and nested dictionaries
Aha! John wrote:

"Are you sure you haven't mistakenly assigned something other than a dict to D or D['d'] ?"

Thanks for the tip! Yup that was it (and apologies for not reporting the problem more precisely). I hadn't initialized the nested dictionary before trying to assign to it. (I think Perl doesn't require initialization of dictionaries prior to assignment, which in this case, would be a nice thing...)

    >>> D['a'] = {'a':1, 'b':2}   #oops
    Traceback (most recent call last):
      File "", line 1, in ?
    NameError: name 'D' is not defined
    >>> D = {}
    >>> D['a'] = {'a':1, 'b':2}
    >>> D
    {'a': {'a': 1, 'b': 2}}
    >>> D['c']['a'] = 1   #ooops
    Traceback (most recent call last):
      File "", line 1, in ?
    KeyError: 'c'
    >>> D['c'] = {}
    >>> D['c']['a'] = 1
    >>> D
    {'a': {'a': 1, 'b': 2}, 'c': {'a': 1}}

And Kent wrote:

"Encapsulating this in a generator is probably a good plan."

Yay, I get to play with generators... thanks for the suggestion, I would never have looked in that direction.

--
Python: [x for S in L for x in S]
Mathematica: Flatten[L]
(but where's the fun in that?)
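Two ways to get the Perl-like convenience mentioned above, sketched for a current Python 3 (an assumption; collections.defaultdict may not exist in the Python version used in this thread). dict.setdefault creates the inner dictionary only when the key is missing, and defaultdict does the same automatically on first access. The names plain and auto are made up for the illustration.

```python
from collections import defaultdict

# Option 1: a plain dict, creating the inner dict on demand with setdefault.
plain = {}
plain.setdefault('c', {})['a'] = 1   # no KeyError: 'c' is created if missing
print(plain)

# Option 2: a defaultdict that builds an empty inner dict for any new key.
auto = defaultdict(dict)
auto['c']['a'] = 1                   # auto['c'] springs into existence here
print(dict(auto))
```

A side effect worth knowing: merely reading a missing key from a defaultdict also creates it, so membership tests with "in" are the safer way to check whether a key exists.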
Re: [Tutor] Bigrams and nested dictionaries
Well coming up with this has made me really love Python. I worked on this with my online pythonpenpal Kyle, and here is what we came up with. Thanks to all for input so far.

My first idea was to use a C-type indexing for-loop, to grab a two-element sequence [i, i+1]:

    dict = {}
    for i in range(len(t) - 1):
        if not dict.has_key(t[i]):
            dict[t[i]] = {}
        if not dict[t[i]].has_key(t[i+1]):
            dict[t[i]][t[i+1]] = 1
        else:
            dict[t[i]][t[i+1]] += 1

Which works, but. Kyle had an alternative take, with no indexing, and after we worked on this strategy it seemed very Pythonesque, and ran almost twice as fast.

    #!/usr/local/bin/python
    import sys

    file = open(sys.argv[1], 'rb').read()

    # We imagine a 2-byte 'window' moving over the text from left to right
    #
    #       +-----+
    # L o n | d o | n . M i c h a e l m a s t e r m ...
    #       +-----+
    #
    # At any given point, we call the leftmost byte visible in the window 'L', and the
    # rightmost byte 'R'.
    #
    #       +---------+
    # L o n | L=d R=o | n . M i c h a e l m a s t e r m ...
    #       +---------+
    #
    # When the program begins, the first byte is preloaded into L, and we position R
    # at the second byte of the file.
    #

    dict = {}
    L = file[0]
    for R in file[1:]:   # move right edge of window across the file
        if not L in dict:
            dict[L] = {}
        if not R in dict[L]:
            dict[L][R] = 1
        else:
            dict[L][R] += 1
        L = R   # move character in R over to L

    # that's it. here's a printout strategy:
    for entry in dict:
        print entry, ':', sum(dict[entry].values())
        print dict[entry]
        print
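The same sliding-window idea can be written more compactly; this is only a sketch for Python 3 (an assumption; the script above is Python 2), not a replacement for the version worked out in the post. zip(text, text[1:]) yields each character together with its right-hand neighbour, which plays the role of the L/R window, and defaultdict(Counter) removes the "is the key there yet" checks. The function name bigram_counts is hypothetical.

```python
from collections import Counter, defaultdict

def bigram_counts(text):
    """Count bigrams as a nested mapping: counts[left][right] -> frequency."""
    counts = defaultdict(Counter)
    # zip pairs every character with its successor: the two-character window.
    for left, right in zip(text, text[1:]):
        counts[left][right] += 1
    return counts

counts = bigram_counts('banana')

# Same printout strategy as in the post.
for entry in counts:
    print(entry, ':', sum(counts[entry].values()))
    print(dict(counts[entry]))
    print()
```

An empty or one-character input simply produces an empty result here, since zip has no pairs to yield.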