On 07/06/2013 11:18 PM, bluepresley wrote:
I'm reading the book Programming Collective Intelligence by Toby Segaran.
I'm having a lot of difficulty understanding some of the code from
chapter four (the code for this chapter is available online at
https://github.com/cataska/programming-collective-intelligence-code/blob/master/chapter4/searchengine.py and
starts at line 172), specifically starting with this function:
def getscoredlist(self,rows,wordids):
    totalscores=dict([(row[0],0) for row in rows])

    # This is where you'll later put the scoring functions
    weights=[]

    for (weight,scores) in weights:
        for url in totalscores:
            totalscores[url]+=weight*scores[url]

    return totalscores
What does this mean?
totalscores=dict([(row[0],0) for row in rows])
Two different things might be confusing you. It'd help to refactor it,
and explain which line isn't clear.
mylist = [(row[0],0) for row in rows]
totalscores = dict(mylist)
del mylist
I'll guess it's the list comprehension that confuses you. That line is
roughly equivalent to:
mylist = []
for row in rows:
    mylist.append( (row[0], 0) )
So you end up with a list of tuples, each consisting of row[0] and 0.
The call to dict simply processes that list and makes a dict out of it.
The first item in each tuple is the key, and the second item is the value.
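To make that concrete, here is a small sketch. The rows below are made up for illustration (the real ones come from the database query), but the shape is the same: each row is a tuple whose first item is the id. Coming from PHP, you can think of the result as roughly an associative array whose keys are the ids and whose values all start at 0.

```python
# Hypothetical rows standing in for the database result:
# each row is (urlid, word location, word location).
rows = [(1, 4, 7), (2, 3, 9), (5, 1, 2)]

# The one-liner from the book: build (key, value) pairs, then a dict.
totalscores = dict([(row[0], 0) for row in rows])

# The same thing spelled out with an explicit loop.
spelled_out = {}
for row in rows:
    spelled_out[row[0]] = 0

print(totalscores)
print(totalscores == spelled_out)
```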
I have a lot of experience with PHP and Objective C. If anyone is familiar
with PHP or another language could you please provide the equivalent? I
think that would really help me understand better.
The function just before that, getmatchingrows, provides the arguments for
getscoredlist. "rows" is rows from a database query; "wordids" is a list of
word ids searched for that generated rows result set. For that function
(getmatchingrows) it returns 2 variables simultaneously. I'm unfamiliar
with this. What's going on there?
I didn't download the link, so I'll just make up a function as an example:
def myfunc():
    return 12, 42
That comma between the two values creates a tuple out of them. The
tuple is normally written (12, 42), and is an immutable version of the
list [12, 42].
If somebody calls the function like:
a = myfunc()
then a will be that same tuple.
But if somebody calls the function like:
c, d = myfunc()
then a symmetric trick called "tuple unpacking" goes into effect. When
you have a bunch of variables on the left side of an assignment, it'll
unpack whatever iterable is being passed.
So in this case, c will be 12, and d will be 42. This same syntax works
in any other context, so you can do:
a, b = b, a
to swap two values.
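Putting those pieces together in one runnable sketch (myfunc is still my made-up example, not the book's getmatchingrows):

```python
def myfunc():
    # The comma builds a tuple; the parentheses are optional here.
    return 12, 42

a = myfunc()       # a is the whole tuple
c, d = myfunc()    # tuple unpacking: one name per item

# Swapping works the same way: the right side is packed into a
# tuple first, then unpacked into the names on the left.
x, y = 1, 2
x, y = y, x

print(a, c, d, x, y)
```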
Also, as far as I can tell from the getmatchingrows code, it returns a
multidimensional array of database results with row[0] being the urlid (NOT
the url), and other indices correspond to the word id location.
In getscoredlist, totalscores[url] doesn't make sense. Where is [url]
coming from? Could they have meant to say urlid here?
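A sketch that may help with the url question: looping over a dict gives you its keys, so url is just the name chosen for each key of totalscores, and those keys are the row[0] values. If row[0] is the urlid, as you describe, then url holds a urlid. The version below is a simplified standalone function, not the book's method; the rows and the scores dict are invented, and I'm assuming each scores dict is keyed the same way as totalscores.

```python
def getscoredlist(rows, weights):
    # Keys are row[0], which would be the urlid in the query results.
    totalscores = dict([(row[0], 0) for row in rows])
    for (weight, scores) in weights:
        # Iterating over a dict yields its keys, so "url" is a urlid here.
        for url in totalscores:
            totalscores[url] += weight * scores[url]
    return totalscores

rows = [(1, 4, 7), (2, 3, 9)]        # made-up rows: (urlid, loc, loc)
made_up_scores = {1: 0.5, 2: 1.0}    # made-up scores keyed by urlid
result = getscoredlist(rows, [(1.0, made_up_scores)])
print(result)
```

With an empty weights list, as in the code you quoted, the loop body never runs and every score stays 0.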
This chapter is also available online for free from O'Reilly. Here is the
page that talks specifically about this part.
Any help understanding what this code in this part of this book is doing
would be greatly appreciated.
Thanks,
Blue
http://my.safaribooksonline.com/book/web-development/9780596529321/4dot-searching-and-ranking/querying#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODA1OTY1MjkzMjElMkZjb250ZW50YmFzZWRfcmFua2luZyZxdWVyeT0=
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
--
DaveA