Re: Fuzzy Lookups
BBands wrote:
> I have some CDs and have been archiving them on a PC. I wrote a Python
> script that spans the archive and returns a list of its contents:
> [[genre, artist, album, song]...]. I wanted to add a search function to
> locate all the versions of a particular song. This is harder than you
> might think. For example the Cajun "national anthem" is Jolie Blond,
> but it can be spelled several different ways: jolie, joli, blon, blond,
> etc. In addition, the various online services that provide song info
> are riddled with typos, so an ordinary string match just doesn't get
> you there. What is needed is a fuzzy string match and it turns out that
> there is a very good one, the Levenshtein distance, which is the number
> of inserts, deletions and substitutions needed to morph one string into
> another. In my application I match the desired song title against all
> song titles in my list and return the ones with the lowest Levenshtein
> distances. This is remarkably, one might even say stunningly,
> effective, easily finding all the versions of Jolie Blon in the list.
>
> I am using the following snippet (found on the web, proper attribution
> unsure), which calculates the Levenshtein distance.
>
> def distance(a,b):
>     c = {}
>     n = len(a); m = len(b)
>
>     for i in range(0,n+1):
>         c[i,0] = i
>     for j in range(0,m+1):
>         c[0,j] = j
>
>     for i in range(1,n+1):
>         for j in range(1,m+1):
>             x = c[i-1,j]+1
>             y = c[i,j-1]+1
>             if a[i-1] == b[j-1]:
>                 z = c[i-1,j-1]
>             else:
>                 z = c[i-1,j-1]+1
>             c[i,j] = min(x,y,z)
>     return c[n,m]
>
> As mentioned above this works quite well and I am happy with it, but I
> wonder if there is a more Pythonic way of doing this type of lookup?
>
> jab
Here is my stab at it. I didn't fully test it, so it may not work
correctly. The tuples could be avoided by calling len() each time, but
the extra lookups may cost execution speed[1].
def distance(a, b):
    """
    Approximates the Levenshtein distance between two strings.

    Only characters at the same index are compared, so the result is an
    upper bound on the true Levenshtein distance, not the distance itself.
    """
    m, n = (len(a), a), (len(b), b)
    # ensure that the 'm' tuple holds the longest string
    if m[0] < n[0]:
        m, n = n, m
    # assume distance = length of longest string (worst case)
    dist = m[0]
    # reduce the distance for each char match in the shorter string
    for i in range(0, n[0]):
        if m[1][i] == n[1][i]:
            dist = dist - 1
    return dist
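For comparison, here is a minimal sketch of an exact Levenshtein distance plus the ranked lookup jab describes. Comparing characters only at equal indexes, as the loop above does, is thrown off by a single dropped letter: it scores "joli blon" as 6 against "jolie blon", even though the true Levenshtein distance is 1. The sketch keeps the quoted snippet's recurrence but holds only two rows of the table instead of a dict of every cell. `levenshtein` and `closest_titles` are names made up for this example, and the track layout is assumed from the [[genre, artist, album, song]...] list in the original post.

```python
import heapq


def levenshtein(a: str, b: str) -> int:
    """Edit distance (inserts, deletions, substitutions) between a and b.

    Same recurrence as the dict-based snippet, but only the previous and
    current rows of the table are kept in memory.
    """
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    previous = list(range(len(b) + 1))  # cost of turning "" into each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # cost of turning a[:i] into ""
        for j, cb in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + int(ca != cb)  # free when chars match
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


def closest_titles(query, tracks, limit=5):
    """Return the `limit` tracks whose song title is nearest to `query`.

    Assumes each track is a [genre, artist, album, song] list, as in the
    original post; the comparison is case-insensitive.
    """
    query = query.lower()
    return heapq.nsmallest(limit, tracks, key=lambda t: levenshtein(query, t[3].lower()))


if __name__ == "__main__":
    # Hypothetical archive entries, for illustration only.
    tracks = [
        ["Cajun", "Artist A", "Album 1", "Jolie Blonde"],
        ["Cajun", "Artist B", "Album 2", "Joli Blon"],
        ["Rock", "Artist C", "Album 3", "Highway Song"],
    ]
    for genre, artist, album, song in closest_titles("jolie blon", tracks, limit=2):
        print(song)
```

heapq.nsmallest avoids sorting the whole archive when only a handful of matches are wanted; for a small CD collection, sorted(...)[:limit] would work just as well.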
[1] I have no idea whether this is true or not. I can see arguments for both.
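On the "more Pythonic" question: the standard library's difflib module already offers a fuzzy lookup, difflib.get_close_matches. Note that it scores strings with difflib.SequenceMatcher's ratio (a matching-blocks measure similar in spirit to Ratcliff/Obershelp "gestalt pattern matching") rather than the Levenshtein distance, so its rankings can differ. The `n=3` and `cutoff=0.6` defaults below mirror difflib's own defaults and are not tuned values; `close_titles` is a hypothetical wrapper that makes the match case-insensitive.

```python
import difflib


def close_titles(query, titles, limit=3, cutoff=0.6):
    """Return up to `limit` titles that difflib rates as similar to `query`.

    Matching is case-insensitive; in this sketch, titles that differ only
    by case collapse to a single entry.
    """
    by_lower = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[m] for m in matches]  # best match first


if __name__ == "__main__":
    print(close_titles("Jolie Blon", ["Jolie Blonde", "Joli Blon", "Highway Song"]))
```

Raising `cutoff` toward 1.0 makes the lookup stricter; lowering it lets more distant spellings through.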
--
http://mail.python.org/mailman/listinfo/python-list
Re: Too Many if Statements?
slogging_away wrote:
> Terry Reedy wrote:
>
> > The OP did not specify whether all of his if-tests were sequential as in
> > your test or if some were nested. I vaguely remember there being an indent
> > limit (40??).
>
> Most of the if statements are nested. Almost all of them fall under a
> central 'for xxx in range(x,x,x)' (this is the statement that checks
> through each of the saved configuration files). Under that 'for'
> statement are the bulk of the 'if' statements, some nested and some not;
> some also fall under other 'for' statements. The indent level does
> not exceed 10.
>

Has anyone considered that this may be part of the issue? If he is stepping through a range, this is not just x if statements but n * x evaluations, where n is the number of loop iterations. Possibly some variables are not getting freed between iterations? (My guess would be that it is related to logging.) Anyway, I'm no expert here; I just wanted to point that out.
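To make the n * x point concrete, here is a minimal sketch. It assumes the script's if-tests can be modeled as a list of check functions applied to each saved configuration file; `run_checks` and the checks are hypothetical names, not taken from the original script. With n files and x tests the body runs n * x times, so anything kept for the whole run, such as a log record appended per test, grows at that rate too.

```python
def run_checks(config_files, checks):
    """Apply every check to every config file and record each evaluation.

    Hypothetical stand-in for the script: `checks` plays the role of the
    x if statements and `config_files` the n saved configuration files.
    """
    log = []  # kept for the whole run, so it grows by n * x entries
    for config in config_files:      # n iterations, one per configuration file
        for check in checks:         # x tests per iteration
            log.append((config, check.__name__, check(config)))
    return log


def has_hostname(text):
    return "hostname" in text


def has_password(text):
    return "password" in text


if __name__ == "__main__":
    configs = ["hostname r1", "hostname r2\npassword x", "interface e0"]
    log = run_checks(configs, [has_hostname, has_password])
    print(len(log))  # 3 files * 2 checks
```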
Re: remote module importing (urlimport)
What plans do you have for security in this? I would think that in order to trust this over the network you would at least need a certificate identifying the server as well as some method of verifying package contents. Either way, cool stuff.
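The "verifying package contents" half could look something like the minimal sketch below: refuse to execute downloaded module source unless its SHA-256 digest matches a digest obtained through some trusted channel. `load_verified_module` is a hypothetical helper, not part of urlimport, and the sketch leaves out the download itself. The server-certificate half is normally the TLS layer's job; for example, `ssl.create_default_context()` verifies certificates and hostnames by default.

```python
import hashlib
import hmac
import types


def load_verified_module(name, source, expected_sha256):
    """Execute `source` (bytes) as module `name` only if its digest matches.

    Hypothetical helper: `expected_sha256` must come from a trusted place
    (e.g. shipped with the importer), not from the server serving `source`.
    """
    actual = hashlib.sha256(source).hexdigest()
    # compare_digest is a constant-time comparison, to resist timing attacks
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for module {name!r}")
    module = types.ModuleType(name)
    code = compile(source, f"<verified:{name}>", "exec")
    exec(code, module.__dict__)  # runs only after the digest check passed
    return module
```

A pinned digest only proves the bytes are the ones someone vetted; it says nothing about whether that code is safe, and each new release needs a new digest.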
Re: do design patterns still apply with Python?
John Salerno wrote:
> Yeah, that's what I was wondering. I wonder if, after reading a DP book,
> I might have to 'unlearn' some things when applying them to Python.

I would say adjust instead of unlearn. This is probably true, to a greater or lesser extent, of any language your DP book was not written for.

> But I suppose I should just do it first and then try to implement them
> myself. OOP is just so mind-bending for me that I've kind of put off
> patterns right now until I get more comfortable with it. :)

I would suggest getting a good grasp on OOP before you get into design patterns. When most people start with any new concept they tend to see everything in terms of their new toy, so sticking to one or two new concepts at a time will make things a little easier. Design patterns are kind of like sarcasm: hard to use well, not always appropriate, and disgustingly bad when applied to problems they are not meant to solve. You will do just fine without them until OOP is at least familiar to you, and by that time you should be better able to use them appropriately.
Re: Python 2.5 licensing: stop this change
I say good riddance. Python's success has always rested on its merits as an open source application platform. Corporate usage has always been relatively insignificant, and I suspect that many companies are overstating how much they depend on Python in an attempt to steer their competitors into just this kind of open source license trap. I am all for this change. It is about time that "free as in beer" became a double entendre for Python.
