
Re: Fuzzy Lookups

2006-01-30 Thread ajones


BBands wrote:
> I have some CDs and have been archiving them on a PC. I wrote a Python
> script that spans the archive and returns a list of its contents:
> [[genre, artist, album, song]...]. I wanted to add a search function to
> locate all the versions of a particular song. This is harder than you
> might think. For example the Cajun "national anthem" is Jolie Blond,
> but it can be spelled several different ways: jolie, joli, blon, blond,
> etc... In addition the various online services that provide song info
> are riddled with typos, so an ordinary string match just doesn't get
> you there. What is needed is a fuzzy string match and it turns out that
> there is a very good one, the Levenshtein distance, which is the number
> of inserts, deletions and substitutions needed to morph one string into
> another. In my application I match the desired song title against all
> song titles in my list and return the ones with the lowest Levenshtein
> distances. This is remarkably, one might even say stunningly,
> effective, easily finding all the versions of Jolie Blon in the list.
>
> I am using the following snippet (found on the web, proper attribution
> unsure), which calculates the Levenshtein distance.
>
> def distance(a,b):
>     c = {}
>     n = len(a); m = len(b)
>
>     for i in range(0,n+1):
>         c[i,0] = i
>     for j in range(0,m+1):
>         c[0,j] = j
>
>     for i in range(1,n+1):
>         for j in range(1,m+1):
>             x = c[i-1,j]+1
>             y = c[i,j-1]+1
>             if a[i-1] == b[j-1]:
>                 z = c[i-1,j-1]
>             else:
>                 z = c[i-1,j-1]+1
>             c[i,j] = min(x,y,z)
>     return c[n,m]
>
> As mentioned above this works quite well and I am happy with it, but I
> wonder if there is a more Pythonic way of doing this type of lookup?
>
> jab
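The lookup described above (score every title against the query, keep the closest ones) can be sketched as follows. This is a minimal sketch, not BBands' actual script: `levenshtein` computes the same distance as the quoted snippet but keeps only two rows of the table instead of a dict holding every cell, and `find_song`, its `limit` parameter, the lowercasing, and the sample rows are all assumptions.

```python
def levenshtein(a, b):
    """Levenshtein distance, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in b
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def find_song(title, archive, limit=5):
    """Hypothetical helper: return the `limit` rows whose song title is
    closest to `title`, assuming rows shaped [genre, artist, album, song]."""
    wanted = title.lower()  # assumption: case should not count as an edit
    return sorted(archive, key=lambda row: levenshtein(wanted, row[3].lower()))[:limit]


# Hypothetical sample rows.
archive = [
    ["Cajun", "Artist A", "Album 1", "Jolie Blonde"],
    ["Cajun", "Artist B", "Album 2", "Joli Blon"],
    ["Rock", "Artist C", "Album 3", "Hey Jude"],
]
print([row[3] for row in find_song("Jolie Blon", archive, limit=2)])
# ['Joli Blon', 'Jolie Blonde']
```

The standard library's `difflib.get_close_matches` offers a similar ready-made lookup, but it ranks candidates by `SequenceMatcher` similarity ratio rather than by Levenshtein distance, so its ordering can differ.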

Here is my stab at it. I didn't fully test it, so it may not work
correctly. The tuples could be avoided by using len(x), but the extra
lookups may cost some execution speed[1].

def distance(a, b):
    """
    Approximates the Levenshtein distance between two strings by
    comparing characters position by position (an upper bound, since
    shifted characters are never realigned)
    """
    m, n = (len(a), a), (len(b), b)
    if m[0] < n[0]:  # ensure that the 'm' tuple holds the longest string
        m, n = n, m
    dist = m[0]  # assume distance = length of longest string (worst case)
    for i in range(0, n[0]):  # reduce the distance for each char match in shorter string
        if m[1][i] == n[1][i]:
            dist = dist - 1
    return dist
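Because the loop above compares characters at the same index, it never realigns the strings after an insertion or deletion, so what it returns is an upper bound on the Levenshtein distance rather than the distance itself. A minimal sketch reproducing the same positional logic (the name `positional_distance` is hypothetical) shows where the two agree and where they diverge:

```python
def positional_distance(a, b):
    """Same logic as the snippet above: length of the longer string
    minus the number of positions where both strings agree."""
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    matches = sum(1 for x, y in zip(longer, shorter) if x == y)
    return len(longer) - matches


# A difference at the end keeps every earlier position aligned.
print(positional_distance("jolie blond", "jolie blon"))  # 1, same as Levenshtein
# One missing character at the front shifts every later position.
print(positional_distance("abc", "bc"))  # 3, although deleting "a" (distance 1) suffices
```

Trailing differences such as blon/blond come out right, which may be why the approach looks fine on quick tests; differences near the start of a title are where it overcounts.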

[1] I have no idea whether this is true or not. I can see arguments both ways.
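Footnote [1] can be checked empirically with the standard library's `timeit` module. The two hypothetical functions below run the positional comparison from the snippet above, once with lengths cached in tuples and once calling `len()` directly. No timings are claimed here, since they depend on the machine and Python version; note that `range(len(b))` is evaluated only once per call, so the `len()` variant makes just a handful of extra calls either way.

```python
import timeit


def with_tuples(a, b):
    # Cache each length alongside its string, as in the snippet above.
    m, n = (len(a), a), (len(b), b)
    if m[0] < n[0]:
        m, n = n, m
    dist = m[0]
    for i in range(n[0]):
        if m[1][i] == n[1][i]:
            dist -= 1
    return dist


def with_len(a, b):
    # Recompute lengths with len() instead of storing them.
    if len(a) < len(b):
        a, b = b, a
    dist = len(a)
    for i in range(len(b)):
        if a[i] == b[i]:
            dist -= 1
    return dist


def time_both(number=100_000):
    """Return {function name: total seconds} for both variants on one input."""
    results = {}
    for func in (with_tuples, with_len):
        results[func.__name__] = timeit.timeit(
            lambda: func("jolie blond", "joli blon"), number=number)
    return results


print(time_both())
```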

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Too Many if Statements?

2006-02-09 Thread ajones

slogging_away wrote:
> Terry Reedy wrote:
>
> > The OP did not specify whether all of his if-tests were sequential as in
> > your test or if some were nested.  I vaguely remember there being an indent
> > limit (40??).
>
> Most of the if statements are nested.  Almost all of them fall under a
> central 'for xxx in range(x,x,x)', (this is the statement that checks
> through each of the saved configuration files).   Under that 'for'
> statement are the bulk of the 'if' statements - some nested and some not
> - some also fall under other 'for' statements.  The indent level does
> not exceed 10.
>

Has anyone considered that this may be part of the issue? If he is
stepping through a range, this is not just x if statements but n * x,
where n is the number of loop iterations. Possibly some variables are not
getting freed between loops? (My guess would be that it is related to
logging.) Anyway, no expert here, just wanted to point that out.
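To make the n * x arithmetic concrete: x if-tests inside a loop that runs n times are evaluated n * x times in total. A minimal counting sketch, with hypothetical names and data (50 configuration files, 10 checks):

```python
def run_checks(configs, checks):
    """Apply every check to every config; return (tests evaluated, matches)."""
    evaluated = matched = 0
    for config in configs:      # outer loop: n iterations
        for check in checks:    # x if-tests per iteration
            evaluated += 1
            if check(config):
                matched += 1
    return evaluated, matched


configs = ["config-%d" % i for i in range(50)]               # n = 50 (hypothetical)
checks = [lambda c, k=k: str(k) in c for k in range(10)]     # x = 10 (hypothetical)
evaluated, _ = run_checks(configs, checks)
print(evaluated)  # 500, i.e. 50 * 10
```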

Re: remote module importing (urlimport)

2006-02-24 Thread ajones

What plans do you have for security in this? I would think that in
order to trust this over the network you would at least need a
certificate identifying the server as well as some method of verifying
package contents.
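One way to cover the "verifying package contents" half is to compare a cryptographic digest of the downloaded source against a value obtained through a separate trusted channel before executing anything. A minimal sketch using only the standard library; `load_verified_module` is a hypothetical helper, not part of urlimport:

```python
import hashlib
import types


def load_verified_module(name, source, expected_sha256):
    """Hypothetical helper: build a module from downloaded source bytes,
    but only if their SHA-256 digest matches the expected hex digest."""
    digest = hashlib.sha256(source).hexdigest()
    if digest != expected_sha256:
        raise ImportError("checksum mismatch for %s: got %s" % (name, digest))
    module = types.ModuleType(name)
    # Only reached after the digest check passes.
    exec(compile(source, "<remote %s>" % name, "exec"), module.__dict__)
    return module


code = b"GREETING = 'hello'\n"
mod = load_verified_module("demo", code, hashlib.sha256(code).hexdigest())
print(mod.GREETING)  # hello
```

For the server-identity half, HTTPS already provides certificate checking: since Python 2.7.9 and 3.4.3, `urllib` verifies server certificates by default (PEP 476). A digest only proves integrity if the expected value itself comes from somewhere trusted.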

Either way, cool stuff.

Re: do design patterns still apply with Python?

2006-03-02 Thread ajones

John Salerno wrote:
> Yeah, that's what I was wondering. I wonder if, after reading a DP book,
> I might have to 'unlearn' some things when applying them to Python.

I would say adjust instead of unlearn. This is probably true to a
lesser or greater extent of any language for which your DP book was not
written.
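As one illustration of adjusting rather than unlearning (an example chosen here, not taken from the thread): the Strategy pattern, which books written for Java or C++ usually express as one class per strategy sharing a method, often shrinks in Python to passing a function. The names and song data below are hypothetical.

```python
# Textbook shape: one small class per strategy, all sharing a method name.
class ByTitle:
    def key(self, song):
        return song["title"].lower()


class ByArtist:
    def key(self, song):
        return song["artist"].lower()


def sort_songs_oo(songs, strategy):
    return sorted(songs, key=strategy.key)


# Adjusted for Python: functions are first-class, so the strategy is just a callable.
def sort_songs(songs, key):
    return sorted(songs, key=key)


songs = [
    {"title": "Jolie Blon", "artist": "Artist B"},
    {"title": "Allons a Lafayette", "artist": "Artist A"},
]
print([s["title"] for s in sort_songs(songs, key=lambda s: s["artist"].lower())])
```

Both versions do the same job; the second simply drops the class scaffolding that the pattern needs in languages without first-class functions.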

> But I suppose I should just do it first and then try to implement them
> myself. OOP is just so mind-bending for me that I've kind of put off
> patterns right now until I get more comfortable with it.  :)

I would suggest getting a good grasp on OOP before you get into design
patterns. When most people start with any new concept they tend to try
and see everything in terms of their new toy, so sticking to one or two
new concepts at a time will make things a little easier.

Design patterns are kind of like sarcasm: hard to use well, not always
appropriate, and disgustingly bad when applied to problems they are not
meant to solve. You will do just fine without them until OOP is at
least familiar to you, and by that time you should be a little better
able to use them appropriately.

Re: Python 2.5 licensing: stop this change

2006-04-01 Thread ajones

I say good riddance. Python's success has always been on its merits as
an open source application platform. Corporate usage has always been
relatively insignificant, and I suspect that many companies are
overrepresenting the level of dependence they have on Python in an
attempt to steer their competitors into just this kind of open source
license trap.

I am all for this change. It is about time that free as in beer became
a double entendre for Python.
