[Tutor] print out lines that start with a word
Can anyone tell me what I've done wrong in this script? I'm trying to get only the lines that start with "This" from a text file. Here's what I wrote:

>>> import re
>>> f = open('c:/lines.txt').readlines()
>>> for line in f:
        match = re.search('^This',f)
        if line == match:
            print match

Here's the error message I got:

Traceback (most recent call last):
  File "", line 2, in -toplevel-
    match = re.search('^This',f)
  File "C:\Python24\lib\sre.py", line 134, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer

Thanks in advance
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
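The traceback points at the likely cause: re.search() is handed the whole list f rather than the current line, and a line of text is then compared with a match object, which would never be equal. A minimal sketch of one possible fix follows, written in Python 3 syntax (the post uses Python 2.4). The helper name lines_starting_with and the sample lines are made up for illustration.

```python
import re

def lines_starting_with(lines, word):
    """Return the lines that begin with `word`."""
    # re.escape keeps any special characters in `word` literal
    pattern = re.compile(r'^' + re.escape(word))
    # search each individual line, not the whole list
    return [line for line in lines if pattern.search(line)]

# Made-up sample standing in for the contents of lines.txt
sample = ["This is first\n", "Not this one\n", "This too\n"]
matches = lines_starting_with(sample, "This")
for line in matches:
    print(line, end="")
```

For a fixed prefix like this, line.startswith("This") would give the same result without a regular expression.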
[Tutor] error message
I'm doing something very simple in RE. Let's say I'm trying to match an American phone number. I write the code this way and try to match it:

import re
string = 'My phone is 410-995-1155'
pattern = r'\d{3}-\d{3}-\d{4}'
re.match(pattern,string).group()

but I get this error message:

Traceback (most recent call last):
  File "C:/Python24/findphone", line 4, in -toplevel-
    re.match(pattern,string).group()
AttributeError: 'NoneType' object has no attribute 'group'
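The error is consistent with how re.match() works: it only tries to match at the very start of the string, so it returns None here because the string begins with "My phone is". re.search() scans the whole string. A small sketch in Python 3 syntax, where the variable names at_start, anywhere and phone are illustrative:

```python
import re

text = 'My phone is 410-995-1155'
pattern = r'\d{3}-\d{3}-\d{4}'

at_start = re.match(pattern, text)    # anchored at position 0, so no match here
anywhere = re.search(pattern, text)   # looks for the pattern anywhere in the text

# Check for None before calling .group() to avoid the AttributeError
phone = anywhere.group() if anywhere else None
print(phone)
```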
[Tutor] writing list to new file
How would I save a list to a new file? For example:

if line.startswith('XXX'):
    save list to new file

But I get errors saying only strings can be saved this way.
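The error fits the fact that a file's write() method takes a single string, while writelines() takes a list of strings. A minimal sketch in Python 3 syntax, assuming the list holds text lines. The sample data and the temporary output path are made up for illustration.

```python
import os
import tempfile

# Made-up sample lines; in the post these would come from the input file
lines = ["XXX first\n", "skip me\n", "XXX second\n"]
selected = [line for line in lines if line.startswith("XXX")]

# Hypothetical output location in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "selected.txt")
with open(out_path, "w") as out:
    out.writelines(selected)  # writes each string in the list, adding nothing

# Read the file back to confirm what was saved
with open(out_path) as check:
    saved = check.read()
print(saved, end="")
```

If the list holds non-string items, converting them first, for example with "\n".join(str(item) for item in items), is one option.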
[Tutor] Value Error message
I'm trying to scrape some headlines off a newspaper site with this code:

import urllib, re
pattern = re.compile("""(.*)""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for (headline, code, description) in pattern.findall(page):
    print (headline, code, description)

I'm getting the error below and can't find anything in the documentation. Suggestions?

Traceback (most recent call last):
  File "C:/Python24/Stribwebscrape.py", line 13, in ?
    for (headline, code, description) in pattern.findall(page):
ValueError: need more than 2 values to unpack
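"need more than 2 values to unpack" suggests that each item returned by findall() is a 2-tuple while the loop asks for three names. When a pattern has several groups, findall() returns one tuple per match with one element per group. The sketch below, in Python 3 syntax, uses made-up markup and a made-up two-group pattern, since the pattern in the post did not survive intact.

```python
import re

# Hypothetical markup and pattern, for illustration only
page = ('<a href="/stories/1.html">First headline</a>'
        '<a href="/stories/2.html">Second headline</a>')
pattern = re.compile(r'<a href="/(.*?)">(.*?)</a>', re.DOTALL)

pairs = pattern.findall(page)   # two groups, so a list of 2-tuples
for link, headline in pairs:    # unpack exactly as many names as there are groups
    print(link, headline)
```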
[Tutor] Value Error solved. Another question
Ignore my first posting. Here's what I'm trying to do. I want to extract headlines from a newspaper's website using this code. It works, but I want to match the second group in (.*) and print that out. Suggestions?

import urllib, re
pattern = re.compile("""(.*)""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for headline in pattern.findall(page):
    print headline
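One way to print only the second group is to iterate over match objects with finditer() and ask each for group(2). This is a sketch in Python 3 syntax. The markup and the two-group pattern are hypothetical stand-ins, because the real pattern was lost from the post.

```python
import re

# Hypothetical markup and pattern, for illustration only
page = '<a href="/a.html">Alpha</a> <a href="/b.html">Beta</a>'
pattern = re.compile(r'<a href="/(.*?)">(.*?)</a>', re.DOTALL)

# finditer() yields match objects, so any single group can be picked out
headlines = [m.group(2) for m in pattern.finditer(page)]
print(headlines)
```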
[Tutor] count words
I know that you can do this to get a count of how many times a word appears in a file:

f = open('text.txt').read()
print f.count('word')

Other than using several print statements to look for separate words like this, is there a way to do it so that I get an individual count of each word?

word1 xxx
word2 xxx
words xxx

etc.
RE: [Tutor] count words
Thanks to everyone who replied to my post. All of your suggestions seem to work.

My thanks
Ron

--- Ryan Davis <[EMAIL PROTECTED]> wrote:
> You could use split() to split the contents of the file into a list of strings.
>
> ###
> >>> x = 'asdf foo bar foo'
> >>> x.split()
> ['asdf', 'foo', 'bar', 'foo']
> ###
>
> Here's one way to iterate over that to get the counts. I'm sure there are dozens.
>
> ###
> >>> x = 'asdf foo bar foo'
> >>> counts = {}
> >>> for word in x.split():
> ...     counts[word] = x.count(word)
> ...
> >>> counts
> {'foo': 2, 'bar': 1, 'asdf': 1}
> ###
>
> The dictionary takes care of duplicates. If you are using a really big file, it might pay to eliminate duplicates from the list before running x.count(word)
>
> Thanks,
> Ryan
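One caveat with calling count() on the whole string is that it counts substrings, so 'foo' would also be counted inside 'foobar'. Counting the split words themselves avoids that. The sketch below uses collections.Counter, which is available in later Python versions (2.7 and 3.1 onward) but not in the Python 2.4 used in this thread. The sample text is made up.

```python
from collections import Counter

text = 'asdf foo bar foo foobar'

# Count whole words rather than substrings
counts = Counter(text.split())
for word, n in sorted(counts.items()):
    print(word, n)

# For comparison: str.count() also finds 'foo' inside 'foobar'
substring_hits = text.count('foo')
```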
[Tutor] RE help
I'm trying to scrape a newspaper site for articles using this code, which was done with help from the list:

import urllib, re
pattern = re.compile("""(.*).""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for headline, body in pattern.findall(page):
    print body

It should grab articles from this:

Sid Hartman: Franchise could be movedIf Reggie Fowler and his business partners from New Jersey are approved to buy the Vikings franchise from Red McCombs, it is my opinion the franchise remains in danger of eventually being relocated.

and give me this:

Sid Hartman: Franchise could be movedIf Reggie Fowler and his business partners from New Jersey are approved to buy the Vikings franchise from Red McCombs, it is my opinion the franchise remains in danger of eventually being relocated.

Instead it gives me this:

Boxerjam.

from this:

href="http://www.startribune.com/stories/1559/4773140.html">Boxerjam.

I know the re works in other programs I've tried. Is there something different about re's in Python?
Re: [Tutor] RE help
Problem solved. Thanks

--- Kent Johnson <[EMAIL PROTECTED]> wrote:
> Try it with non-greedy matches. You are matching everything from the first in one match. Also I think you want to escape the . before (you want just paragraphs that end in a period?)
>
> pattern = re.compile(""" href="/(.*?)">(.*?)\.""", re.DOTALL)
>
> Kent
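The difference between greedy and non-greedy matching that Kent describes can be seen on a small example. This is a sketch in Python 3 syntax. The markup is made up for illustration and is not the newspaper's actual HTML.

```python
import re

# Hypothetical markup, for illustration only
page = '<p>First paragraph.</p><p>Second paragraph.</p>'

# Greedy: (.*) runs on to the LAST </p>, so everything lands in one match
greedy = re.findall(r'<p>(.*)</p>', page, re.DOTALL)

# Non-greedy: (.*?) stops at the FIRST </p> it can, giving one match per paragraph
lazy = re.findall(r'<p>(.*?)</p>', page, re.DOTALL)

print(greedy)
print(lazy)
```

With re.DOTALL the greedy form can swallow most of a page, which would explain getting one unexpected result instead of each article.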
[Tutor] Better Search and replace method
I'm trying to figure out a better solution to do multiple search and replaces in a text file without having to type:

import re
s = open('filename')
re.sub('value1','value2',s)
re.sub('value3','value4',s)

etc.

I've tried putting all the values in a list and doing the replace, but came up short. Any suggestions?

Thanks in advance
Ron
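One possible approach is to keep the search and replace pairs in a dictionary and apply them in a single pass with one combined pattern. Two things to watch in the snippet from the post: re.sub() needs the file's text (for example from .read()) rather than the file object, and it returns a new string that has to be kept. The sketch below is in Python 3 syntax. The pairs are placeholders and replace_all is a made-up helper name.

```python
import re

# Placeholder search -> replacement pairs
replacements = {'value1': 'value2', 'value3': 'value4'}

# One alternation of all search strings, escaped so they are treated literally;
# longer keys go first so a short key cannot shadow a longer one
keys = sorted(replacements, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys))

def replace_all(text):
    """Apply every replacement in one pass and return the new text."""
    return pattern.sub(lambda m: replacements[m.group(0)], text)

result = replace_all('value1 and value3')
print(result)
```

Because the text is scanned once, a replacement that happens to equal another search string is not replaced a second time.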
[Tutor] re help
The following program takes text data like this:

Jimi Hendrix
2100 South Ave
Seattle, WA 55408

and changes it to this:

Jimi Hendrix,2100 South Ave,Seattle,WA,55408

and writes it to a file. The problem I'm running into is that it only writes the first address to the file, and there are several others in the input. I believe it has something to do with using re.search instead of re.findall, but re.findall returns an error when I try using it. Suggestions? Thanks in advance. Here is the script:

import re
f = open('reformat.txt').read()
pat = re.compile(r"([^\r\n]+)\n([^\r\n]*)\n([^\r\n]*) ([^\r\n]*) ([^\r\n]*)")
x=re.search(pat,f)
name = x.group(1)
address = x.group(2)
citystate = x.group(3)+x.group(4)
zipcd = x.group(5)
o= open('reformat1.txt','w')
o.write("%s,%s,%s,%s\n" % (name, address, citystate,zipcd))
o.close()
print("%s,%s,%s,%s\n" % (name, address, citystate,zipcd))
[Tutor] re.findall vs. re.search and re.match
Thanks to all who replied to my earlier post on re's. I'm still perplexed by why re.search and re.match work in the code below, but not re.findall. re.findall is supposed to return all non-overlapping occurrences of the pattern that match in this example, but it returns errors. Looking through the docs, from what I can see it should work. Can anyone provide advice?

import re
f = open('reformat.txt').read()
pat = re.compile(r"([^\r\n]+)\n([^\r\n]*)\n([^\r\n]*) ([^\r\n]*) ([^\r\n]*)")
x=re.search(pat,f)
name = x.group(1)
address = x.group(2)
citystate = x.group(3)+x.group(4)
zipcd = x.group(5)
o= open('reformat1.txt','w')
o.write("%s,%s,%s,%s\n" % (name, address, citystate,zipcd))
o.close()
print("%s,%s,%s,%s\n" % (name, address, citystate,zipcd))
Re: [Tutor] re.findall vs. re.search and re.match
Kent: The code is below. Here's the error message.

Traceback (most recent call last):
  File "C:/Python24/reformat.py", line 5, in -toplevel-
    name = x.group(1)
AttributeError: 'list' object has no attribute 'group'

import re
f = open('reformat.txt').read()
pat = re.compile(r"([^\r\n]+)\n([^\r\n]*)\n([^\r\n]*) ([^\r\n]*) ([^\r\n]*)")
x=re.findall(pat,f)
name = x.group(1)
address = x.group(2)
citystate = x.group(3)+x.group(4)
zipcd = x.group(5)
o= open('reformat1.txt','w')
o.write("%s,%s,%s,%s\n" % (name, address, citystate,zipcd))
o.close()
print("%s,%s,%s,%s\n" % (name, address, citystate,zipcd))

--- Kent Johnson <[EMAIL PROTECTED]> wrote:
> Please show the code that uses re.findall() and the error you receive when you run it.
>
> Kent
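The traceback matches what re.findall() returns: a plain list, not a match object, so there is no .group() to call. With five groups in the pattern, each list item is a 5-tuple that can be unpacked in a loop. A minimal sketch in Python 3 syntax, using the pattern from the post. The sample text is made up to stand in for reformat.txt, and the second address is invented.

```python
import re

# Made-up sample standing in for the contents of reformat.txt
text = ("Jimi Hendrix\n2100 South Ave\nSeattle, WA 55408\n\n"
        "Janis Joplin\n12 Example St\nAustin, TX 78701\n")

pat = re.compile(r"([^\r\n]+)\n([^\r\n]*)\n([^\r\n]*) ([^\r\n]*) ([^\r\n]*)")

rows = []
# findall() yields one 5-tuple per address, in group order
for name, address, city, state, zipcd in pat.findall(text):
    rows.append("%s,%s,%s,%s" % (name, address, city + state, zipcd))

print("\n".join(rows))
```

Writing each row inside the loop, with the output file opened once before it, would put every address in the file rather than only the first.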
[Tutor] CPAN for python
Is there a site like Perl's CPAN for Python? I've seen the stuff at ActiveState. Anything else?

Ron Nixon
[Tutor] python mechanize examples
Does anyone have, or know where I can find, working examples of Python's mechanize module? I'm trying to reverse engineer a script to see how it works.

Ron
[Tutor] urlretrieve
Is there a way to get the urlretrieve function to grab multiple files, similar to wget?

Ron Nixon
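urlretrieve() fetches one URL per call, so one simple option is to loop over a list of URLs. The sketch below is in Python 3 syntax, where the function lives in urllib.request (in Python 2 it is urllib.urlretrieve). The helper name fetch_all and the naming rule for local files are assumptions made for illustration, and this does not follow links recursively the way wget can.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve

def fetch_all(urls, dest_dir):
    """Download each URL into dest_dir and return the list of saved paths."""
    saved = []
    for url in urls:
        # Name the local file after the last path segment of the URL
        filename = os.path.basename(urlsplit(url).path) or "index.html"
        target = os.path.join(dest_dir, filename)
        urlretrieve(url, target)  # one call per file
        saved.append(target)
    return saved
```

It would be called with a list of URLs and an existing directory, for example fetch_all(list_of_urls, "downloads"), where both names are placeholders.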