Re: [Tutor] Extracting data from HTML files

2005-12-31 Thread kevin7kal
my 2 cents: I agree, BeautifulSoup is great for HTML parsing. It's a little weird and unintuitive at first, but once you get it, it becomes quite an asset. >When you have time, do try going through a few examples with >BeautifulSoup. The web page there comes with some interesting examples, >and

Re: [Tutor] Extracting data from HTML files

2005-12-30 Thread Danny Yoo
> > category = m.group(1) > > > > > > Traceback (most recent call last): > > > File "", line 1, in ? > > > AttributeError: 'NoneType' object has no attribute 'group' > > > > In this case the match failed, so m is None and m.group(1) gives an > >error. > > So my problem is in the actual regex?
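
The failure described in this message can be guarded against by testing the match object before calling `group`. A minimal sketch, assuming the `Category:Category1` line format quoted elsewhere in this thread; the name `extract_category` and the exact pattern are hypothetical, not taken from the original script:

```python
import re

# Assumed pattern: the label "Category:" followed by the value on the same line.
CATEGORY_RE = re.compile(r"Category:(.*)")


def extract_category(text):
    """Return the text after 'Category:' or None when the pattern is absent."""
    m = CATEGORY_RE.search(text)
    if m is None:
        # re.search returns None on failure, so m.group(1) at this point
        # would raise the AttributeError shown in the traceback.
        return None
    return m.group(1).strip()


print(extract_category("Category:Category1"))
print(extract_category("no such field here"))
```

Returning `None` keeps the caller in charge of deciding what a missing field means, for example skipping the file or logging its name.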

Re: [Tutor] Extracting data from HTML files

2005-12-30 Thread Danny Yoo
> > > I also found that on some of the strings I want to extract, when > > > Python reads them using file.read(), there are newline characters > > > and other stuff that doesn't show up in the actual html source. > > > > Not certain that I understand what you mean there. Can you show us? > > read(
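
One way to show exactly what `read()` returned is to print the string with `repr()`, which makes newlines, carriage returns and runs of spaces visible. A small sketch under assumptions: the file content below is invented for illustration, and collapsing whitespace with `split()`/`join()` is only one possible cleanup, not something proposed in the thread:

```python
import os
import tempfile

# Write a small stand-in file; the content is made up for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, newline="") as tmp:
    tmp.write("Category:\r\n   Category1\n")
    path = tmp.name

with open(path, newline="") as f:   # newline="" keeps line endings untranslated
    data = f.read()
os.remove(path)

print(repr(data))                    # repr() makes \r, \n and spaces visible
collapsed = " ".join(data.split())   # collapse every whitespace run to one space
print(collapsed)
```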

Re: [Tutor] Extracting data from HTML files

2005-12-30 Thread Oswaldo Martinez
Kent Johnson wrote: > > > import re > file = open("file1.html") > data = file.read() > catRe = re.compile(r'Title:(.*?)') > > This regex does not agree with the data you originally posted. Your > original data was > Category:Category1 > > Do you see the difference? Your regex has
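
The snippet is cut off before the difference is spelled out, so the following only illustrates the general point: literal text in a pattern has to occur in the data, otherwise `re.search` returns `None`. The sample string is the line format quoted in the thread; the simplified patterns are assumptions:

```python
import re

sample = "Category:Category1"   # the line format quoted in the thread

# A pattern built around a label that is not in the data cannot match.
mismatch = re.search(r"Title:(.*)", sample)
print(mismatch)

# The same pattern shape with the label that is actually present does match.
match = re.search(r"Category:(.*)", sample)
print(match.group(1) if match else "no match")
```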

Re: [Tutor] Extracting data from HTML files

2005-12-30 Thread Oswaldo Martinez
> From: Danny Yoo <[EMAIL PROTECTED]> > [...] > The Regular Expression HOWTO itself is pretty good and talks about some of > the stuff you've been running into, so here's a link to the base url that > you may want to look at: > > http://www.amk.ca/python/howto/regex/ Ah yes I've been readin

Re: [Tutor] Extracting data from HTML files

2005-12-30 Thread Kent Johnson
Oswaldo Martinez wrote: > OK before I got into the loop in the script I decided to try first with one > file and I have some doubts about some parts of the script, plus I got an > error: > > import re file = open("file1.html") data = file.read() catRe = re.compile(r'Title:(.*?)

Re: [Tutor] Extracting data from HTML files

2005-12-29 Thread Danny Yoo
> >>> import re > >>> file = open("file1.html") > >>> data = file.read() > >>> catRe = re.compile(r'Title:(.*?)') > > # I searched around the docs on regexes I have and found that the "r" > # after the re.compile(' will detect repeating words. Hi Oswaldo, Actually, no. What you're seeing is a "
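
For reference, the `r` prefix in `r'Title:(.*?)'` is Python's raw string literal syntax: backslashes in the literal are kept as-is instead of being treated as escape sequences. A short sketch of the effect; the `\bcat\b` pattern and the sample text are illustrative assumptions, not from the thread:

```python
import re

# In an ordinary string literal "\b" is a single backspace character;
# in a raw string literal r"\b" stays as a backslash followed by "b".
print(len("\b"), len(r"\b"))

# Passed to re, the raw form is read as the word-boundary assertion.
raw_hits = re.findall(r"\bcat\b", "cat concat cat")
# The non-raw form looks for literal backspace characters instead.
plain_hits = re.findall("\bcat\b", "cat concat cat")
print(raw_hits, plain_hits)
```

For a pattern with no backslashes, such as `'Title:(.*?)'`, the raw and ordinary forms are the same string, so the prefix is harmless there and simply a good habit.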

Re: [Tutor] Extracting data from HTML files

2005-12-29 Thread Oswaldo Martinez
these into account in the regex or will it automatically include them? > --- Original Message --- > From: Kent Johnson <[EMAIL PROTECTED]> > To: Python Tutor > Subject: Re: [Tutor] Extracting data from HTML files > Date: Thu, 29 Dec 2005 14:18:38 -0500 > > Try so

Re: [Tutor] Extracting data from HTML files

2005-12-29 Thread Kent Johnson
[EMAIL PROTECTED] wrote: > The HTML comes from a bunch of files which are saved on my computer. They > were generated by a PHP script and I want to extract certain fields for > insertion into a MySQL db. > I'm trying to get the hang of correctly opening the files first :) > There are about a thousa

Re: [Tutor] Extracting data from HTML files

2005-12-29 Thread motorolaguy
loop in the script since the files are named article1.html, article2.html, etc. Thanks for the help! > --- Original Message --- > From: Kent Johnson <[EMAIL PROTECTED]> > To: unknown > Cc: tutor@python.org > Subject: Re: [Tutor] Extracting data from HTML files > Date
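
Because the files follow the `article1.html`, `article2.html` naming scheme, the loop can build each name from a counter. A minimal sketch; `article_names` and `read_articles` are hypothetical helpers, and the directory and count are whatever applies on the poster's machine:

```python
import os


def article_names(count, start=1):
    """Hypothetical helper: build article1.html, article2.html, ... names."""
    return ["article%d.html" % n for n in range(start, start + count)]


def read_articles(directory, count):
    """Yield (name, contents) for each numbered file that actually exists."""
    for name in article_names(count):
        path = os.path.join(directory, name)
        if os.path.isfile(path):          # skip gaps in the numbering
            with open(path) as f:
                yield name, f.read()


for name in article_names(3):
    print(name)
```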

Re: [Tutor] Extracting data from HTML files

2005-12-29 Thread Kent Johnson
[EMAIL PROTECTED] wrote: > I'll also take a look at regexes as recommended by Kent Johnson to see if > it'll work here. My guess is this is the way to go since the data I need is > always on the same line number in the HTML source. So I could just go to the > specific line numbers, look for my data a
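
The line-number idea in the quoted text could look roughly like this. It is only a sketch: `pick_lines` is a hypothetical helper, and the sample text and the line numbers 2 and 4 are invented, since the real positions are not given in the thread:

```python
def pick_lines(text, line_numbers):
    """Return {line_number: stripped line} for the 1-based numbers that exist."""
    lines = text.splitlines()
    return {n: lines[n - 1].strip() for n in line_numbers if 1 <= n <= len(lines)}


# Made-up stand-in for one of the template-generated files.
sample = "header\nCategory:Category1\nfiller\nName:Filename.exe\n"
picked = pick_lines(sample, [2, 4, 9])   # 9 is out of range and is skipped
print(picked)
```

Fixed line numbers break as soon as one file has an extra line, which is one reason to match on the field labels as well.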

Re: [Tutor] Extracting data from HTML files

2005-12-28 Thread motorolaguy
tips they are more than welcome :) Thanks again and happy holidays! > --- Original Message --- > From: Kent Johnson <[EMAIL PROTECTED]> > To: Python Tutor > Subject: Re: [Tutor] Extracting data from HTML files > Date: Wed, 28 Dec 2005 22:16:47 -0500 > > [EMAI

Re: [Tutor] Extracting data from HTML files

2005-12-28 Thread Kent Johnson
[EMAIL PROTECTED] wrote: > I'm trying to make a Python script for extracting certain data from HTML > files. These files are from a template so they all have the same formatting. I > just want to extract the data from certain fields. It would also be nice to > insert it into a MySQL database, but I'll

Re: [Tutor] Extracting data from HTML files

2005-12-28 Thread bob
At 01:26 PM 12/28/2005, [EMAIL PROTECTED] wrote: >[snip] >I'm trying to make a Python script for extracting certain data from HTML >files. Say for example the HTML file has the following format: >Category:Category1 >[...] >Name:Filename.exe >[...] >Description:Description1. > >Taking into accoun
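
For the `Category:` / `Name:` / `Description:` layout quoted above, all three fields can be collected in one pass. A minimal sketch, assuming each label and its value sit on the same line with no markup between them (the real files are HTML, so the patterns would likely need adjusting); `extract_fields` is a hypothetical name:

```python
import re

FIELDS = ("Category", "Name", "Description")   # labels quoted in the post


def extract_fields(text, fields=FIELDS):
    """Return {label: value or None} for each 'Label:value' line in text."""
    result = {}
    for label in fields:
        # re.MULTILINE lets ^ and $ anchor at every line, not just the whole text.
        m = re.search(r"^\s*" + re.escape(label) + r":(.*)$", text, re.MULTILINE)
        result[label] = m.group(1).strip() if m else None
    return result


sample = "Category:Category1\nName:Filename.exe\nDescription:Description1.\n"
fields = extract_fields(sample)
print(fields)
```

A dictionary per file maps naturally onto one database row, which fits the stated goal of loading the values into MySQL later.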