At 01:26 PM 12/28/2005, [EMAIL PROTECTED] wrote:
>[snip]
>I'm trying to make a Python script for extracting certain data from HTML
>files. Say, for example, the HTML file has the following format:
><strong>Category:</strong>Category1<br><br>
>[...]
><strong>Name:</strong>Filename.exe<br><br>
>[...]
><strong>Description:</strong>Description1.<br><br>
>
>Taking into account that each HTML file has a load of code in between each
>[...], what I want to do is extract the information for each field. In this
>case I want the script to read Category1, Filename.exe and
>Description1.
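If every file really follows the layout quoted above, a minimal sketch using only the standard library's re module might be enough to pull the fields out. The names FIELD_RE and extract_fields and the sample string are made up for illustration, and the pattern assumes each value contains no tags of its own and is followed directly by <br>.

```python
import re

# Matches <strong>Label:</strong>value<br>, capturing the label and the value.
# Assumption: the value itself contains no tags (true for the quoted sample).
FIELD_RE = re.compile(
    r"<strong>\s*([^<:]+):\s*</strong>\s*([^<]*?)\s*<br\s*/?>",
    re.IGNORECASE,
)


def extract_fields(html):
    """Return a dict mapping each field label to its text value."""
    return {label.strip(): value for label, value in FIELD_RE.findall(html)}


# Hypothetical input: the quoted fields with unrelated markup in between.
sample = (
    "<p>other markup</p>"
    "<strong>Category:</strong>Category1<br><br>"
    "<div>more markup</div>"
    "<strong>Name:</strong>Filename.exe<br><br>"
    "<table><tr><td>still more</td></tr></table>"
    "<strong>Description:</strong>Description1.<br><br>"
)

fields = extract_fields(sample)
print(fields["Category"], fields["Name"], fields["Description"])
```

A pattern like this breaks as soon as the markup varies (attributes on the tags, nested tags inside a value), so treat it as a starting point only.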
Check out BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

>And later on insert this into a MySQL database, or read the
>info and generate a CSV file to make DB insertion easier.
>Since all the files are generated by a script, each field I want to read
>is, from what I've seen, on the same line number, so this could make things
>easier. But not all fields are of the same length.
>I've read Chapter 8 of Dive Into Python so I'm basing my work on that.
>I also thought regexes might be useful for this, but I suck at using regexes,
>so that's another problem.
>Do any of you have an idea of where I could get a good start on this, and
>whether there are any modules (like sgmllib.py) that might come in handy?
>Thanks!

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor