[EMAIL PROTECTED] wrote:
> I'm trying to make a Python script for extracting certain data from HTML
> files. These files are from a template, so they all have the same formatting. I
> just want to extract the data from certain fields. It would also be nice to
> insert it into a MySQL database, but I'll leave that for later since I'm
> stuck in just reading the files.
> Say for example the HTML file has the following format:
>
> <strong>Category:</strong>Category1<br><br>
> [...]
> <strong>Name:</strong>Filename.exe<br><br>
> [...]
> <strong>Description:</strong>Description1.<br><br>

Since your data is all in the same form, I think a regex will easily find it. Something like:

    import re

    catRe = re.compile(r'<strong>Category:</strong>(.*?)<br><br>')

    # 'page.html' is a placeholder; use the path to one of your files
    with open('page.html') as f:
        data = f.read()

    m = catRe.search(data)
    if m:  # search() returns None when the pattern is not found
        category = m.group(1)

> I also thought regexes might be useful for this but I suck at using regexes
> so that's another problem.

Regexes take some effort to learn, but it is worth it. They are a very useful tool in many contexts, not just Python. Have you read the regex HOW-TO?
http://www.amk.ca/python/howto/regex/

Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
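
P.S. The same pattern extends to the Name and Description fields from the example. Here is a minimal sketch, assuming every field follows the `<strong>Label:</strong>value<br><br>` layout shown in the question. The names `extract_fields` and `FIELDS` and the sample string are made up for illustration; they are not part of any library.

```python
import re

# Hypothetical list of labels, taken from the example in the question.
FIELDS = ("Category", "Name", "Description")

def extract_fields(html, fields=FIELDS):
    """Return a dict mapping each label to the text that follows it."""
    result = {}
    for label in fields:
        # Same shape as the Category pattern, with the label swapped in.
        pattern = r'<strong>%s:</strong>(.*?)<br><br>' % re.escape(label)
        # re.DOTALL lets a value span more than one line.
        m = re.search(pattern, html, re.DOTALL)
        # None marks a label that was not found in the text.
        result[label] = m.group(1).strip() if m else None
    return result

# Made-up sample in the format described in the question.
sample = (
    "<strong>Category:</strong>Category1<br><br>\n"
    "<p>other markup</p>\n"
    "<strong>Name:</strong>Filename.exe<br><br>\n"
    "<strong>Description:</strong>Description1.<br><br>\n"
)

info = extract_fields(sample)
print(info["Category"])
print(info["Name"])
print(info["Description"])
```

For real files, read each one into a string and pass it to `extract_fields`. A value of None tells you the label was missing, which is worth checking before you move on to the database step.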