Hello, I'm very new to Python and to programming in general. I've been reading Dive Into Python as an introduction to the language and I think I'm doing pretty well, but I'm stuck on this problem. I'm trying to write a Python script that extracts certain data from HTML files. These files come from a template, so they all have the same formatting, and I just want to extract the data from certain fields. It would also be nice to insert it into a MySQL database, but I'll leave that for later since I'm stuck at just reading the files. Say, for example, the HTML file has the following format:
<strong>Category:</strong>Category1<br><br>
[...]
<strong>Name:</strong>Filename.exe<br><br>
[...]
<strong>Description:</strong>Description1.<br><br>

Each [...] stands for a load of other code in between. What I want is to extract the information for each field: in this case, the script should read Category1, Filename.exe and Description1. Later on I would insert this into a MySQL database, or generate a CSV file from it to make the database insertion easier.

Since all the files are generated by a script, each field I want to read is, from what I've seen, on the same line number in every file, which could make things easier. But not all fields are the same length.

I've read Chapter 8 of Dive Into Python, so I'm basing my work on that. I also thought regexes might be useful for this, but I'm not good at using regexes, so that's another problem.

Do any of you have an idea of where I could get a good start on this, and whether there are any modules (like sgmllib.py) that might come in handy?

Thanks!
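For illustration, here is a minimal sketch of the regex idea mentioned above. It assumes Python 3, and it assumes each value is the plain text between <strong>Label:</strong> and the next tag, as in the sample. The names FIELDS, extract_fields and to_csv_row are made up for this example, and the sample string only imitates the template, so treat it as a starting point rather than a tested solution for the real files.

```python
import csv
import io
import re

# Hypothetical field labels, copied from the sample markup in the post.
FIELDS = ("Category", "Name", "Description")


def extract_fields(html, fields=FIELDS):
    """Map each label to the text that follows <strong>Label:</strong>.

    The value is assumed to run up to the next "<"; a label that is
    not found maps to None.
    """
    result = {}
    for label in fields:
        pattern = r"<strong>%s:</strong>\s*([^<]*)" % re.escape(label)
        match = re.search(pattern, html, re.IGNORECASE)
        result[label] = match.group(1).strip() if match else None
    return result


def to_csv_row(record, fields=FIELDS):
    """Format one record as a single CSV line (missing values become "")."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([record[f] or "" for f in fields])
    return buffer.getvalue()


# Stand-in for one template file; the <p> lines play the role of [...].
sample = (
    "<strong>Category:</strong>Category1<br><br>\n"
    "<p>other markup</p>\n"
    "<strong>Name:</strong>Filename.exe<br><br>\n"
    "<p>more markup</p>\n"
    "<strong>Description:</strong>Description1.<br><br>\n"
)

record = extract_fields(sample)
print(record)
print(to_csv_row(record), end="")
```

Matching on the label rather than on the line number means the sketch does not depend on the fields staying on the same lines, and the varying field lengths do not matter because the pattern simply stops at the next tag. For a real file, the contents would be read with open(path).read() and passed to extract_fields in place of the sample string.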
_______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor