Hi
I have a problem with a CSV file from a supplier. They export their data to CSV,
but the last column in each record is a description marked up with HTML.
I'm trying to automate the processing of this CSV so it can be uploaded
elsewhere in a usable format. If I open the CSV with CSVed, it looks like some
records aren't escaped correctly, because after a while I find HTML tags and
text on the next line/record.
If I use 'Open with' Excel instead, the description stays on the correct
line/record.
I want to use Python to read these records and output a valid CSV with the
descriptions intact, preferably without the HTML tags, i.e. as plain text with
newlines/CRs where appropriate.
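For the "without the HTML tags" part, the standard library's html.parser is enough for a rough conversion. This is a minimal sketch, assuming the descriptions mark line breaks with tags such as <br>, <p> or <li> (the supplier's actual markup isn't shown in full here, so the tag list is a guess); html_to_text and _TextExtractor are hypothetical names, not an existing API.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, turning line-break and block tags into newlines."""

    # Assumed set of tags that should start a new line; adjust to the
    # markup the supplier actually uses.
    BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "tr", "h1", "h2", "h3", "h4"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> and <br/> both arrive here (the latter via handle_startendtag).
        if tag == "br" or tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Hypothetical helper: strip tags from an HTML fragment, keeping line breaks."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Trim each line, then collapse runs of blank lines to a single blank line.
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return text.strip()
```

Applied to the Description column of each row before writing (for example row[-1] = html_to_text(row[-1])), this should give plain text with line breaks where the tags were. It is a rough conversion, not a full HTML renderer.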
So far I have this, but I don't know where to go from here. Can someone help?
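import csv

# Python 3: open CSV files in text mode with newline='' so the csv module
# sees line endings (including ones inside quoted fields) untouched.
# Raw strings (r'...') stop backslashes in Windows paths acting as escapes.
with open(r'c:\data\input.csv', newline='') as infile, \
        open(r'c:\data\output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for line in reader:
        print(line)
        writer.writerow(line)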
The input.csv is laid out as follows. Header row first:
"FileDate","ProductID","Name","StandardPrice","DropshipPrice","SRP","Brand","Xline","InStock","Stock","Barcode","Weight","CategoryID","Category","SmallImage","LargeImage","Description"
A complete record looks like this, with the Description field populated (some
records don't have the Description field populated):
"2012-11-01T18:28:45.25+00:00","10198","(Venom) PS2 DVD Remote Control (Black)","3.7800","4.3500","12.9800","Venom","true","In Stock","1","5031300025009","200","1339","PC/Games_Console / Playstation / PS2 / Remote Controls","http://www.atssitecentre.co.uk/images/products/1/10198.gif","http://www.atssitecentre.co.uk/images/products/1/10198f.jpg","Never have to unplug your joypad / DVD user friendly / Works up to 30 feet from PS/2 / IR wireless technology."
This is an example of a bad record that causes problems; the Description field
starts with "features:". As you can see, it contains HTML, but because of this
format the subsequent HTML tags end up as new records on the following lines.
"2012-11-01T18:28:45.25+00:00","6","3.5 inch Disk Drive Lock","2.9500","2.9500","9.9500","None","true","In Stock","3","077511994166","131","1332","PC/Games_Console / PC Security / General","http://www.atssitecentre.co.uk/images/products/11000/6.gif","http://www.atssitecentre.co.uk/images/products/11000/6f.jpg","features:
3 1/2" FDD Lock.
Die casting housing and cylinder chrome plated.
Lock Cover : PBT + GF 15%. (PLASTIC)
2 Keys supplied per lock.
"
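One thing that stands out in this record (a likely culprit, not something the supplier has confirmed) is the inch mark in 3 1/2": inside a quoted CSV field a literal double quote has to be written as two quotes (""), and here it is a single one. The sketch below feeds a cut-down copy of the record to Python 3's csv module; BAD, GOOD and first_record are names made up for the illustration. With the default, non-strict settings the reader drops the stray quote, treats the rest of that line as unquoted text and ends the record at the next newline, so the lines after it are read as new records.

```python
import csv
import io

# Cut-down copy of the problem record (ProductID and Description only).
# The inch mark after 3 1/2 is a single " inside a quoted field.
BAD = '"6","features:\n3 1/2" FDD Lock.\n2 Keys supplied per lock.\n"\n'
# The same record with that quote doubled, as the CSV convention requires.
GOOD = BAD.replace('1/2" ', '1/2"" ')


def first_record(text, strict=False):
    """Parse and return only the first CSV record found in text."""
    return next(csv.reader(io.StringIO(text), strict=strict))


print(first_record(GOOD))  # one record; Description keeps its newlines and the "
print(first_record(BAD))   # record ends early, after "3 1/2 FDD Lock."
try:
    first_record(BAD, strict=True)
except csv.Error as exc:
    print("strict mode:", exc)
```

Passing strict=True makes the reader raise csv.Error at the stray quote instead of silently splitting the record, which may help locate other bad rows in the file.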
I know I'm far from done, but I don't know how to proceed :-)
As I said, I want to reconstruct clean records: either strip out the HTML tags,
or at least escape the records appropriately.
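If the quoting can't be fixed at the source, one workaround is to stop relying on it and rebuild each record from the structure the samples show, then let csv.writer re-escape everything. This is a sketch under the assumptions listed in its comments (every field quoted, every record starting with a FileDate timestamp, only Description spanning lines); repair_records and write_clean_csv are hypothetical helper names.

```python
import csv
import re

# Assumptions drawn from the two sample records, not from any supplier spec:
#  * every field is wrapped in double quotes and fields are separated by ",";
#  * each record starts on a new line with a quoted FileDate timestamp,
#    e.g. "2012-11-01T18:28:45.25+00:00";
#  * only the last column (Description) spans lines or holds stray quotes.
RECORD_START = re.compile(r'^(?="\d{4}-\d{2}-\d{2}T)', re.MULTILINE)


def repair_records(text):
    """Hypothetical helper: rebuild (header, rows) from the raw file text."""
    header_line, _, body = text.partition("\n")
    header = next(csv.reader([header_line.strip()]))
    rows = []
    # A zero-width split (Python 3.7+) cuts the body just before each timestamp.
    for chunk in RECORD_START.split(body):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Drop the quote that opens the first field and the one closing the last.
        if chunk.startswith('"'):
            chunk = chunk[1:]
        if chunk.endswith('"'):
            chunk = chunk[:-1]
        # maxsplit stops after the last expected separator, so anything in
        # Description (stray quotes, even a "," sequence) stays in that field.
        fields = chunk.split('","', len(header) - 1)
        if len(fields) != len(header):
            raise ValueError("unexpected field count in record: %r" % chunk[:60])
        fields[-1] = fields[-1].strip()
        rows.append(fields)
    return header, rows


def write_clean_csv(header, rows, out):
    """Write a conforming CSV to an open text file; embedded quotes get doubled."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    writer.writerows(rows)
```

Usage would be along the lines of reading c:\data\input.csv into a string with open(...).read(), passing it to repair_records, and handing the result to write_clean_csv along with a file opened via open(r'c:\data\output.csv', 'w', newline=''). The file's encoding isn't stated, so an encoding= argument may be needed on both open() calls.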
Thanks all