On Mon, Jul 2, 2012 at 10:03 AM, Flynn, Stephen (L & P - IT)
<steve.fl...@capita.co.uk> wrote:
> Tutors,
>
> Whilst having a play around with reading in textfiles and reformatting them I
> tried to write a python 3.2 script to read a CSV file, looking for any
> records which were short (indicating that the data may well contain an
> embedded CR/LF. I've attached a small sample file with a "split record" at
> line 3, and my code.
>
> Call the code with
>
> Python pipesmoker.py MyFile.txt ,
>
> (first parameter is the file being read, second parameter is the field
> separator... a comma in this case)
>
> I can read the file in, I can determine that I'm looking for records which
> have 13 fields and I can find a record which is too short (line 3).
>
> What I can't do is read the successive line to a short line in order to
> append it onto the end of short line before writing the entire amended line
> out. I'm still thinking about how to persuade the fileinput module to leap
> over the successor line so it doesn't get processed again.
>
> When I run the code as it stands, I get a traceback as I'm obviously not
> using fileinput.FileInput.readline() correctly.
>
> value of file is C:\myfile.txt
> value of the delimiter is ,
> I'm looking for 13 , in each currentLine...
> "1","0000000688 ","ABCD","930020854","34","0","1"," ","930020854 ","
> ","0","0","0","0"
>
> "2","0000000688 ","ABCD","930020854","99","0","1"," ","930020854 ","
> ","0","0","0","0"
>
> short line found at line 3
> Traceback (most recent call last):
>   File "C:\Documents and Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line 35, in <module>
>     nextLine = fileinput.FileInput.readline(args.file)
>   File "C:\Python32\lib\fileinput.py", line 301, in readline
>     line = self._buffer[self._bufindex]
> AttributeError: 'str' object has no attribute '_buffer'
>
>
> Can someone explain to me how I am supposed to make use of readline() to grab
> the next line of a text file please?
> It may be that I should be using some
> other module, but chose fileinput as I was hoping to make the little routine
> as generic as possible; able to spot short lines in tab separated, comma
> separated, pipe separated, ^~~^ separated and anything else which my clients
> feel like sending me.
>

Take a look at csv.reader: http://docs.python.org/library/csv.html#csv.reader
It comes with Python and, according to the text near that link, it handles
the situation where end-of-line characters are contained in quoted fields.
Will that help you?
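As a rough sketch of what that could look like (not run against your attached
file, and assuming the stray line break sits inside a quoted field), csv.reader
can do the record counting for you. The function name find_short_records and
the sample data below are made up for illustration; they are not from your
script.

```python
import csv
import io


def find_short_records(stream, expected_fields, delimiter=","):
    """Return (line_number, field_count) for each record with too few fields.

    `stream` is any iterable of lines, e.g. a file opened with newline="".
    """
    reader = csv.reader(stream, delimiter=delimiter)
    short = []
    for row in reader:
        # Skip completely blank lines, which csv.reader yields as [].
        if row and len(row) < expected_fields:
            # reader.line_num counts physical lines read so far, so it
            # points at the last line of the offending record.
            short.append((reader.line_num, len(row)))
    return short


# Made-up sample: record 2 has a CR/LF inside a quoted field,
# record 3 is genuinely one field short.
sample = '"1","ABCD","ok"\r\n"2","AB\r\nCD","ok"\r\n"3","short"\r\n'

# newline="" stops StringIO (or open()) from translating the line endings.
rows = list(csv.reader(io.StringIO(sample, newline="")))
short = find_short_records(io.StringIO(sample, newline=""), expected_fields=3)

print(rows)
print(short)
```

For a real file you would pass something like open(args.file, newline="")
instead of the StringIO object; the csv docs recommend newline="" so that
embedded line breaks reach the reader untouched. If the break falls outside
the quotes, csv.reader will still see two separate short records, and those
would have to be joined by hand.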
--
Joel Goldstick