text file reformatting
I have this fixed-width data file (data.txt) which I would like to reformat. The data is something like this, with hundreds of rows and columns; every row ends with END:

PRJ01001 4 00100END
PRJ01002 3 00110END
PRJ01003 3 00120END
PRJ01004 2 00130END
PRJ01005 1 00140END
PRJ01006 1 00150END
PRJ01007 3 00160END

I would like to pick only some columns into a new file and put them at certain positions (to match previous data). The definition file (def.csv) could be something like this:

VARIABLE FIELDSTARTS FIELD SIZE NEW PLACE IN NEW DATA FILE
ProjID ; 1 ; 5 ; 1
CaseID ; 6 ; 3 ; 10
UselessV ; 10 ; 1 ;
Zipcode ; 12 ; 5 ; 15

So the new data file should look like this:

PRJ01 001 00100END
PRJ01 002 00110END
PRJ01 003 00120END
PRJ01 004 00130END
PRJ01 005 00140END
PRJ01 006 00150END
PRJ01 007 00160END

If the data file were as narrow as this demo file, I would simply use Notepad to fix it, but there are more than 100 variables I need to pick and reposition. I'm missing the logic here: how should it be done?
--
http://mail.python.org/mailman/listinfo/python-list
Re: text file reformatting
On 31 Oct, 21:48, Tim Chase wrote:
> > PRJ01001 4 00100END
> > PRJ01002 3 00110END
>
> > I would like to pick only some columns to a new file and put them to a
> > certain places (to match previous data) - definition file (def.csv)
> > could be something like this:
>
> > VARIABLE FIELDSTARTS FIELD SIZE NEW PLACE IN NEW DATA FILE
> > ProjID ; 1 ; 5 ; 1
> > CaseID ; 6 ; 3 ; 10
> > UselessV ; 10 ; 1 ;
> > Zipcode ; 12 ; 5 ; 15
>
> > So the new datafile should look like this:
>
> > PRJ01 001 00100END
> > PRJ01 002 00110END
>
> How flexible is the def.csv format? The difficulty I see with
> your def.csv format is that it leaves undefined gaps (presumably
> to be filled in with spaces) and that you also have a blank "new
> place in new file" value. If instead, you could specify the
> width to which you want to pad it and omit variables you don't
> want in the output, ordering the variables in the same order you
> want them in the output:
>
> Variable; Start; Size; Width
> ProjID; 1; 5; 10
> CaseID; 6; 3; 10
> Zipcode; 12; 5; 5
> End; 16; 3; 3
>
> (note that I lazily use the same method to copy the END from the
> source to the destination, rather than coding specially for it)
> you could do something like this (untested)
>
> import csv
> f = file('def.csv', 'rb')
> f.next() # discard the header row
> r = csv.reader(f, delimiter=';')
> fields = [
>     (varname, slice(int(start), int(start)+int(size)), width)
>     for varname, start, size, width
>     in r
> ]
> f.close()
> out = file('out.txt', 'w')
> try:
>     for row in file('data.txt'):
>         for varname, slc, width in fields:
>             out.write(row[slc].ljust(width))
>         out.write('\n')
> finally:
>     out.close()
>
> Hope that's fairly easy to follow and makes sense. There might
> be some fence-posting errors (particularly your use of "1" as the
> initial offset, while python uses "0" as the initial offset for
> strings)
>
> If you can't modify the def.csv format, then things are a bit
> more complex and I'd almost be tempted to write a script to try
> and convert your existing def.csv format into something simpler
> to process like what I describe.
>
> -tkc
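
The fence-post caveat in the quoted reply can be made concrete. The following is a minimal Python 3 sketch, not part of the original script: `field_slice` is a hypothetical helper name, and the sample row assumes the single-space layout shown in the thread is the real column layout.

```python
# Hypothetical helper: turn a 1-based start column and a field size
# (as written in def.csv) into a 0-based Python slice.
def field_slice(start, size):
    begin = start - 1  # column 1 in def.csv is index 0 in a Python string
    return slice(begin, begin + size)

row = "PRJ01001 4 00100END"

proj_id = row[field_slice(1, 5)]   # columns 1-5
case_id = row[field_slice(6, 3)]   # columns 6-8
zipcode = row[field_slice(12, 5)]  # columns 12-16

# Using the def.csv start value directly as a Python index is off by one.
unadjusted = row[1:1 + 5]
```

With the adjustment, the three fields come out as the project ID, case ID and zip code; without it, the first field starts one character too late.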
Hi,
Thanks for your reply.
Def.csv could be modified so that every line has the same structure:
variable name, field start, field size and new place and would be
separated with semicolons as you mentioned.
I tried your script (which seems quite logical) but I get this
Traceback (most recent call last):
  File "testing.py", line 16, in <module>
    out.write (row[slc].ljust(width))
TypeError: an integer is required
Yes - you said it was untested, but I can't figure out how to
proceed...
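
The traceback is consistent with `csv.reader` returning every column as a string, so the width reaches `ljust()` as text rather than as a number. Below is a small Python 3 sketch of that behaviour; the in-memory `definition` text is an assumed stand-in for def.csv in the simplified layout from the quoted reply, and the exact error message differs between Python versions.

```python
import csv
import io

# Assumed stand-in for def.csv (simplified layout from the quoted reply).
definition = io.StringIO(
    "Variable; Start; Size; Width\n"
    "ProjID; 1; 5; 10\n"
)

reader = csv.reader(definition, delimiter=";")
next(reader)  # discard the header row
name, start, size, width = next(reader)

# csv.reader never converts types: every column is a string.
width_is_text = isinstance(width, str)

try:
    "PRJ01".ljust(width)  # string width: raises TypeError
    error = None
except TypeError as exc:
    error = exc

# Converting explicitly avoids the error; int() tolerates the
# surrounding spaces left over from the "; " separators.
padded = "PRJ01".ljust(int(width))
```

A blank width column, as in the UselessV row of the original def.csv, would make `int()` raise `ValueError`, so such rows need to be skipped or handled separately.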
Re: text file reformatting
On 1 Nov, 09:59, "[email protected]" wrote:
> On Oct 31, 11:46 pm, iwawi wrote:
> > I tried your script (which seems quite logical) but I get this
> >
> > Traceback (most recent call last):
> >   File "testing.py", line 16, in <module>
> >     out.write (row[slc].ljust(width))
> > TypeError: an integer is required
> >
> > Yes - you said it was untested, but I can't figure out how to
> > proceed...
>
> The line
>
>     (varname, slice(int(start), int(start)+int(size)), width)
>
> should instead be
>
>     (varname, slice(int(start), int(start)+int(size)), int(width))
>
> although you give an example where there is no width - what does that
> imply? In the above case, it will throw an exception.
>
> Anyway, I think you'll find there's something a bit off in the output
> loop with the parameter passed to ljust() as well. The value given in
> your csv seems to be the absolute position, but as it's implemented by
> Tim, it acts as the relative position.
>
> Given Tim's parsing into the list fields, I have a feeling that what
> you really want instead of
>
>     for varname, slc, width in fields:
>         out.write(row[slc].ljust(width))
>     out.write
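
The quoted suggestion is cut off in the archive, so the following is not the poster's code. It is a minimal Python 3 sketch of one way to treat the "new place" value as an absolute 1-based output column, which is how the original def.csv appears to use it. The function name `place_fields` is hypothetical, the fields are assumed to be sorted by target column and not to overlap, and the final `(17, 3, 20)` entry for END is an assumption added so the marker is carried over.

```python
def place_fields(row, fields):
    """Build one output line with each field at its absolute 1-based column."""
    out = ""
    for start, size, new_place in fields:
        value = row[start - 1:start - 1 + size]  # 1-based start to 0-based slice
        # Pad the line built so far with spaces up to the target column,
        # then append the field there.
        out = out.ljust(new_place - 1) + value
    return out

# (start, size, new place), all 1-based, sorted by new place.
fields = [
    (1, 5, 1),    # ProjID
    (6, 3, 10),   # CaseID
    (12, 5, 15),  # Zipcode
    (17, 3, 20),  # END marker (assumed entry, not in the original def.csv)
]

line = place_fields("PRJ01001 4 00100END", fields)
```

Rows with a blank "new place" (UselessV in the original def.csv) are simply left out of `fields` in this sketch.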
Re: text file reformatting
On Nov 1, 6:50 pm, "[email protected]" wrote:
> On Nov 1, 1:58 am, iwawi wrote:
> > On 1 Nov, 09:59, "[email protected]" wrote:
> > > On Oct 31, 11:46 pm, iwawi wrote:
> > > > I tried your script (which seems quite logical) but I get this
> > > >
> > > > Traceback (most recent call last):
> > > >   File "testing.py", line 16, in <module>
> > > >     out.write (row[slc].ljust(width))
> > > > TypeError: an integer is required
> > > >
> > > > Yes - you said it was untested, but I can't figure out
