text file reformatting

2010-10-31 Thread iwawi
I have a fixed-width data file (data.txt) that I would like to
reformat. The data looks something like this, with hundreds of rows and
columns; every row ends with END:

PRJ01001 4 00100END
PRJ01002 3 00110END
PRJ01003 3 00120END
PRJ01004 2 00130END
PRJ01005 1 00140END
PRJ01006 1 00150END
PRJ01007 3 00160END

I would like to pick only some of the columns, write them to a new file,
and put them at certain positions (to match previous data). The
definition file (def.csv) could be something like this:

VARIABLE    FIELD STARTS    FIELD SIZE    NEW PLACE IN NEW DATA FILE
ProjID    ; 1             ; 5           ; 1
CaseID    ; 6             ; 3           ; 10
UselessV  ; 10            ; 1           ;
Zipcode   ; 12            ; 5           ; 15

So the new datafile should look like this:

PRJ01    001       00100END
PRJ01    002       00110END
PRJ01    003       00120END
PRJ01    004       00130END
PRJ01    005       00140END
PRJ01    006       00150END
PRJ01    007       00160END

If the data file were as narrow as this demo file, I would simply use
Notepad to fix it, but there are more than 100 variables I need to pick
and move...

I'm missing the logic here: how should this be done?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: text file reformatting

2010-10-31 Thread iwawi
On 31 Oct, 21:48, Tim Chase wrote:
> > PRJ01001 4 00100END
> > PRJ01002 3 00110END
>
> > I would like to pick only some columns to a new file and put them to a
> > certain places (to match previous data) - definition file (def.csv)
> > could be something like this:
>
> > VARIABLE   FIELDSTARTS     FIELD SIZE      NEW PLACE IN NEW DATA FILE
> > ProjID     ;       1       ;       5       ;       1
> > CaseID     ;       6       ;       3       ;       10
> > UselessV  ;        10      ;       1       ;
> > Zipcode    ;       12      ;       5       ;       15
>
> > So the new datafile should look like this:
>
> > PRJ01    001       00100END
> > PRJ01    002       00110END
>
> How flexible is the def.csv format?  The difficulty I see with
> your def.csv format is that it leaves undefined gaps (presumably
> to be filled in with spaces) and that you also have a blank "new
> place in new file" value.  If instead, you could specify the
> width to which you want to pad it and omit variables you don't
> want in the output, ordering the variables in the same order you
> want them in the output:
>
>   Variable; Start; Size; Width
>   ProjID; 1; 5; 10
>   CaseID; 6; 3; 10
>   Zipcode; 12; 5; 5
>   End; 16; 3; 3
>
> (note that I lazily use the same method to copy the END from the
> source to the destination, rather than coding specially for it)
> you could do something like this (untested)
>
>    import csv
>    f = file('def.csv', 'rb')
>    f.next() # discard the header row
>    r = csv.reader(f, delimiter=';')
>    fields = [
>      (varname, slice(int(start), int(start)+int(size)), width)
>      for varname, start, size, width
>      in r
>      ]
>    f.close()
>    out = file('out.txt', 'w')
>    try:
>      for row in file('data.txt'):
>        for varname, slc, width in fields:
>          out.write(row[slc].ljust(width))
>        out.write('\n')
>    finally:
>      out.close()
>
> Hope that's fairly easy to follow and makes sense.  There might
> be some fence-posting errors (particularly your use of "1" as the
> initial offset, while python uses "0" as the initial offset for
> strings)
>
> If you can't modify the def.csv format, then things are a bit
> more complex and I'd almost be tempted to write a script to try
> and convert your existing def.csv format into something simpler
> to process like what I describe.
>
> -tkc
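
The conversion Tim mentions, from the original def.csv layout (start, size, new place) to the simpler width-based one, could be sketched roughly as follows. This is only an illustration under assumptions: the function name `to_width_format` is hypothetical, rows with a blank "new place" are treated as dropped, and each kept field is padded up to the start of the next kept field.

```python
def to_width_format(rows):
    """Convert (name, start, size, new_place) string rows into
    (name, start, size, width) tuples, ordered by output position.

    Assumption: a blank new_place means the field is left out.
    """
    kept = [
        (name.strip(), int(start), int(size), int(new_place))
        for name, start, size, new_place in rows
        if new_place.strip()
    ]
    kept.sort(key=lambda r: r[3])  # order fields as they appear in the output

    converted = []
    for i, (name, start, size, place) in enumerate(kept):
        if i + 1 < len(kept):
            width = kept[i + 1][3] - place  # pad up to the next field's position
        else:
            width = size  # last field needs no padding
        if width < size:
            raise ValueError("field %s overlaps the next field" % name)
        converted.append((name, start, size, width))
    return converted


rows = [
    ("ProjID", "1", "5", "1"),
    ("CaseID", "6", "3", "10"),
    ("UselessV", "10", "1", ""),
    ("Zipcode", "12", "5", "15"),
]
for entry in to_width_format(rows):
    print("; ".join(str(v) for v in entry))
```

The widths this produces (9, 5, 5) come from the "new place" values in the original def.csv, so they differ from the illustrative widths in Tim's example.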

Hi,

Thanks for your reply.

Def.csv could be modified so that every line has the same structure:
variable name, field start, field size and new place, separated by
semicolons as you mentioned.

I tried your script (which seems quite logical) but I get this

Traceback (most recent call last):
  File "testing.py", line 16, in <module>
    out.write (row[slc].ljust(width))
TypeError: an integer is required

Yes - you said it was untested, but I can't figure out how to
proceed...
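
A likely cause of the TypeError: csv.reader returns every field as a string, so `width` reaches `ljust()` as text rather than as an integer. Below is a minimal Python 3 sketch of the same approach with that conversion added. It is not the script from the thread: the helper names `load_fields` and `reformat_row` are made up for illustration, the start column is assumed to be 1-based (hence the `- 1`), and the End row is assumed to start at column 17 so that it lines up with the sample data.

```python
import csv


def load_fields(def_lines):
    """Parse 'Variable; Start; Size; Width' lines into (name, slice, width)."""
    reader = csv.reader(def_lines, delimiter=";")
    next(reader)  # discard the header row
    fields = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        name, start, size, width = record
        begin = int(start) - 1  # assumed 1-based in def.csv, 0-based in Python
        fields.append((name.strip(), slice(begin, begin + int(size)), int(width)))
    return fields


def reformat_row(row, fields):
    """Cut each field out of the row and left-justify it to its width."""
    return "".join(row[slc].ljust(width) for _, slc, width in fields)


definition = [
    "Variable; Start; Size; Width",
    "ProjID; 1; 5; 10",
    "CaseID; 6; 3; 10",
    "Zipcode; 12; 5; 5",
    "End; 17; 3; 3",  # assumption: END occupies columns 17-19 of the sample
]
fields = load_fields(definition)
print(reformat_row("PRJ01001 4 00100END", fields))
```

To run this against real files, the lists can be replaced by file objects opened with `open('def.csv', newline='')` and `open('data.txt')`.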


Re: text file reformatting

2010-11-01 Thread iwawi
On 1 Nov, 09:59, "[email protected]" wrote:
> The line
>
>     (varname, slice(int(start), int(start)+int(size)), width)
>
> should instead be
>
>     (varname, slice(int(start), int(start)+int(size)), int(width))
>
> although you give an example where there is no width - what does that
> imply? In the above case, it will throw an exception.
>
> Anyway, I think you'll find there's something a bit off in the output
> loop with the parameter passed to ljust() as well. The value given in
> your csv seems to be the absolute position, but as it's implemented by
> Tim, it acts as the relative position.
>
> Given Tim's parsing into the list fields, I have a feeling that what
> you really want instead of
>
>     for varname, slc, width in fields:
>         out.write(row[slc].ljust(width))
>     out.write
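
The reply above is cut off, so the following is not its code. It is only a rough sketch of the idea it raises, namely treating the last def.csv column as an absolute position in the output line rather than as a padding width. The names `parse_def` and `place_fields` are hypothetical, all positions are assumed to be 1-based, rows with a blank "new place" are assumed to be dropped, and the End row is an added assumption so that END is carried over.

```python
def parse_def(rows):
    """Turn (name, start, size, new_place) string rows into integer triples.

    Assumption: rows with a blank new_place are left out of the output.
    """
    fields = []
    for name, start, size, new_place in rows:
        if not new_place.strip():
            continue
        fields.append((int(start), int(size), int(new_place)))
    return fields


def place_fields(row, fields):
    """Copy each field to its absolute 1-based position in a blank line."""
    line_width = max(new_place - 1 + size for _, size, new_place in fields)
    out = [" "] * line_width
    for start, size, new_place in fields:
        value = row[start - 1:start - 1 + size].ljust(size)
        out[new_place - 1:new_place - 1 + size] = value
    return "".join(out)


definition = [
    ("ProjID", "1", "5", "1"),
    ("CaseID", "6", "3", "10"),
    ("UselessV", "10", "1", ""),
    ("Zipcode", "12", "5", "15"),
    ("End", "17", "3", "20"),  # hypothetical row so END is carried over
]
fields = parse_def(definition)
print(place_fields("PRJ01001 4 00100END", fields))
```

Because the gaps are filled from a blank buffer, the order of the rows in def.csv does not matter here, and the spacing follows the positions given in def.csv.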

Re: text file reformatting

2010-11-02 Thread iwawi
On Nov 1, 6:50 pm, "[email protected]" wrote:
> On Nov 1, 1:58 am, iwawi wrote: