Legacy data parsing

2005-07-08 Thread gov
Hi,

I've just started to learn programming and was told this was a good
place to ask questions :)

Where I work, we receive large quantities of data, all of which is
currently printed on large, obsolete dot matrix printers.  This is a
problem because replacement parts will not be available for much longer.

So I'm trying to create a program which will capture the fixed-width
text file data, then convert and sort it (there are several different
report types) into a different format which would allow it to be
printed normally, or viewed on a computer.

I've been reading up on the Regular Expression module and on ways to
manipulate strings.  However, it has been difficult to think of a way
to extract an address.

Here's an example of the raw text that I have to work with:


ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:


FOR/POUR AL/LA:  20
  CORR TYP:  A1B 2C3  P:3 CHNGD/CHANG
  LANG: E CONS/REGR: ###
  MRS XXX X XXX
  ### X ST  DD   TYP:   P:6
CHNGD/CHANG
  MONCTON NBLANG: E CONS/REGR:
###
MRS XXX X  XXX
#

###-###-#

ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:


FOR/POUR AL/LA:  30
  BOTH TYP:  A1B 2D3  P:3 CHNGD/CHANG
  LANG: E CONS/REGR: ###
  MISS  X
  ###  ST
  MONCTON NB

EARNINGS VITAL INFORMATION/RENSEIGNEMENTS ESSENTIELS SUR LES GAINS:
***

(the # = any number, and the X's are just regular text)
I would like to extract the address information, but the two different
text objects on the right hand side are difficult to remove.  I think
it would be easier if I could just extract a fixed square of
information, but I don't have a clue as to how to go about it.
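
A minimal sketch of the "fixed square" idea, assuming the columns really
do line up in the raw file.  The function name extract_block and the
row/column numbers below are made up for illustration; the real
offsets would have to be measured from an actual report.

```python
def extract_block(lines, top, bottom, left, right):
    """Return rows top..bottom-1, keeping only columns left..right-1."""
    block = []
    for line in lines[top:bottom]:
        # Pad short lines so every row can be sliced at the same columns.
        padded = line.rstrip("\n").ljust(right)
        # Slice out the rectangle, then drop the padding on the right.
        block.append(padded[left:right].rstrip())
    return block


# A few lines copied from the sample above.
sample = [
    "FOR/POUR AL/LA:  20",
    "  CORR TYP:  A1B 2C3  P:3 CHNGD/CHANG",
    "  LANG: E CONS/REGR: ###",
    "  MRS XXX X XXX",
]

# Hypothetical rectangle: rows 1-3, columns 2-19.
address = extract_block(sample, 1, 4, 2, 20)
for row in address:
    print(row)
```

Anything to the right of column 20 (here the "P:3 CHNGD/CHANG" text) is
simply cut off, which is the effect being asked for, provided the
left-hand block never runs wider than the chosen right edge.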

If anyone could give me suggestions as to methods for sorting this type
of data, it would be appreciated.
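
One possible starting point for the sorting, sketched under the
assumption that every report section begins with an upper-case title
line containing "INFORMATION/RENSEIGNEMENTS" and ending in a colon, as
in the sample.  The HEADER pattern and the name group_by_section are
guesses for illustration only; other report types may need a different
pattern.

```python
import re
from collections import defaultdict

# Assumed shape of a section title, based only on the two titles in the sample.
HEADER = re.compile(r"^[A-Z' ]+INFORMATION/RENSEIGNEMENTS [A-Z' ]+:$")


def group_by_section(lines):
    """Map each section title to a list of records (each a list of lines)."""
    sections = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.rstrip()
        if HEADER.match(line):
            # A title line starts a new record under that title.
            current = line
            sections[current].append([])
        elif current is not None and line:
            # Non-blank lines belong to the most recent record.
            sections[current][-1].append(line)
    return dict(sections)


text = """\
ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:

FOR/POUR AL/LA:  20
  MRS XXX X XXX

ADDRESS INFORMATION/RENSEIGNEMENTS SUR L'ADRESSE:

FOR/POUR AL/LA:  30
  MISS  X

EARNINGS VITAL INFORMATION/RENSEIGNEMENTS ESSENTIELS SUR LES GAINS:
***
"""

sections = group_by_section(text.splitlines())
for title, records in sections.items():
    print(title, len(records))
```

Each record keeps its lines in their original order and with their
leading spaces, so a column-based extraction can still be applied to
them afterwards.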

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Legacy data parsing

2005-07-11 Thread gov
Actually, we receive the data in the form of a text file.  The original
data is sent from an IBM mainframe to Ottawa, where it is captured
by an "SNA Print Server that receives the VPS print jobs, writes them
to disk and then runs a PERL script program on the disk file.  This
PERL script program scans the file's VPS banner page for key words
(e.g. JobName, Destination, Form) and then creates a Plain Text and a
Rich Text Format (RTF)."  This system is available nationally, for every
region in Canada.  It is unfortunate that our government has been so
slow in updating such an old process.

Since I don't really know (or have access to) the inner workings of the
mainframe or the conversion process, I can't really do much there.

The reason I don't wish to simply replace the printer, or simply
convert the output so it can be used on newer printers, is that the
data will also be used to automate tasks (such as creating form letters
to clients).
