Python Regular Expressions

2011-06-22 Thread Andy Barnes
Hi,

I am hoping someone here can help me with a problem I'd like to
resolve with Python. I have used it before for some other projects but
have never needed to use Regular Expressions before. It's quite
possible I am following completley the wrong tack for this task (so
any advice appreciated).

I have a source file in csv format that follows certain rules. I
basically want to parse the source file and spit out a second file
built from some rules and the content of the first file.

Source File Format:

Name, Type, Create, Study, Read, Teach, Prerequisite
# column headers

Distil Mana, Lore, n/a, 70, 38, 21
Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana
Talismantic Lore, Lore, n/a, 150, 100, 50
Advanced Talismantic Lore, Lore, n/a, 100, 60, 30, Talismantic Lore,
Theurgic Lore

The input file I have has over 700 unique entries. I have tried to
cover the four main exceptions above. Before I detail them - this is
what I would like the above input file, to be output as (dot
diagramming language incase anyone recognises it):

Name, Type, Create, Study, Read, Teach, Prerequisite
# column headers

DistilMana [label="{ Distil Mana |{Type|7}|{70|38|21}}"];
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
DistilMana -> TheurgicLore;
TalismanticLore [label="{ Talismantic Lore |{Lore|n/a}|{150|100|
50}}"];
AdvanvedTalismanticLore [label="{ Advanced Talismantic Lore |{Lore|n/
a}|{100|60|30}}"];
TalismanticLore -> AdvanvedTalismanticLore;
TheurgicLore -> AdvanvedTalismanticLore;

It's quite a complicated find and replace operation that can be broken
down into some easy stages. The main thing the sample above showed was
that some of the entries won't list any prerequisits - these only need
the descriptor entry creating. Some of them have more than one
prerequisite. A line is needed for each prerequisite listed, linking
it to it's parent.

You can also see that the 'name' needs to have spaces removed and it's
repeated a few times in the process. I Hope it's easy to see what I am
trying to achieve from the above. I'd be very happy to accept
assistance in automating the conversion of my ever expanding csv file,
into the dot format described above.

Andy

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Regular Expressions

2011-06-22 Thread Andy Barnes
to expand. I have parsed one of the lines manually to try and break
the process I'm trying to automate down.

source:
Theurgic Lore, Lore, n/a, 105, 70, 30, Distil Mana

output:
TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
DistilMana -> TheurgicLore;

This is the steps I would take to do this conversion manually:

1) Take everything prior to the first comma and remove all the spaces,
insert it into a newline:

TheurgicLore

2) append the following string ' [label="{ '

TheurgicLore [label="{

3) append everything prior to the first comma (this time we don't need
to remove the spaces)

TheurgicLore [label="{ Theurgic Lore

4) append the following string ' |{'

TheurgicLore [label="{ Theurgic Lore |{

5) append everything between the 1st and 2nd comma of the source file
followed by a '|'

TheurgicLore [label="{ Theurgic Lore |{Lore|

6) append everything between the 2nd and 3rd comma of the source file
followed by a '}|{'

TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{

7) append everything between the 3rd and 4th comma of the source file
followed by a '|'

TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|

8) append everything between the 4th and 5th comma of the source file
followed by a '|'

TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|

9) append everything between the 5th and 6th comma of the source file
followed by a '}}"];'

TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];

Those 9 steps spit out my fist line of output file as above
"TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];" I
now have to parse the dependancies onto a newline.

# this next process needs to be repeated for each prerequisite, so if
there are two pre-requisites it would need to keep parsing for more
comma's.
1a) take everything between the 6th and 7th comma and put it at the
start of a new line (remove spaces)

DistilMana

2a) append '-> '

DistilMana ->

3a) append everything prior to the first comma, with spaces removed

DistilMana -> TheurgicLore

This should now be all the steps to spit out:

TheurgicLore [label="{ Theurgic Lore |{Lore|n/a}|{105|70|30}}"];
DistilMana -> TheurgicLore;
-- 
http://mail.python.org/mailman/listinfo/python-list