Bo Li wrote:
Dear Python

I am new to Python and have some questions about its usage. Currently I have to
read two .csv files, INCT and INMRI, which look similar to this:

INCT
      NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1
1 1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35
1 1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0  Nz 121.57
34.71 14.81 1.35 1 1 1  Reye 91.04 57.59 6.98 1.35 0 1 0
INMRI
    NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1 1
1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35 1
1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0
My job is to match the names in the two files and combine the first three
attributes together. So far I have managed to read the two files, but when I
try to match the names using a nested loop, Python stops after one iteration.
Here is what I have so far.

INCT = open(' *.csv')
INMRI = open(' *.csv')

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(",")
    print aaa,
    for row2 in INMRI:
        NAME, X, Y, Z, A, B, C, D = row2.split(",")
        if name == NAME:
            print aaa


The results are shown below

"NONAME" "NONAME" "Cella " "NONAME" "Chiasm" "NONAME" "FMagnum" "NONAME"
"Inion" "NONAME" "LEAM" "NONAME" "LTM" "NONAME" "Leye" "NONAME" "Nose"
"NONAME" "Nz" "NONAME" "REAM" "NONAME" "RTM" "NONAME" "Reye" "Cella"
"Chiasm" "FMagnum" "Iz" "LEAM" "LEAM" "LPAF" "LTM" "Leye" "Nz" "Reye"


I was a MATLAB user and am really confused by what is happening. I hope
someone can help me with this intro problem and perhaps suggest a
convenient way to do pattern matching. Thanks!

I'm wondering how Christian's quote of your message was formatted so much better. Your csv contents are word-wrapped when I see your email. Did you perhaps send it using html mail, instead of text?

The other thing I note (and this is the same with Christian's version of your message), is that the code you show wouldn't run, and also wouldn't produce the output you supplied, so you must have retyped it instead of copy/pasting it. That makes the job harder, for anybody trying to help.

Christian's analysis of your problem was spot-on. A file object can only be iterated once; after the first pass it is exhausted, so the inner loop finds nothing left to read the second time through the outer loop. No error is raised, the inner loop simply does nothing. However, there are two possible fixes that are both closer to what you have, and therefore perhaps more desirable.
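
If you want to see the exhaustion for yourself, here is a minimal sketch. It uses io.StringIO as a stand-in for an open file (my choice, so the example needs no files on disk), and the two rows are made-up sample data. I believe it runs unchanged under Python 2.6 and Python 3.

```python
import io

# Stand-in for an open file; the two rows are made-up sample data.
fake_file = io.StringIO(u"Cella,129.25,100.31\nChiasm,130.3,98.49\n")

first_pass = [line for line in fake_file]    # consumes every line
second_pass = [line for line in fake_file]   # already at the end, yields nothing

print(len(first_pass))    # 2
print(len(second_pass))   # 0
```

A real file opened with open() behaves the same way, which is why your inner loop only produced output during the first pass of the outer loop.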

The simplest change is to call readlines() on the second file. That stores the whole file in memory as a list of lines, so you need enough memory to hold it, but unlike a file object, a list can be looped over as many times as you like.

INCT = open('file1.csv')
# Read the second file into a list once, so it can be looped over repeatedly
INMRIlist = open('file2.csv').readlines()

for row in INCT:
   name, x, y, z, a, b, c, d = row.split(",")
   print name,
   for row2 in INMRIlist:
       NAME, X, Y, Z, A, B, C, D = row2.split(",")
       print NAME,
       if name == NAME:
           print "---matched---"



The other choice, somewhat slower but easier on memory, is to reopen the second file on every pass through the outer loop:


INCT = open('file1.csv')
# file2.csv is no longer opened here; it is reopened inside the outer loop

for row in INCT:
   name, x, y, z, a, b, c, d = row.split(",")
   print name,
   for row2 in open('file2.csv'):
       NAME, X, Y, Z, A, B, C, D = row2.split(",")
       print NAME,
       if name == NAME:
           print "---matched---"

There are many other things I would change (probably eventually going to the dictionary that Christian mentioned), but these are the minimum changes to let you continue down the path you've envisioned.
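
For what it's worth, the dictionary approach might look roughly like the sketch below. The function names load_by_name and match_names are my own invention, the file names in the usage comment are the same placeholders as above, and I'm assuming comma-separated rows with the name first and x, y, z next, as in your code. The second file is read once into a dict keyed by name, so the inner loop disappears. Like the rest, treat it as a sketch rather than finished code.

```python
def load_by_name(lines):
    """Map each name to the list of its remaining fields (hypothetical helper)."""
    table = {}
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 2:
            continue                    # skip blank or malformed lines
        table[fields[0]] = fields[1:]   # a repeated name keeps the last row seen
    return table


def match_names(ct_lines, mri_lines):
    """Return (name, ct_xyz, mri_xyz) for every name found in both inputs."""
    mri = load_by_name(mri_lines)
    matches = []
    for line in ct_lines:
        fields = line.strip().split(",")
        name = fields[0]
        if name in mri:
            # Combine the first three attributes from each file.
            matches.append((name, fields[1:4], mri[name][:3]))
    return matches


# Usage with real files (placeholder names):
# matches = match_names(open('file1.csv'), open('file2.csv'))
```

Note that a name appearing twice in the second file (LEAM does in your sample) will only keep its last row in the dict, so you would have to decide how you want duplicates handled.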


(All code is untested; I just typed it directly into the email, assuming Python 2.6.)


DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
