rahmad akbar wrote:

> hey guys
>
> i have this file i wish to parse, the file looks something like below.
> there are only four entries here (AaaI, AacLI, AaeI, AagI). the complete
> file contains thousands of entries
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> REBASE, The Restriction Enzyme Database   http://rebase.neb.com
> Copyright (c)  Dr. Richard J. Roberts, 2014.   All rights reserved.
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> Rich Roberts                                          Jan 30 2014
>
> AaaI (XmaIII)                     C^GGCCG
> AacLI (BamHI)                     GGATCC
> AaeI (BamHI)                      GGATCC
> AagI (ClaI)                       AT^CGAT
>
>
> the strategy was to mark the string 'Rich Roberts' as the start. i wrote
> the following function. but then i realized i couldn't do something like
> .next() to the var in_file which is a list. so i added a flag start =
> False which will be turned to True when 'Rich Roberts' is found. is there
> any simpler way to move to the next element in the list, like a built-in
> method or something like that?
>
> def read_bionet(bionetfile):
>     res_enzime_dict = {}
>     in_file = open(bionetfile, 'r').readlines()
>     start = False
>     for line in in_file:
>         if line.startswith('Rich Roberts'):
>             start = True
>         if start and len(line) >= 10:
>             line = line.split()
>             res_enzime_dict[line[0]] = line[-1]
>     return res_enzime_dict

As David says, don't call readlines(), which reads all the lines of the
file into a list. Iterate over the file directly instead:

def read_bionet(bionetfile):
    with open(bionetfile) as in_file:
        # skip header
        for line in in_file:
            if line.startswith("Rich Roberts"):
                break

        # populate dict
        res_enzimes = {}
        for line in in_file:  # continues after the line with R. R.
            if len(line) >= 10:
                parts = line.split()
                res_enzimes[parts[0]] = parts[-1]

    # file will be closed now rather than at
    # the garbage collector's discretion

    return res_enzimes
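
On the original question of a built-in way to move to the next element: any iterable, including the list returned by readlines(), can be wrapped in an iterator and advanced with the built-in next(). The sketch below combines that with itertools.dropwhile() to skip the header. The function name parse_bionet_lines and the sample lines are made up for illustration, and the "at least two fields" check is an assumption that stands in for the len(line) >= 10 test above.

```python
from itertools import dropwhile


def parse_bionet_lines(lines):
    """Map enzyme name to recognition site for lines after 'Rich Roberts'."""
    # dropwhile() skips items until the marker line is reached;
    # it accepts a list or an open file alike
    body = dropwhile(lambda line: not line.startswith("Rich Roberts"), lines)
    # consume the marker line itself; the default of None avoids a
    # StopIteration if the marker never appears
    next(body, None)

    enzymes = {}
    for line in body:
        parts = line.split()
        # assumption: a data line has at least a name and a site
        if len(parts) >= 2:
            enzymes[parts[0]] = parts[-1]
    return enzymes


# hypothetical sample data modelled on the excerpt in the question
sample = [
    "REBASE, The Restriction Enzyme Database\n",
    "Rich Roberts    Jan 30 2014\n",
    "\n",
    "AaaI (XmaIII) C^GGCCG\n",
    "AacLI (BamHI) GGATCC\n",
]
print(parse_bionet_lines(sample))
# {'AaaI': 'C^GGCCG', 'AacLI': 'GGATCC'}
```

Passing the open file object instead of a list works the same way and avoids holding thousands of lines in memory at once.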