On Thu, 14 Jun 2007, Lauren wrote:

> Subseq AAAAAU can bind to UUUUUA (which is normal) and UUUUUG (not so
> normal) and I want to know where UUUUUA, and UUUUUG are in the large
> RNA sequence, and the locations to show up as one...thing.
How about something like this?

========================================================================
import re

def seqsearch(seq, targets):
    """
    Return a list of match objects, each of which identifies where
    one of the targets is found in the string seq.

      seq: string to be searched
      targets: list or tuple of alternative targets to search for

    Note: re.findall is not used, because it won't catch overlapping
    matches; instead, each new search starts one position after the
    start of the previous match.
    """
    resultlist = []
    pos = 0
    # escape each target so it is matched literally, then join them
    # into a single alternation, e.g. UUUUUA|UUUUUG
    regex_text = "|".join(re.escape(t) for t in targets)
    regex = re.compile(regex_text)
    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(result)
        # restart just past the START of this match (not its end),
        # so that overlapping occurrences are still found
        pos = result.start() + 1
    return resultlist

targets = ["UUUUUA", "UUUUUG"]
sequence="UUCAAUUUGATACCAUUUUUAGCUUCCGUUUUUGCGATACCAUUUUAGCGU"
#                        ++++++       ++++++
#         0         1         2         3         4         5
#         012345678901234567890123456789012345678901234567890
# note: matches at 15 & 28

matches = seqsearch(sequence, targets)
for m in matches:
    print("match %s found at location %s" % (m.group(), m.start()))
========================================================================

This prints, as expected:

match UUUUUA found at location 15
match UUUUUG found at location 28

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
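If you'd rather not write the search loop by hand, a common alternative with Python's standard `re` module is to wrap the alternation in a zero-width lookahead, `(?=(...))`, and let `re.finditer` walk the string. Because the lookahead consumes no characters, the scan resumes at the very next position, so overlapping occurrences are still reported. This is a minimal sketch, not a drop-in replacement; the function name `seqsearch_lookahead` is just a hypothetical name for illustration, and it returns (position, matched text) pairs rather than match objects.

```python
import re


def seqsearch_lookahead(seq, targets):
    """Return (start, matched_text) pairs for every occurrence of any target.

    The pattern (?=(A|B)) is a zero-width lookahead: it matches at a
    position without consuming characters, so re.finditer tries every
    position in seq and overlapping occurrences are all reported.
    """
    # Escape each target so characters like '.' or '+' match literally.
    alternatives = "|".join(re.escape(t) for t in targets)
    regex = re.compile("(?=(%s))" % alternatives)
    # group(1) holds the text captured inside the lookahead.
    return [(m.start(), m.group(1)) for m in regex.finditer(seq)]


targets = ["UUUUUA", "UUUUUG"]
sequence = "UUCAAUUUGATACCAUUUUUAGCUUCCGUUUUUGCGATACCAUUUUAGCGU"
for start, text in seqsearch_lookahead(sequence, targets):
    print("match %s found at location %d" % (text, start))
```

As with the loop version, if two targets could both match at the same position, only the first one listed in `targets` is reported there, because the alternation stops at the first alternative that succeeds.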