On Sat, 24 Jul 2010 04:23:41 am Mary Morris wrote:
> I'm trying to compile a list of decorators from the source code at my
> office.
> I did this by doing a
>
> candidate_line.find("@")
>
> because all of our decorators start with the @ symbol. The problem
> I'm having is that the email addresses that are included in the
> comments are getting included in the list that is getting returned.

First of all, to solve this problem *properly* you will need a proper
parser to walk over the code and look for decorators, ignoring comments,
skipping over strings, and similar. But that's hard, or at least I have
no idea how to do it, so the alternative is a basic filter like you are
doing.

If you're using Linux, Mac or some other Unix, the fastest solution
would be to use grep. But ignoring that, think about what a decorator
line is. You suggest above that a candidate line is a decorator if it
has an @ sign in it. But that's incorrect. This is not a decorator:

    # send an email to st...@something.net or geo...@example.gov.au

But this might be:

    @decorator

So let's start with a simple little generator that yields a line as a
candidate decorator only if it *starts* with an at sign:

    def find_decorators(lines):
        """Return likely decorators from lines of text."""
        for line in lines:
            line = line.lstrip()  # ignore leading spaces
            if line.startswith('@'):
                yield line

That's still not foolproof; only a proper Python parser will be
foolproof. This will be fooled by the *second* line in something like:

    instructions = """If you have a problem with this, please call Fred
        @ accounts and tell him to reset the modem, then try again.
        If it still doesn't work blah blah blah """

So, not foolproof, but it does the job. You use find_decorators like
this:

    # Process them one at a time.
    for decorator_line in find_decorators(open("source.py")):
        print(decorator_line)

To get them all at once, use:

    list_of_decorators = list(find_decorators(open("source.py")))

How can we improve this? At the moment, find_decorators happily returns
a line like this:

    @decorator # This is a comment

but you probably don't care about the comment.
So let's make a second filter to throw it away:

    def remove_comments(lines):
        """Strip everything from the first '#' onwards in each line."""
        for line in lines:
            p = line.find('#')
            if p > -1:
                # Keep characters up to but not including p,
                # ignoring trailing spaces
                yield line[:p].rstrip()
            else:
                yield line

And now apply this filter only to decorator lines:

    f = open("source.py")
    for decorator in remove_comments(find_decorators(f)):
        print(decorator)

To get them all at once:

    f = open("source.py")
    results = list(remove_comments(find_decorators(f)))

Again, this is not foolproof. If you have a decorator like this:

    @decorator("this takes a string argument with a # inside it")

the filter will return:

    @decorator("this takes a string argument with a

But, and I repeat myself like a broken record, if you want foolproof,
you need a proper parser, and that's hard.

-- 
Steven D'Aprano