On Tue, Apr 14, 2015 at 10:00:47AM +0200, Peter Otten wrote:
> Steven D'Aprano wrote:
> > I swear that Perl has been a blight on an entire generation of
> > programmers. All they know is regular expressions, so they turn every
> > data processing problem into a regular expression. Or at least they
> > *try* to. As you have learned, regular expressions are hard to read,
> > hard to write, and hard to get correct.
> >
> > Let's write some Python code instead.
[...]
> The tempter took possession of me and dictated:
>
> >>> pprint.pprint(
> ... [(k, int(v)) for k, v in
> ... re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*").findall(line)])
> [('Input Read Pairs', 2127436),
>  ('Both Surviving', 1795091),
>  ('Forward Only Surviving', 17315),
>  ('Reverse Only Surviving', 6413),
>  ('Dropped', 308617)]

Nicely done :-)

I didn't say that it *couldn't* be done with a regex. Only that it is
harder to read, write, etc. Regexes are good tools, but they aren't the
only tool and as a beginner, which would you rather debug? The extract()
function I wrote, or

    r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*"

?

Oh, and for the record, your solution is roughly 4-5 times faster than
the extract() function on my computer. If I knew the requirements were
not likely to change (that is, the maintenance burden was likely to be
low), I'd be quite happy to use your regex solution in production code,
although I would probably want to write it out in verbose mode just in
case the requirements did change:

    r"""(?x)    (?# verbose mode)
    (.+?):      (?# capture one or more characters, followed by a colon)
    \s+         (?# one or more whitespace)
    (\d+)       (?# capture one or more digits)
    (?:         (?# don't capture ... )
      \s+       (?# one or more whitespace)
      \(.*?\)   (?# anything inside round brackets)
      )?        (?# ... and optional)
    \s*         (?# ignore trailing spaces)
    """

That's a hint to people learning regular expressions: start in verbose
mode, then "de-verbose" it if you must.
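
For anyone who wants to try this, here is a minimal sketch of the
verbose pattern in use. It passes the re.VERBOSE flag and uses ordinary
# comments instead of the inline (?x) and (?#...) forms, which should be
an equivalent spelling. The sample line is an assumption, reconstructed
from the output Peter posted (the bracketed percentages are only
illustrative), and extract_pairs() is a hypothetical helper name, not
the extract() function from earlier in the thread.

```python
import pprint
import re

# Verbose spelling: whitespace in the pattern is ignored and
# everything after an unescaped # is a comment.
PAIR_RE = re.compile(r"""
    (.+?):          # capture the label: one or more characters, up to a colon
    \s+             # one or more whitespace characters
    (\d+)           # capture the count: one or more digits
    (?:             # start of an optional, non-capturing group
        \s+         #   one or more whitespace characters
        \(.*?\)     #   anything inside round brackets
    )?              # end of the optional group
    \s*             # ignore trailing whitespace
""", re.VERBOSE)

# The compact one-liner from the thread, kept for comparison.
COMPACT_RE = re.compile(r"(.+?):\s+(\d+)(?:\s+\(.*?\))?\s*")

# Assumed sample input, reconstructed from the posted output.
line = (
    "Input Read Pairs: 2127436 Both Surviving: 1795091 (84.38%) "
    "Forward Only Surviving: 17315 (0.81%) "
    "Reverse Only Surviving: 6413 (0.30%) Dropped: 308617 (14.51%)"
)


def extract_pairs(text, pattern=PAIR_RE):
    """Return a list of (label, count) tuples found in text."""
    return [(label, int(count)) for label, count in pattern.findall(text)]


pairs = extract_pairs(line)
pprint.pprint(pairs)

# Both spellings should find exactly the same pairs.
assert pairs == extract_pairs(line, COMPACT_RE)
```

Keeping both spellings side by side and asserting that they agree is a
cheap way to check that nothing changed by accident if you later decide
to "de-verbose" the pattern.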
--
Steve