Please post in plain text (not HTML), as otherwise the code gets screwed up. When I paste your code into a terminal this is what happens:

>>> junk_list = 'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet interface 1.3 {\r\n media-capabilities {\r\n none\r\n auto\r\n 40000SR4-FD\r\n 10T-HD\r\n 100TX-FD\r\n 100TX-HD\r\n 1000T-FD\r\n 40000LR4-FD\r\n 1000T-HD\r\n }\r\n}\r\n'
>>> junk_list
'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet interface 1.3 {\r\n\xc2\xa0\xc2\xa0\xc2\xa0 media-capabilities {\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 none\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 auto\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 40000SR4-FD\r\n\xc2\xa0 10T-HD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 100TX-FD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 100TX-HD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 1000T-FD\r\n\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 40000LR4-FD\r\n \xc2\xa0\xc2\xa0\xc2\xa0 1000T-HD\r\n\xc2\xa0\xc2\xa0\xc2\xa0 }\r\n}\r\n'

Those \xc2\xa0 characters are non-breaking space characters. The trouble is
that I don't know if they were added by your email client or are actually
part of your junk string. I've assumed the former and replaced them with
spaces in the code I show below.

On 28 January 2013 19:15, Dave Wilder <d.wil...@f5.com> wrote:
> Hello,
>
> I am trying using re.findall to parse the string below and then create a
> list from the results.
> junk_list = 'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet
> interface 1.3 {\r\n media-capabilities {\r\n none\r\n
> auto\r\n 40000SR4-FD\r\n 10T-HD\r\n 100TX-FD\r\n
> 100TX-HD\r\n 1000T-FD\r\n 40000LR4-FD\r\n 1000T-HD\r\n
> }\r\n}\r\n'
>
> What I am doing now is obviously quite ugly, but I have not yet able to
> manipulate it to work how I want but in a much more efficient and modular
> way.
> I did some research on re.findall but am still confused as to how to do
> character repetition searches, which I guess is what I need to do here.
>
> >>> junk_list = re.findall(r'(auto|[1|4]0+[A-Z]-[HF]D|[1|4]0+[A-Z][A-Z]-[HF]D|[1|4]0+[A-Z][A-Z][0-9])', junk_list)
> >>> junk_list
> ['auto', '40000SR4', '10T-HD', '100TX-FD', '100TX-HD', '40000LR4',
> '1000T-FD', '1000T-HD']

This output doesn't match what I would expect from the string above. Why is
'1000T-FD' after '40000LR4-FD'? Is that the problem with the code you
posted?

> Basically, all I need to search on is:
>
> auto
> anything that starts w/ ‘1’ or ‘4’ and then any number of subsequent zeroes
> e.g. 10T-HD, 40000LR4-FD, 100TX-FD

Does "any number" mean "one or more" or "zero or more"?

Some people like to use regexes for everything. I prefer to try string
methods first as I find them easier to understand. Here's my attempt:

>>> junk_list = 'tmsh list net interface 1.3 media-ca \rpabilities\r\nnet interface 1.3 {\r\n media-capabilities {\r\n none\r\n auto\r\n 40000SR4-FD\r\n 10T-HD\r\n 100TX-FD\r\n 100TX-HD\r\n 1000T-FD\r\n 40000LR4-FD\r\n 1000T-HD\r\n }\r\n}\r\n'
>>> junk_list = [s.strip() for s in junk_list.splitlines()]
>>> junk_list = [s for s in junk_list if s == 'auto' or s[:2] in ('10', '40')]
>>> junk_list
['auto', '40000SR4-FD', '10T-HD', '100TX-FD', '100TX-HD', '1000T-FD', '40000LR4-FD', '1000T-HD']

Does that do what you want?

Oscar
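
P.S. If you would still like a regex version, here is a minimal sketch of
the character repetition part. It assumes that "any number" means "one or
more" zeroes and that every entry you want ends in -HD or -FD, as they all
do in your sample string. The names pattern and matches are just ones I
picked for the example. Note that inside a character class you don't need
the '|': [1|4] also matches a literal '|' character, whereas [14] matches
only '1' or '4'.

```python
import re

# The string from the original post, with plain spaces as indentation.
junk_list = (
    'tmsh list net interface 1.3 media-ca \rpabilities\r\n'
    'net interface 1.3 {\r\n'
    ' media-capabilities {\r\n'
    ' none\r\n'
    ' auto\r\n'
    ' 40000SR4-FD\r\n'
    ' 10T-HD\r\n'
    ' 100TX-FD\r\n'
    ' 100TX-HD\r\n'
    ' 1000T-FD\r\n'
    ' 40000LR4-FD\r\n'
    ' 1000T-HD\r\n'
    ' }\r\n'
    '}\r\n'
)

# auto        the literal word 'auto'
# [14]        exactly one character, either '1' or '4'
# 0+          one or more zeroes (use 0* if zero or more is wanted)
# [A-Z0-9]+   the media type letters/digits, e.g. T, TX, SR4
# -[HF]D      half or full duplex suffix (an assumption based on the sample)
# (?:...)     non-capturing group, so findall returns the whole match
pattern = re.compile(r'\b(?:auto|[14]0+[A-Z0-9]+-[HF]D)\b')

matches = pattern.findall(junk_list)
print(matches)
```

With the sample string this gives the same list, in the same order, as the
string-method version above. The '+' after a character or a class is what
does the repetition: it means "one or more of the preceding thing".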