Re: Extracting subsequences composed of the same character

Roy Smith Thu, 31 Mar 2011 18:51:22 -0700

In article <[email protected]>,
 candide <[email protected]> wrote:


> Suppose you have a string, for instance
> 
> "pyyythhooonnn ---> ++++"
> 
> and you search for the subsequences composed of the same character; here
> you get:
> 
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'

I got the following. It's O(n), with the minor exception of the string
addition; that's trivial to fix, and in practice the bunches are short
enough that it hardly matters.

#!/usr/bin/env python

s = "pyyythhooonnn ---> ++++"
answer = ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

last = None
bunches = []
bunch = ''
for c in s:
    if c == last:
        bunch += c
    else:
        if bunch:
            bunches.append(bunch)
        bunch = c
        last = c
bunches.append(bunch)

# Keep only the runs of two or more characters.
multiples = [bunch for bunch in bunches if len(bunch) > 1]
print(multiples)
assert multiples == answer
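
For what it's worth, here is one possible way to drop the string addition
mentioned above: remember where the current run starts and slice it out
once it ends. This is only a sketch, not part of the original script; the
function names repeated_runs and repeated_runs_groupby are made up for
illustration. The second function does the same job with
itertools.groupby from the standard library.

```python
from itertools import groupby


def repeated_runs(s):
    """Return runs of two or more identical consecutive characters in s."""
    runs = []
    start = 0  # index where the current run begins
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or where the character changes.
        if i == len(s) or s[i] != s[start]:
            if i - start > 1:
                runs.append(s[start:i])  # one slice per run, no += in the loop
            start = i
    return runs


def repeated_runs_groupby(s):
    """Same result, letting itertools.groupby split s into runs."""
    joined = (''.join(group) for _, group in groupby(s))
    return [run for run in joined if len(run) > 1]


s = "pyyythhooonnn ---> ++++"
print(repeated_runs(s))
print(repeated_runs_groupby(s))
```

Both calls should print the same list as the answer variable in the
script above.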


[eagerly awaiting a PEP for collections.bunch and 
collections.frozenbunch]
-- 
http://mail.python.org/mailman/listinfo/python-list
