In article <[email protected]>,
candide <[email protected]> wrote:
> Suppose you have a string, for instance
>
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character, here
> you get:
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
I got the following. It's O(n), with one minor exception: building each
bunch by repeated string addition is not strictly linear. That's trivial
to fix, and in practice the bunches are short enough that it hardly matters.
#!/usr/bin/env python

s = "pyyythhooonnn ---> ++++"
answer = ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

last = None
bunches = []
bunch = ''
for c in s:
    if c == last:
        # Same character as before: extend the current bunch.
        bunch += c
    else:
        # New character: save the finished bunch and start a new one.
        if bunch:
            bunches.append(bunch)
        bunch = c
    last = c
# Don't forget the final bunch.
bunches.append(bunch)

multiples = [bunch for bunch in bunches if len(bunch) > 1]
print(multiples)
assert multiples == answer
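
For what it's worth, here is one way the string-addition fix could look. This
is only a sketch: it swaps the hand-rolled loop for itertools.groupby and
joins each run once instead of growing it with +=. The function name
repeated_runs is made up for illustration.

```python
from itertools import groupby


def repeated_runs(s):
    """Return the runs of two or more identical adjacent characters in s."""
    runs = []
    # groupby with no key function groups consecutive equal characters.
    for _, group in groupby(s):
        # Collect the run's characters in a list and join them once,
        # rather than building the string with repeated addition.
        chars = list(group)
        if len(chars) > 1:
            runs.append(''.join(chars))
    return runs


print(repeated_runs("pyyythhooonnn ---> ++++"))
```

For the sample string this should print the same list as the loop version.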
[eagerly awaiting a PEP for collections.bunch and
collections.frozenbunch]