In <[EMAIL PROTECTED]>, brad wrote:
> I am developing a list of 3 character strings like this:
>
> and
> bra
> cam
> dom
> emi
> mar
> smi
> ...
>
> The goal of the list is to have enough strings to identify files that
> may contain the names of people. Missing a name in a file is unacceptable.

Then simply return `True` for any file that contains at least two or three
ASCII letters in a row. Easily written as a short re. ;-)

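A minimal sketch of that tongue-in-cheek check, assuming Python 3 and that the file's contents are already available as a string. The names `LETTER_RUN` and `may_contain_name` are made up for illustration.

```python
import re

# Two or more ASCII letters in a row; use {3,} to require three instead.
LETTER_RUN = re.compile(r"[A-Za-z]{2,}")


def may_contain_name(text):
    """Return True if `text` has at least two ASCII letters in a row."""
    return LETTER_RUN.search(text) is not None
```

Almost any real file passes this test, which is the point: a filter that may never miss a name ends up flagging nearly everything.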
> I may end up with a thousand or so of these 3 character strings. Is that
> too much for an re.compile to handle? Also, is this a bad way to
> approach this problem? Any ideas for improvement are welcome!

Unless you can come up with some restrictions on the names, just follow
the advice above or give up. I recently saw a documentary about someone
with the name "Scary Guy" in his ID papers. And what about names with
letters outside the ASCII range?

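For illustration, here is a hedged sketch of both halves of the problem, assuming Python 3. `PREFIXES` holds only the sample strings quoted above (the full list is not given), and `build_prefix_pattern`, `PREFIX_RE`, and `UNICODE_LETTER_RUN` are hypothetical names. As far as I know the `re` module has no fixed limit on the number of alternatives, so a thousand short strings should compile, but measure before relying on it.

```python
import re

# Only the sample strings from the question; the real list would be longer.
PREFIXES = ["and", "bra", "cam", "dom", "emi", "mar", "smi"]


def build_prefix_pattern(prefixes):
    """Compile one case-insensitive alternation from the 3-character strings."""
    alternatives = "|".join(re.escape(p) for p in sorted(set(prefixes)))
    return re.compile("(?:" + alternatives + ")", re.IGNORECASE)


PREFIX_RE = build_prefix_pattern(PREFIXES)

# Runs of two or more word characters that are neither digits nor "_".
# For str patterns in Python 3 this is roughly "Unicode letters".
UNICODE_LETTER_RUN = re.compile(r"[^\W\d_]{2,}")
```

With this sample list, `PREFIX_RE.search("Martha Smith")` finds a match, but `PREFIX_RE.search("Scary Guy")` and `PREFIX_RE.search("Ян")` do not, while `UNICODE_LETTER_RUN` matches all three. Any finite list of ASCII fragments can miss names, so it cannot meet the "missing a name is unacceptable" requirement.
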
Ciao,
Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list