Josep M. Fontana wrote:
[...]
> I guess this is because the character encoding was not specified but
> accented characters in the languages I'm dealing with should be
> treated as a-z or A-Z, shouldn't they?
No. a-z means a-z. If you want the localized set of alphanumeric
characters, you need \w.
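
A minimal sketch of the difference, assuming Python 3, where \w on a str pattern matches Unicode word characters by default (on Python 2 you would need a unicode string and the re.UNICODE flag). The sample word is hypothetical and written with escapes so that the accented letters are precomposed code points:

```python
import re

# Hypothetical sample word ("canco" with c-cedilla and o-acute).
word = "can\u00e7\u00f3"

# [a-z] covers only the 26 unaccented ASCII lowercase letters,
# so the match stops at the first accented character.
ascii_only = re.findall(r"[a-z]+", word)

# \w also matches accented letters, so the whole word is one match.
unicode_word = re.findall(r"\w+", word)

print(ascii_only)
print(unicode_word)
```

Note that \w also matches digits and the underscore, so it is broader than "letters only".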
On Sun, Nov 28, 2010 at 6:14 PM, Steven D'Aprano wrote:
> Have you considered just using the isalnum() method?
>
> >>> '¿de'.isalnum()
> False
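
As a rough sketch of how isalnum() could be used here, assuming Python 3 str objects and a hypothetical list of tokens (the symbols are made up for illustration, not taken from the actual texts), one can keep only the tokens that contain at least one non-alphanumeric character:

```python
# Hypothetical tokens; "\u00bf" is the inverted question mark.
tokens = ["\u00bfde", "de", "can\u00e7\u00f3", "l*amor"]

# str.isalnum() is True only if every character is a letter or a digit,
# so any token carrying an editorial symbol is flagged.
flagged = [t for t in tokens if not t.isalnum()]

print(flagged)
```

This only says that a symbol is present somewhere in the token, not where it sits, so it complements a regular expression rather than replacing it.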
Mmm. No, I didn't consider it because I didn't even know such a method
existed. This can turn out to be very handy but I don't think it would
help me at t
Sorry, something went wrong and my message got sent before I could
finish it. I'll try again.
On Tue, Nov 30, 2010 at 2:19 PM, Josep M. Fontana wrote:
> On Sun, Nov 28, 2010 at 6:03 PM, Evert Rol wrote:
>
>> -
>> with open('output_tokens.txt', 'a') as out_tokens:
>>with open('text
Josep M. Fontana wrote:
I'm trying to use regular expressions to extract strings that match
certain patterns in a collection of texts. Basically these texts are
edited versions of medieval manuscripts that use certain symbols to
mark information that is useful for philologists.
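
A minimal sketch of this kind of extraction, assuming Python 3 and assuming that an editorial mark is any single character that is neither a word character nor whitespace. The sample text and its symbols are hypothetical, since the marks actually used in the editions are not given here:

```python
import re

# Hypothetical line of text with made-up editorial marks.
text = "\u00bfde que *parla el +text?"

# (?<!\S)  : only match at the start of a whitespace-delimited token
# [^\w\s]  : one symbol that is neither a word character nor whitespace
# \w+      : the word that follows the symbol
pattern = re.compile(r"(?<!\S)[^\w\s]\w+")

matches = pattern.findall(text)
print(matches)
```

The trailing "?" is left out of the last match because the pattern stops at the end of the word; symbols in the middle or at the end of a word would need their own alternatives.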
I'm interested in
> Here's what I do. This was just a first attempt to get strings
> starting with a non-alphanumeric symbol. If this had worked, I would
> have continued to build the regular expression to get words with
> non-alphanumeric symbols in the middle and at the end. Alas, even this
> first attempt did