Re: [Tutor] A regular expression problem

2010-12-01 Thread Steven D'Aprano
Josep M. Fontana wrote: [...]
> I guess this is because the character encoding was not specified but accented characters in the languages I'm dealing with should be treated as a-z or A-Z, shouldn't they?

No. a-z means a-z. If you want the localized set of alphanumeric characters, you need \w.
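
To make the a-z versus \w distinction concrete, here is a minimal sketch, not taken from the thread itself. It assumes Python 3, where \w on a str pattern is Unicode-aware by default; in the Python 2 of this thread's era you would need unicode strings together with the re.UNICODE (or re.LOCALE) flag to get the same effect. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text with accented letters and an inverted question mark.
text = "¿de qué año"

# [a-zA-Z] is a literal ASCII range, so accented letters split the words.
ascii_only = re.findall(r"[a-zA-Z]+", text)

# \w on a str pattern (Python 3) matches Unicode letters, digits and underscore.
word_chars = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], which mimics the ASCII-only behaviour.
ascii_flag = re.findall(r"\w+", text, re.ASCII)

print(ascii_only)   # ['de', 'qu', 'a', 'o']
print(word_chars)   # ['de', 'qué', 'año']
print(ascii_flag)   # ['de', 'qu', 'a', 'o']
```

Note that \w also matches digits and the underscore, so it is not a drop-in replacement for a letters-only class.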

Re: [Tutor] A regular expression problem

2010-11-30 Thread Josep M. Fontana
On Sun, Nov 28, 2010 at 6:14 PM, Steven D'Aprano wrote: > Have you considered just using the isalnum() method? > '¿de'.isalnum() > False Mmm. No, I didn't consider it because I didn't even know such a method existed. This can turn out to be very handy but I don't think it would help me at t

Re: [Tutor] A regular expression problem

2010-11-30 Thread Josep M. Fontana
Sorry, something went wrong and my message got sent before I could finish it. I'll try again. On Tue, Nov 30, 2010 at 2:19 PM, Josep M. Fontana wrote: > On Sun, Nov 28, 2010 at 6:03 PM, Evert Rol wrote: > >> - >> with open('output_tokens.txt', 'a') as out_tokens: >>with open('text

Re: [Tutor] A regular expression problem

2010-11-28 Thread Steven D'Aprano
Josep M. Fontana wrote: I'm trying to use regular expressions to extract strings that match certain patterns in a collection of texts. Basically these texts are edited versions of medieval manuscripts that use certain symbols to mark information that is useful for philologists. I'm interested in

Re: [Tutor] A regular expression problem

2010-11-28 Thread Evert Rol
> Here's what I do. This was just a first attempt to get strings > starting with a non alpha-numeric symbol. If this had worked, I would > have continued to build the regular expression to get words with non > alpha-numeric symbols in the middle and in the end. Alas, even this > first attempt did