Nice solution indeed! Will it also work with accented characters? And how 
should one incorporate the collating sequence into the solution? By explicitly 
setting the locale? It might be nice if the outcome is always the same, 
wherever you are in the world.

 
Cheers!!
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


>________________________________
>From: Terry Carroll <carr...@tjc.com>
>To: tutor@python.org
>Sent: Sunday, November 6, 2011 8:21 PM
>Subject: Re: [Tutor] regexp
>
>On Sat, 5 Nov 2011, Dinara Vakhitova wrote:
>
>> I need to find the words in a corpus whose letters are in alphabetical
>> order ("almost", "my" etc.)
>> I started by matching two consecutive letters in a word that are in
>> alphabetical order, and tried to use this expression: ([a-z])[\1-z], but
>> it doesn't work: it matches any sequence of two letters. I can't figure out
>> why... Evidently I can't refer to a group like this, can I? How, then, can I
>> achieve what I need?
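A quick way to see what is going on, assuming Python's standard re module: inside a character class, a numeric escape such as \1 is read as a character (octal 1, i.e. chr(1)), not as a back-reference to group 1. So [\1-z] is the range chr(1) through "z", which contains every lower-case letter. A minimal demonstration:

```python
import re

# Inside [...] the escape \1 is the character chr(1), not "whatever group 1 matched".
pattern = re.compile(r"([a-z])[\1-z]")

# The class therefore spans chr(1)..'z', so out-of-order pairs match as well.
print(bool(pattern.fullmatch("ab")))  # True
print(bool(pattern.fullmatch("ba")))  # True, although "ba" is not in order
```

Back-references such as (a)\1 do work outside a character class; they just cannot be used as a range endpoint inside one.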
>
>First, I agree with the others that this is a lousy task for regular 
>expressions.  It's not the tool I would use.  But, I do think it's doable, 
>provided the requirement is not to check with a single regular expression. For 
>simplicity's sake, I'll construe the problem as determining whether a given 
>string consists entirely of lower-case alphabetic characters, arranged in 
>alphabetical order.
>
>What I would do is set a variable to the lowest permissible character, i.e., 
>"a", and another to the highest permissible character, i.e., "z" (actually, 
>you could just use a constant for the highest, but I like the symmetry).
>
>Then construct a regex to see if a character is within the lowest-permissible 
>to highest-permissible range.
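Building the range regex from the two variables might look like this in Python. A sketch assuming the variable names lowest and highest (hypothetical); re.escape is only a guard in case the bounds are ever characters that are special inside a class, such as "-" or "]":

```python
import re

lowest = "a"   # lowest permissible character so far
highest = "z"  # highest permissible character (could just be a constant)

# Build a one-character class covering lowest..highest, here "[a-z]".
char_re = re.compile("[{}-{}]".format(re.escape(lowest), re.escape(highest)))

print(bool(char_re.fullmatch("m")))  # True
print(bool(char_re.fullmatch("M")))  # False
```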
>
>Now, iterate through the string, processing one character at a time.  On each 
>iteration:
>
>- Test whether your character matches the regexp. If not, your answer is
>   "false": on the first pass, this means it's not lower-case alphabetic; on
>   subsequent passes, it means either that or that it's not in sorted
>   order.
>- If it passes, update your lowest permissible character with the
>   character you just processed.
>- regenerate your regexp using the updated lowest permissible character.
>- iterate.
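The loop described in the steps above could be sketched as follows, assuming Python's re module; the function name in_alphabetical_order is hypothetical. Since only a-z characters ever pass the test, lowest never becomes a character that needs escaping.

```python
import re

def in_alphabetical_order(word: str) -> bool:
    """Narrow the permissible range after each accepted character."""
    lowest, highest = "a", "z"
    for ch in word:
        # Regenerate the one-character regex from the current bounds.
        char_re = re.compile("[{}-{}]".format(lowest, highest))
        if not char_re.fullmatch(ch):
            # Either not lower-case a-z, or lower than the previous character.
            return False
        # The next character may not be lower than this one.
        lowest = ch
    return True

for w in ["almost", "my", "hello", "Almost"]:
    print(w, in_alphabetical_order(w))
```

Recompiling a pattern for every character is wasteful; the point is only to mirror the steps as described, not to be the fastest way to do the check.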
>
>I assumed lower-case alphabetic for simplicity, but you could modify this 
>basic approach to handle mixed case (e.g., by first transforming the string 
>to an all-lower-case copy) or other complications.
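For the mixed-case variant, one option is to lower-case a copy first, as suggested. A sketch under that assumption (the function name is hypothetical). Note that str.lower() only folds case: an accented letter such as "é" still falls outside [a-z] and fails the test.

```python
import re

def in_alphabetical_order_any_case(word: str) -> bool:
    """Case-insensitive variant: run the range check on a lower-cased copy."""
    lowest = "a"
    for ch in word.lower():  # e.g. "Almost" -> "almost"
        if not re.fullmatch("[{}-z]".format(lowest), ch):
            return False
        lowest = ch
    return True

print(in_alphabetical_order_any_case("Almost"))  # True
print(in_alphabetical_order_any_case("BeeFy"))   # True
print(in_alphabetical_order_any_case("abé"))     # False
```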
>
>I don't think there's a problem with asking for help with homework on this 
>list; but you should identify it as homework, so the responders know not to 
>just give you a solution to your homework, but instead provide you with hints 
>to help you solve it.
>_______________________________________________
>Tutor maillist  -  Tutor@python.org
>To unsubscribe or change subscription options:
>http://mail.python.org/mailman/listinfo/tutor
>
>
>