Where to contribute Unicode General Category encoding/decoding
Hi all,
I have created some handy code to encode and decode Unicode General Categories. To which Python package should I contribute this?
Regards,
Pander
--
http://mail.python.org/mailman/listinfo/python-list
Re: Where to contribute Unicode General Category encoding/decoding
On Thursday, December 13, 2012 2:22:57 PM UTC+1, Bruno Dupuis wrote:
> On Thu, Dec 13, 2012 at 01:51:00AM -0800, Pander Musubi wrote:
> > Hi all,
> >
> > I have created some handy code to encode and decode Unicode General
> > Categories. To which Python Package should I contribute this?
>
> Hi,
>
> As said in a recent thread (a graph data structure IIRC), talking about
> new features is far better if we see the code, so anyone can figure what
> the code really does.
>
> Can you provide a public repository uri or something?
>
> Standard lib inclusions are not trivial, it most likely happens for
> well-known, mature, PyPI packages, or battle-tested code patterns.
> Therefore, it's often better to make a package on PyPI, or, if the code
> is too short, to submit your handy chunks on ActiveState. If it deserves
> a general approbation, it may be included in Python stdlib.

I was expecting PyPI. Here is the code, please advise on where to submit it:
http://pastebin.com/dbzeasyq

> Cheers
>
> --
> Bruno
Re: Where to contribute Unicode General Category encoding/decoding
On Friday, December 14, 2012 1:06:23 AM UTC+1, Steven D'Aprano wrote:
> On Thu, 13 Dec 2012 07:30:57 -0800, Pander Musubi wrote:
>
> > I was expecting PyPI. Here is the code, please advise on where to submit
> > it:
> > http://pastebin.com/dbzeasyq
>
> If anywhere, either a third-party module, or the unicodedata standard
> library module.
>
> Some unanswered questions:
>
> - when would somebody need this function?

When working with Unicode metadata, see below.

> - why is it called "decodeUnicodeGeneralCategory" when it
>   doesn't seem to have anything to do with decoding?

It is actually a simple LUT. I like your improvements below.

> - why is the parameter "sortable" called sortable, when it
>   doesn't seem to have anything to do with sorting?

The values returned are alphabetically sortable.

> If this is useful at all, it would be more useful to just expose the data
> as a dict, and forget about an unnecessary wrapper function:
>
> from collections import namedtuple
> r = namedtuple("record", "other name desc")  # better field names needed!
>
> GC = {
>     'C' : r('Other', 'Other', 'Cc | Cf | Cn | Co | Cs'),
>     'Cc': r('Control', 'Control',
>             'a C0 or C1 control code'),  # a.k.a. cntrl
>     'Cf': r('Format', 'Format', 'a format control character'),
>     'Cn': r('Unassigned', 'Unassigned',
>             'a reserved unassigned code point or a noncharacter'),
>     'Co': r('Private Use', 'Private_Use', 'a private-use character'),
>     'Cs': r('Surrogate', 'Surrogate', 'a surrogate code point'),
>     'L' : r('Letter', 'Letter', 'Ll | Lm | Lo | Lt | Lu'),
>     'LC': r('Letter, Cased', 'Cased_Letter', 'Ll | Lt | Lu'),
>     'Ll': r('Letter, Lowercase', 'Lowercase_Letter',
>             'a lowercase letter'),
>     'Lm': r('Letter, Modifier', 'Modifier_Letter', 'a modifier letter'),
>     'Lo': r('Letter, Other', 'Other_Letter',
>             'other letters, including syllables and ideographs'),
>     'Lt': r('Letter, Titlecase', 'Titlecase_Letter',
>             'a digraphic character, with first part uppercase'),
>     'Lu': r('Letter, Uppercase', 'Uppercase_Letter',
>             'an uppercase letter'),
>     'M' : r('Mark', 'Mark', 'Mc | Me | Mn '),  # a.k.a. Combining_Mark
>     'Mc': r('Mark, Spacing', 'Spacing_Mark',
>             'a spacing combining mark (positive advance width)'),
>     'Me': r('Mark, Enclosing', 'Enclosing_Mark',
>             'an enclosing combining mark'),
>     'Mn': r('Mark, Nonspacing', 'Nonspacing_Mark',
>             'a nonspacing combining mark (zero advance width)'),
>     'N' : r('Number', 'Number', 'Nd | Nl | No'),
>     'Nd': r('Number, Decimal', 'Decimal_Number',
>             'a decimal digit'),  # a.k.a. digit
>     'Nl': r('Number, Letter', 'Letter_Number',
>             'a letterlike numeric character'),
>     'No': r('Number, Other', 'Other_Number',
>             'a numeric character of other type'),
>     'P' : r('Punctuation', 'Punctuation',
>             'Pc | Pd | Pe | Pf | Pi | Po | Ps'),  # a.k.a. punct
>     'Pc': r('Punctuation, Connector', 'Connector_Punctuation',
>             'a connecting punctuation mark, like a tie'),
>     'Pd': r('Punctuation, Dash', 'Dash_Punctuation',
>             'a dash or hyphen punctuation mark'),
>     'Pe': r('Punctuation, Close', 'C
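As a side note to the table quoted above: the standard library's `unicodedata.category()` already returns the two-letter General Category code for a character, so a table like `GC` only has to map that code to a readable name. The following is a minimal sketch, not the pastebin code. The names `GC_NAMES` and `describe` are made up for this illustration, and the table is deliberately partial (a handful of the entries quoted above).

```python
import unicodedata

# Partial, illustrative table: only a few of the categories quoted above.
GC_NAMES = {
    "Cc": "Control",
    "Ll": "Lowercase_Letter",
    "Lu": "Uppercase_Letter",
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
}


def describe(ch):
    """Return (category code, long name) for a single character."""
    code = unicodedata.category(ch)  # e.g. 'Lu' for 'A'
    return code, GC_NAMES.get(code, "(not in this partial table)")


for ch in ("a", "A", "7", "-"):
    print(repr(ch), describe(ch))
```

A plain dict lookup like this matches the suggestion in the quoted reply: expose the data and let callers index it, rather than wrapping it in a function with flags.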
Re: Where to contribute Unicode General Category encoding/decoding
On Friday, December 14, 2012 2:07:51 PM UTC+1, Pander Musubi wrote:
> On Friday, December 14, 2012 1:06:23 AM UTC+1, Steven D'Aprano wrote:
> > [...]
Re: Where to contribute Unicode General Category encoding/decoding
On Friday, December 14, 2012 5:22:31 PM UTC+1, Pander Musubi wrote:
> On Friday, December 14, 2012 2:07:51 PM UTC+1, Pander Musubi wrote:
> > [...]
Custom alphabetical sort
Hi all,
I would like to sort according to this order:
(' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç',
'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g',
'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k',
'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô',
'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u',
'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y',
'Y', 'z', 'Z')
How can I do this? The default sorted() does not give the desired result.
Thanks,
Pander
Re: Custom alphabetical sort
On Monday, December 24, 2012 5:11:03 PM UTC+1, Thomas Bach wrote:
> On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
> > 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c',
> > 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
> > 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
> > 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O',
> > 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r',
> > 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù',
> > 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> One option is to use sorted's key parameter with an appropriate
> mapping in a dictionary:
>
> >>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8',
> ... '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b',
> ... 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê',
> ... 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í',
> ... 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n',
> ... 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø',
> ... 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü',
> ... 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y',
> ... 'Y', 'z', 'Z')
>
> >>> d = { k: v for v, k in enumerate(cs) }
> >>> import random
> >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
> '5aAàÀåBCçËÉíÎLÖøquùx'

This doesn't work for words with more than one character:
>>> test=('øasdf', 'áá', 'aa', 'a123','á1234', 'Aaa', )
>>> sorted(test, key=d.get)
['\xc3\xb8asdf', '\xc3\xa1\xc3\xa1', 'aa', 'a123', '\xc3\xa11234', 'Aaa']

> Regards,
> Thomas.
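
The failure shown above comes from the key function: `d` only has single characters as keys, so `d.get` returns `None` for every multi-character word and the input order is left unchanged (Python 3 would instead raise a `TypeError` when comparing the `None` keys). A minimal sketch of a per-character key follows, assuming Python 3 `str` values and that every character of every word occurs in the alphabet. `ALPHABET`, `ORDER` and `collation_key` are names made up for this illustration.

```python
# The requested collation order, written as one string (same order as the tuple).
ALPHABET = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdD"
    "eEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmM"
    "nñNÑoOöÖóÓôÔòÒøØ"
    "pPqQrRsStT"
    "uUüÜúÚûÛùÙvVwWxXyYzZ"
)

# character -> position in the custom order
ORDER = {ch: i for i, ch in enumerate(ALPHABET)}


def collation_key(word):
    """Map a word to a list of ordinals; lists compare element by element."""
    return [ORDER[ch] for ch in word]  # KeyError if a character is not in ALPHABET


test = ('øasdf', 'áá', 'aa', 'a123', 'á1234', 'Aaa')
print(sorted(test, key=collation_key))
# ['a123', 'aa', 'Aaa', 'á1234', 'áá', 'øasdf']
```

Because Python compares lists lexicographically, a shorter word that is a prefix of a longer one sorts first, which is the usual dictionary behaviour.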
Re: Custom alphabetical sort
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>
> I'm assuming that doesn't correspond to some standard locale's collating
> order, so we really do need to roll our own encoding (and that you have
> a good reason for wanting to do this).

It is for creating a Dutch dictionary. This sorting order is not to be found in
an existing locale.

> I'm also assuming that what I'm
> seeing as question marks are really accented characters in some encoding
> that my news reader just isn't dealing with (it seems to think your post
> was in ISO-2022-CN (Simplified Chinese)).
>
> I'm further assuming that you're starting with a list of unicode
> strings, the contents of which are limited to the above alphabet.

Correct.

> I'm
> even further assuming that the volume of data you need to sort is small
> enough that efficiency is not a huge concern.

Well, it is for 200,000 - 450,000 words, but the code is allowed to be slow. It
will not be used for a web application or anything else that requires a quick
response.

> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts. One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}

OK, similar to Thomas' proposal.

> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>     "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>     return [map1[c] for c in s]
>
> def decode(l):
>     "decode([34, 19, 19]) ==> 'foo'"
>     return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> That's just a rough sketch, and completely untested, but it should get
> you headed in the right direction. Or at least one plausible direction.
> Old-time perl hackers will recognize this as the Schwartzian Transform.

I will test it and let you know. :)

Pander
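
For reference, the encode / sort / decode round trip quoted above can be written out in full. This is only a sketch under the assumptions stated in the quoted message (Unicode strings, all characters within the alphabet), ported to Python 3. `ALPHABET` is the requested order written as a single string, and `custom_sorted` is a helper name introduced here for illustration.

```python
ALPHABET = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdD"
    "eEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmM"
    "nñNÑoOöÖóÓôÔòÒøØ"
    "pPqQrRsStT"
    "uUüÜúÚûÛùÙvVwWxXyYzZ"
)

map1 = {c: n for n, c in enumerate(ALPHABET)}  # character -> ordinal
map2 = {n: c for n, c in enumerate(ALPHABET)}  # ordinal -> character


def encode(s):
    """Turn a string into a list of collation ordinals."""
    return [map1[c] for c in s]


def decode(ordinals):
    """Inverse of encode(): turn a list of ordinals back into a string."""
    return ''.join(map2[i] for i in ordinals)


def custom_sorted(words):
    """Encode every word, sort the encoded lists, then decode them again."""
    encoded = [encode(w) for w in words]
    encoded.sort()
    return [decode(e) for e in encoded]


print(custom_sorted(["zon", "ééN", "een", "Eén", "appel"]))
# ['appel', 'een', 'Eén', 'ééN', 'zon']
```

The decode step exists only because the sort runs on the encoded lists, so the original strings have to be rebuilt afterwards.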
Re: Custom alphabetical sort
> > > I'm assuming that doesn't correspond to some standard locale's collating
> > > order, so we really do need to roll our own encoding (and that you have
> > > a good reason for wanting to do this).
> >
> > It is for creating a Dutch dictionary.
>
> Wait a minute. You're telling me that Python, of all languages, doesn't
> have a built-in way to sort Dutch words???

Not when you want Roman characters with diacritics to be sorted in the normal a-Z range.
Re: Custom alphabetical sort
On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith wrote:
> > In article <[email protected]>,
> > Pander Musubi wrote:
> > > Hi all,
> > >
> > > I would like to sort according to this order:
> > >
> > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> > >
> > > How can I do this? The default sorted() does not give the desired result.
> >
> > Given all that, I would start by writing some code which turned your
> > alphabet into a pair of dicts. One maps from the code point to a
> > collating sequence number (i.e. ordinals), the other maps back.
> > Something like (for python 2.7):
> >
> > alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
> >             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
> >             [...]
> >             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > map1 = {c: n for n, c in enumerate(alphabet)}
> > map2 = {n: c for n, c in enumerate(alphabet)}
> >
> > Next, I would write some functions which encode your strings as lists of
> > ordinals (and back again)
> >
> > def encode(s):
> >     "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
> >     return [map1[c] for c in s]
> >
> > def decode(l):
> >     "decode([34, 19, 19]) ==> 'foo'"
> >     return ''.join(map2[i] for i in l)
> >
> > Use these to convert your strings to lists of ints which will sort as
> > per your specified collating order, and then back again:
> >
> > encoded_strings = [encode(s) for s in original_list]
> > encoded_strings.sort()
> > sorted_strings = [decode(l) for l in encoded_strings]
>
> This isn't needed and the not-so-new way to do this is through .sort's key
> attribute.
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> changes to
>
> encoded_strings.sort(key=encode)
>
> [Which happens to be faster ]
>
> Hence you neither need map2 or decode:
>
> ## CODE ##
>
> alphabet = (
> ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
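
The quoted code is cut off in the archive, so what follows is not a reconstruction of it. It is a minimal sketch of the `key=` idea, assuming the sort is meant to run on the original list of strings (the quoted one-liner names `encoded_strings`, presumably a slip for `original_list`). The sample words are made up for illustration.

```python
ALPHABET = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdD"
    "eEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmM"
    "nñNÑoOöÖóÓôÔòÒøØ"
    "pPqQrRsStT"
    "uUüÜúÚûÛùÙvVwWxXyYzZ"
)

map1 = {c: n for n, c in enumerate(ALPHABET)}  # character -> ordinal


def encode(s):
    """Sort key: the word as a list of collation ordinals."""
    return [map1[c] for c in s]


# Hypothetical sample data; every character occurs in ALPHABET.
original_list = ["zout", "Zeeland", "zee", "ähnlich", "aap", "a-b"]

# sort() calls encode() once per element and orders the strings by the
# resulting lists; the strings themselves are never converted, so neither
# a reverse mapping nor a decode() step is needed.
original_list.sort(key=encode)
print(original_list)
# ['a-b', 'aap', 'ähnlich', 'zee', 'zout', 'Zeeland']
```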
Re: Custom alphabetical sort
On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:
> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C',
> 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö',
> 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.
>
> Thanks,
>
> Pander
Meanwhile, Python 3 supports locale-aware sorting; see
https://docs.python.org/3/howto/sorting.html
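
A minimal sketch of what that HOWTO describes, using `locale.strxfrm` as the sort key. The helper name `locale_sorted` is made up here, and the result depends entirely on which locales are installed on the system. As noted earlier in the thread, the order requested above may not match any existing locale, so this is not a drop-in replacement for the custom alphabet.

```python
import locale


def locale_sorted(words, locale_name=""):
    """Sort words using the collation rules of the given locale.

    locale_name="" selects the user's default locale. A specific name such
    as "nl_NL.UTF-8" only works if that locale is installed; otherwise
    setlocale() raises locale.Error.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the collation setting that was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)


# The "C" locale is always available; it compares by code point.
print(locale_sorted(["b", "a", "c"], "C"))
```

Changing the locale affects the whole process, which is why the sketch restores the previous setting before returning.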
