Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-22 Thread Michael Torrie via Python-list
rward matter to explain the Unicode standard and its > requirements. Instead, we get a series of slop posts that, no doubt, > will be crawled by further AI trainers, reinforcing the fiction. The other day one of my own forum posts from years ago was cited by Google's AI summary when

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-22 Thread Chris Angelico via Python-list
m not > > even going to bother reading further, because every post you've > > written smells like AI slop. > > Interesting. Even a google search AI summary says (quite confidently of > course) that Java 9 and Swift both follow the unicode standard and use > "SS."

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-22 Thread Michael Torrie via Python-list
en smells like AI slop. Interesting. Even a google search AI summary says (quite confidently of course) that Java 9 and Swift both follow the unicode standard and use "SS." Maybe he asked the wrong AI about that! -- https://mail.python.org/mailman3//lists/python-list.python.org

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-19 Thread Chris Angelico via Python-list
On Mon, 20 Oct 2025 at 02:01, wrote: > > Thanks again for your detailed reply — I really appreciate it. I have to > admit, I wasn’t 100% sure about my data, which is why I submitted it for > discussion before opening a bug report to the Python developers. > Don't. Don't open a discussion based

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-19 Thread python
Thanks again for your detailed reply — I really appreciate it. I have to admit, I wasn’t 100% sure about my data, which is why I submitted it for discussion before opening a bug report to the Python developers. I already checked the Unicode tables and saw that the capital ß (U+1E9E) was already

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-18 Thread Chris Angelico via Python-list
On Sun, 19 Oct 2025 at 11:03, wrote: > > Thanks Chris for the response! > > As The Unicode Standard does define an uppercase form for the German sharp S > (U+00DF → U+1E9E), and this has been part of Unicode since version 5.1 > (2008), with the German orthography officially a

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-18 Thread python
Thanks Chris for the response! As The Unicode Standard does define an uppercase form for the German sharp S (U+00DF → U+1E9E), and this has been part of Unicode since version 5.1 (2008), with the German orthography officially adopting it in 2017. The relevant case mappings are clearly

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-18 Thread Mashaal Al Hammdi via Python-list
on to an inconsistency and legacy behavior > regarding the handling of the German sharp S characters in Python’s string > case conversion methods. > > > > This isn't Python's decision. The definition of Unicode case > conversion is laid out in the Unicode standard; for U+0

Re: Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-17 Thread Chris Angelico via Python-list
's decision. The definition of Unicode case conversion is laid out in the Unicode standard; for U+00DF, you can find it in this page: https://www.unicode.org/charts/PDF/U0080.pdf > This would help align Python with current Unicode standards Do you have a reference for this? Is there a curren
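The behaviour under discussion can be checked directly. The following is a minimal sketch, assuming a Python 3 interpreter; the variable names are illustrative and not taken from the thread. It shows what the built-in str methods return for U+00DF and U+1E9E.

```python
import unicodedata

lower_sharp_s = "\u00df"  # ß, LATIN SMALL LETTER SHARP S
upper_sharp_s = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

# Full uppercasing of ß yields the two-letter sequence "SS".
upper_of_lower = lower_sharp_s.upper()

# Lowercasing the capital form maps back to ß.
lower_of_upper = upper_sharp_s.lower()

# Case folding, intended for caseless comparison, yields "ss" for both.
folded_lower = lower_sharp_s.casefold()
folded_upper = upper_sharp_s.casefold()

print(upper_of_lower, lower_of_upper, folded_lower, folded_upper)
print(unicodedata.name(upper_sharp_s))
```

For caseless matching of German text, comparing the casefold() results is usually the safer route than comparing upper() or lower() results.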

Proposal to update Unicode handling for German sharp S (ß / ẞ) in Python’s case conversion methods

2025-10-17 Thread python
officially recognized in German orthography and Unicode standards, providing a direct uppercase counterpart to the lowercase “ß” (U+00DF). Many modern programming languages and libraries—such as Java (from Java 9+), ICU, and recent versions of Swift—support this direct uppercase/lowercase mapping

Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Eryk Sun
On 11/13/22, Jessica Smith <[email protected]> wrote: > Consider the following code ran in Powershell or cmd.exe: > > $ python -c "print('└')" > └ > > $ python -c "print('└')" > test_file.txt > Traceback (most recent call last): > File "<string>", line 1, in <module> > File "C:\Program Files\Python38\

Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Thomas Passin
"C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined> Is this a known limit

Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Barry
t > Traceback (most recent call last): > File "<string>", line 1, in <module> > File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode > return codecs.charmap_encode(input,self.errors,encoding_table)[0] > UnicodeEncodeError: 'charmap' codec can

Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?

2022-11-13 Thread Jessica Smith
2.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined> Is this a known limitation of Windows + Unicode? I understand th
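The error in the traceback can be reproduced on any platform without redirecting anything. The sketch below is only an illustration: the helper write_like_redirected_stdout is hypothetical and stands in for what print() does when stdout is a file opened with a given encoding.

```python
import io

box_char = "\u2514"  # └, BOX DRAWINGS LIGHT UP AND RIGHT


def write_like_redirected_stdout(text, encoding):
    """Encode text through a text wrapper, roughly as a redirected stdout would."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# cp1252 has no mapping for U+2514, so the write fails.
try:
    write_like_redirected_stdout(box_char, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# UTF-8 can represent every code point, so the same write succeeds.
utf8_bytes = write_like_redirected_stdout(box_char, "utf-8")
print(cp1252_failed, utf8_bytes)
```

In a real script, one option on Python 3.7 or later is sys.stdout.reconfigure(encoding="utf-8"); another is setting the PYTHONIOENCODING environment variable before starting the interpreter.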

[Python-announce] ANN: unicode 2.9

2022-06-03 Thread garabik-news-2005-05
unicode is a simple python command line utility that displays properties for a given unicode character, or searches unicode database for a given name. It was written with Linux in mind, but should work almost everywhere (including MS Windows and MacOSX), UTF-8 console is recommended. ˙pɹɐpuɐʇs

Re: Printing Unicode strings in a list

2022-04-30 Thread Chris Angelico
On Sun, 1 May 2022 at 00:03, Vlastimil Brom wrote: > (Even the redundant u prefix from your python2 sample is apparently > accepted, maybe for compatibility reasons.) Yes, for compatibility reasons. It wasn't accepted in Python 3.0, but 3.3 re-added it to make porting easier. It doesn't do anythi

Re: Printing Unicode strings in a list

2022-04-30 Thread Vlastimil Brom
have good reasons for doing so and > will be moving to Python 3.x in due course. > > I have the following questions arising from the log: > > 1. Why does the second print statement not produce [ ║] or ["║"] ? > > 2. Should the second print statement produce [ ║] or [

Re: Printing Unicode strings in a list

2022-04-28 Thread Rob Cliffe via Python-list
On 28/04/2022 14:27, Stephen Tucker wrote: To Cameron Simpson, Thanks for your in-depth and helpful reply. I have noted it and will be giving it close attention when I can. The main reason why I am still using Python 2.x is that my colleagues are still using a GIS system that has a Python pro

Re: Printing Unicode strings in a list

2022-04-28 Thread Jon Ribbens via Python-list
don't have their own str converter, so fall back to repr instead, which outputs '[', followed by the repr of each list item separated by ', ', followed by ']'. > 2. Should the second print statement produce [ ║] or ["║"] ? There's certainly n

Re: Printing Unicode strings in a list

2022-04-28 Thread Stephen Tucker
tement not produce [ ║] or ["║"] ? > > Because print() prints the str() of each of its arguments, and str() of > a list is the same as its repr(), which is a list of the repr()s of > every item in the list. Repr of a Unicode string looks like what you > have in Python 2.
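The explanation quoted above can be illustrated in Python 3, where the rule is the same but repr() leaves printable non-ASCII characters alone. This is a small sketch, not code from the thread; ascii() is used here to approximate the escaped output the original poster saw under Python 2.

```python
items = ["\u2551"]  # a one-element list holding ║

# str() of a list is its repr(), built from the repr() of each item.
as_list = str(items)

# ascii() escapes non-ASCII characters, similar to Python 2's repr().
escaped = ascii(items)

# To display the characters themselves, format the items rather than the list.
joined = ", ".join(items)

print(as_list)
print(escaped)
print(joined)
```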

Re: Printing Unicode strings in a list

2022-04-28 Thread Cameron Simpson
course. Love to hear those reasons. Not suggesting that they are invalid. >I have the following questions arising from the log: >1. Why does the second print statement not produce [ ║] or ["║"] ? Because print() prints the str() of each of its arguments, and str() of a list

Printing Unicode strings in a list

2022-04-28 Thread Stephen Tucker
oes the second print statement not produce [ ║] or ["║"] ? 2. Should the second print statement produce [ ║] or ["║"] ? 3. Given that I want to print a list of Unicode strings so that their characters are displayed (instead of their Unicode codepoint definitions), is there a more P

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-04-07 Thread Anssi Saari
Dennis Lee Bieber writes: > On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico > declaimed the following: > > >>That's jmf. Ignore him. He knows nothing about Unicode and is >>determined to make everyone aware of that fact. >> >>He got blocked from the

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-04-01 Thread Chris Angelico
On Fri, 1 Apr 2022 at 11:16, Dennis Lee Bieber wrote: > > On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico > declaimed the following: > > > >That's jmf. Ignore him. He knows nothing about Unicode and is > >determined to make everyone aware of that fact. >

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-03-31 Thread Dennis Lee Bieber
On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico declaimed the following: >That's jmf. Ignore him. He knows nothing about Unicode and is >determined to make everyone aware of that fact. > >He got blocked from the mailing list ages ago, and I don't think >anyone's

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-03-31 Thread Chris Angelico
>>>> len('äÄöÖüÜ'.encode('utf-8')) > >12 > >>>> > >>>> ? > > Is there a question in there somewhere? > > Crystal ball is hazy... > > However... Note that once you encode the Unicode literal, you h

Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-03-31 Thread Dennis Lee Bieber
>> >>>> ? Is there a question in there somewhere? Crystal ball is hazy... However... Note that once you encode the Unicode literal, you have a BYTE string. There are 12 bytes in that binary -- it is NOT considered Unicode at that point (only when you decode it with th
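The distinction drawn in the reply above, between a string of code points and the bytes produced by encoding it, can be sketched as follows (Python 3 assumed; variable names are illustrative).

```python
text = "äÄöÖüÜ"

# Six code points in the str object.
code_point_count = len(text)

# Each of these letters takes two bytes in UTF-8, giving a 12-byte bytes object.
encoded = text.encode("utf-8")
byte_count = len(encoded)

# Decoding with the same codec restores the original str.
decoded = encoded.decode("utf-8")

print(code_point_count, byte_count, type(encoded).__name__, decoded == text)
```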

Re: ANN: unicode 2.8

2021-01-02 Thread Chris Angelico
On Sun, Jan 3, 2021 at 10:28 AM Terry Reedy wrote: > > And when implementing this, it was a no-brainer to include also the > > brexit varian (verbatim). > > I assume you meant 'variation' and not Varian, the maker of scientific > instruments. I assumed simple typo for "variant" ChrisA -- https:

Re: ANN: unicode 2.8

2021-01-02 Thread Terry Reedy
On 1/1/2021 3:48 PM, [email protected] wrote: Terry Reedy wrote: On 12/31/2020 9:36 AM, [email protected] wrote: unicode is a simple python command line utility that displays properties for a given unicode character, or searches unicode

Re: ANN: unicode 2.8

2021-01-01 Thread garabik-news-2005-05
Terry Reedy wrote: > On 12/31/2020 9:36 AM, [email protected] wrote: >> unicode is a simple python command line utility that displays >> properties for a given unicode character, or searches >> unicode database for a given name. > ... >> C

Re: ANN: unicode 2.8

2020-12-31 Thread Terry Reedy
On 12/31/2020 9:36 AM, [email protected] wrote: unicode is a simple python command line utility that displays properties for a given unicode character, or searches unicode database for a given name. ... Changes since previous versions: * display ASCII table

ANN: unicode 2.8

2020-12-31 Thread garabik-news-2005-05
unicode is a simple python command line utility that displays properties for a given unicode character, or searches unicode database for a given name. It was written with Linux in mind, but should work almost everywhere (including MS Windows and MacOSX), UTF-8 console is recommended. ˙pɹɐpuɐʇs

Re: Friday Finking: Beyond implementing Unicode

2020-06-17 Thread Terry Reedy
On 6/16/2020 7:45 PM, DL Neil via Python-list wrote: On 13/06/20 4:47 AM, Terry Reedy wrote: There was a recent thread on python-ideas discussing this.  It started with arrow characters.  There have been others. Am pleased to hear that it's neither 'new' nor 'way out there'... The idea has b

Re: Friday Finking: Beyond implementing Unicode

2020-06-16 Thread DL Neil via Python-list
There was a recent thread on python-ideas discussing this.  It started with arrow characters.  There have been others. Am pleased to hear that it's neither 'new' nor 'way out there'... Am not subscribed to that list. Went looking for its archives, but failed - there's no "ideas" on (https://

Re: Friday Finking: Beyond implementing Unicode

2020-06-16 Thread DL Neil via Python-list
On 13/06/20 5:11 AM, Dennis Lee Bieber wrote: On Fri, 12 Jun 2020 18:03:55 +1200, DL Neil via Python-list declaimed the following: There is/was a language called "APL" (and yes the acronym means "A Programming Language", and yes it started the craze, through "B" (and BCPL), and yes, that brou

Re: Friday Finking: Beyond implementing Unicode

2020-06-16 Thread DL Neil via Python-list
On 13/06/20 4:47 AM, Terry Reedy wrote: On 6/12/2020 2:03 AM, DL Neil via Python-list wrote: Unicode has given us access to a wealth of mathematical and other symbols. Hardware and soft-/firm-ware flexibility enable us to move beyond and develop new 'standards'. Do we have opport

Re: Friday Finking: Beyond implementing Unicode

2020-06-12 Thread Terry Reedy
On 6/12/2020 2:03 AM, DL Neil via Python-list wrote: Unicode has given us access to a wealth of mathematical and other symbols. Hardware and soft-/firm-ware flexibility enable us to move beyond and develop new 'standards'. Do we have opportunities to make computer programming

Re: Friday Finking: Beyond implementing Unicode

2020-06-12 Thread Chris Angelico
On Fri, Jun 12, 2020 at 9:11 PM Elliott Roper wrote: > > On 12 Jun 2020 at 09:47:04 BST, "moi" wrote: > i) Who cares? Don't bother responding to him. He's somehow gotten the idea that Python's Unicode support is broken, and he spews his vomit out onto the ne

Re: Friday Finking: Beyond implementing Unicode

2020-06-12 Thread Elliott Roper
On 12 Jun 2020 at 09:47:04 BST, "moi" wrote: > i) Today there are people who still do not understand this: > 'Å'.encode('utf-8') > b'\xc3\x85' 'Å'.encode('utf-16-le') > b'\xc5\x00' 'Å'.encode('utf-32-le') > b'\xc5\x00\x00\x00' > > ii) On a Western European Windows, Py 3 is not ev

Friday Finking: Beyond implementing Unicode

2020-06-11 Thread DL Neil via Python-list
Unicode has given us access to a wealth of mathematical and other symbols. Hardware and soft-/firm-ware flexibility enable us to move beyond and develop new 'standards'. Do we have opportunities to make computer programming more math-familiar and/or more logically-expressive, and t

Re: ÿ in Unicode

2020-03-07 Thread Grant Edwards
On 2020-03-07, Jon Ribbens via Python-list wrote: > On 2020-03-06, Jon Ribbens wrote: >> What's the bug, or source of amusement? > > Oh, that's fun. There's a Russian Fidonet gateway, that somehow > still exists, that's re-injecting usenet posts back into the group. Last time I think it was one

Re: ÿ in Unicode

2020-03-07 Thread Richard Damon
On 3/7/20 12:52 PM, Ben Bacarisse wrote: > moi writes: > >> Le samedi 7 mars 2020 16:41:10 UTC+1, R.Wieser a écrit : >>> Moi, >>> Fortunately, UTF-8 has not been created the Python devs. >>> >>> And there we go again, making vague statements/accusations - without >>> /anything/ to back it u

Re: ÿ in Unicode

2020-03-07 Thread Ben Bacarisse
moi writes: > Le samedi 7 mars 2020 16:41:10 UTC+1, R.Wieser a écrit : >> Moi, >> >> > Fortunately, UTF-8 has not been created the Python devs. >> >> And there we go again, making vague statements/accusations - without >> /anything/ to back it up of course >> >> Kiddo, you have posted a couple

Re: ÿ in Unicode

2020-03-07 Thread R.Wieser
Moi, > Fortunately, UTF-8 has not been created the Python devs. And there we go again, making vague statements/accusations - without /anything/ to back it up of course Kiddo, you have posted a couple of messages now, but have said exactly nothing. Are you sure you do not want to go into polit

Re: ÿ in Unicode

2020-03-07 Thread R.Wieser
Moi, > - Today, there are still people who do not understand a > 'ÿ' can not be *safely* encoded with a single byte. It can (and has been done for ages), just not in the character encoding method you've chosen to use. > - Python == Latin-1 mess (as somebody wrote on a mailing list). Putting b

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
On 2020-03-06, Jon Ribbens wrote: > What's the bug, or source of amusement? Oh, that's fun. There's a Russian Fidonet gateway, that somehow still exists, that's re-injecting usenet posts back into the group. -- https://mail.python.org/mailman/listinfo/python-list

Re: ÿ in Unicode

2020-03-06 Thread Chris Angelico
On Fri, Mar 6, 2020 at 9:31 PM Ben Bacarisse wrote: > > moi writes: > > > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : > >> moi writes: > >> > >> 'ÿ'.encode('utf-8') > >> > b'\xc3\xbf' > >> 'ÿ'.encode('utf-16-le') > >> > b'\xff\x00' > >> 'ÿ'.encode('utf-

Re: ÿ in Unicode

2020-03-06 Thread Ben Bacarisse
moi writes: > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >> moi writes: >> >> 'ÿ'.encode('utf-8') >> > b'\xc3\xbf' >> 'ÿ'.encode('utf-16-le') >> > b'\xff\x00' >> 'ÿ'.encode('utf-32-le') >> > b'\xff\x00\x00\x00' >> > >> That all looks as expected. > Yes > >>Is

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
>>> >>>> That all looks as expected. >>> Yes >>> >>>>Is there something about the output that puzzles you? >>> No >>> >>>>Did you have a question? >>> No, only a comment >>> >>> This buggy language

Re: ÿ in Unicode

2020-03-06 Thread Pieter van Oostrum
Jon Ribbens writes: > On 2020-03-06, moi wrote: >> Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >>> moi writes: >>> 'ÿ'.encode('utf-8') >>> > b'\xc3\xbf' >>> 'ÿ'.encode('utf-16-le') >>> > b'\xff\x00' >>> 'ÿ'.encode('utf-32-le') >>> > b'\xff\x00\x00\x0

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
On 2020-03-06, moi wrote: > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >> moi writes: >> 'ÿ'.encode('utf-8') >> > b'\xc3\xbf' >> 'ÿ'.encode('utf-16-le') >> > b'\xff\x00' >> 'ÿ'.encode('utf-32-le') >> > b'\xff\x00\x00\x00' > >> That all looks as expect

Re: ÿ in Unicode

2020-03-06 Thread Pieter van Oostrum
Jon Ribbens writes: > On 2020-03-06, moi wrote: >> Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >>> moi writes: >>> 'ÿ'.encode('utf-8') >>> > b'\xc3\xbf' >>> 'ÿ'.encode('utf-16-le') >>> > b'\xff\x00' >>> 'ÿ'.encode('utf-32-le') >>> > b'\xff\x00\x00\x00' >> >>

Re: ÿ in Unicode

2020-03-06 Thread Ben Bacarisse
moi writes: > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >> moi writes: >> >> 'ÿ'.encode('utf-8') >> > b'\xc3\xbf' >> 'ÿ'.encode('utf-16-le') >> > b'\xff\x00' >> 'ÿ'.encode('utf-32-le') >> > b'\xff\x00\x00\x00' >> > >> That all looks as expected. > Yes > >>Is the

Re: ÿ in Unicode

2020-03-06 Thread Chris Angelico
On Fri, Mar 6, 2020 at 9:31 PM Ben Bacarisse wrote: > > moi writes: > > > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : > >> moi writes: > >> > >> 'ÿ'.encode('utf-8') > >> > b'\xc3\xbf' > >> 'ÿ'.encode('utf-16-le') > >> > b'\xff\x00' > >> 'ÿ'.encode('utf-32-le')

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
On 2020-03-06, moi wrote: > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >> moi writes: >> 'ÿ'.encode('utf-8') >> > b'\xc3\xbf' >> 'ÿ'.encode('utf-16-le') >> > b'\xff\x00' >> 'ÿ'.encode('utf-32-le') >> > b'\xff\x00\x00\x00' > >> That all looks as expected. > Ye

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
>>>> That all looks as expected. >>> Yes >>> >>>>Is there something about the output that puzzles you? >>> No >>> >>>>Did you have a question? >>> No, only a comment >>> >>> This buggy language is ver

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
>>>> That all looks as expected. >>> Yes >>> >>>>Is there something about the output that puzzles you? >>> No >>> >>>>Did you have a question? >>> No, only a comment >>> >>> This buggy language is very amus

Re: ÿ in Unicode

2020-03-06 Thread Pieter van Oostrum
Jon Ribbens writes: > On 2020-03-06, moi wrote: >> Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >>> moi writes: >>> 'ÿ'.encode('utf-8') >>> > b'\xc3\xbf' >>> 'ÿ'.encode('utf-16-le') >>> > b'\xff\x00' >>> 'ÿ'.encode('utf-32-le') >>> > b'\xff\x00\x00\x00' >> >>> Tha

Re: ÿ in Unicode

2020-03-06 Thread Jon Ribbens via Python-list
On 2020-03-06, moi wrote: > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >> moi writes: >> 'ÿ'.encode('utf-8') >> > b'\xc3\xbf' >> 'ÿ'.encode('utf-16-le') >> > b'\xff\x00' >> 'ÿ'.encode('utf-32-le') >> > b'\xff\x00\x00\x00' > >> That all looks as expected. > Yes > >

Re: ÿ in Unicode

2020-03-06 Thread Chris Angelico
On Fri, Mar 6, 2020 at 9:31 PM Ben Bacarisse wrote: > > moi writes: > > > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : > >> moi writes: > >> > >> 'ÿ'.encode('utf-8') > >> > b'\xc3\xbf' > >> 'ÿ'.encode('utf-16-le') > >> > b'\xff\x00' > >> 'ÿ'.encode('utf-32-le') > >

Re: ÿ in Unicode

2020-03-06 Thread Ben Bacarisse
moi writes: > Le jeudi 5 mars 2020 13:20:38 UTC+1, Ben Bacarisse a écrit : >> moi writes: >> >> 'ÿ'.encode('utf-8') >> > b'\xc3\xbf' >> 'ÿ'.encode('utf-16-le') >> > b'\xff\x00' >> 'ÿ'.encode('utf-32-le') >> > b'\xff\x00\x00\x00' >> > >> That all looks as expected. > Yes > >>Is t

Re: ÿ in Unicode

2020-03-05 Thread Ben Bacarisse
moi writes: 'ÿ'.encode('utf-8') > b'\xc3\xbf' 'ÿ'.encode('utf-16-le') > b'\xff\x00' 'ÿ'.encode('utf-32-le') > b'\xff\x00\x00\x00' That all looks as expected. Is there something about the output that puzzles you? Did you have a question? -- Ben. -- https://mail.python.org/mail
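The outputs quoted in this thread are easy to verify, and a single-byte encoding of the same character can be shown alongside them. A minimal sketch, assuming Python 3; the dictionary name is illustrative.

```python
char = "\u00ff"  # ÿ, LATIN SMALL LETTER Y WITH DIAERESIS

# One code point, several byte representations depending on the codec.
encodings = {
    "utf-8": char.encode("utf-8"),
    "utf-16-le": char.encode("utf-16-le"),
    "utf-32-le": char.encode("utf-32-le"),
    "latin-1": char.encode("latin-1"),
}

for codec_name, data in encodings.items():
    print(codec_name, data, len(data))
```

The Latin-1 result is the single byte 0xFF, which is the sense in which the character "can" be encoded in one byte: only within an encoding that assigns it one.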

Re: Unicode filenames

2019-12-07 Thread Chris Angelico
> many, many years! If they're that short and people are depending on them, it won't be too much work to port them. And you gain a huge measure of reliability: you no longer have to worry about "Unicode filenames" - or, to be more precise, "non-ASCII filenames" -

Re: Unicode filenames

2019-12-07 Thread Bob van der Poel
> >>> I have some files which came off the net with, I'm assuming, unicode > >>> characters in the names. I have a very short program which takes the > >>> filename and puts into an emacs buffer, and then lets me add > information > >> to > &

Re: Unicode filenames

2019-12-07 Thread DL Neil via Python-list
On 8/12/19 5:50 AM, Bob van der Poel wrote: On Sat, Dec 7, 2019 at 4:00 AM Barry Scott wrote: On 6 Dec 2019, at 18:17, Bob van der Poel wrote: I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the fil

Re: Unicode filenames

2019-12-07 Thread Bob van der Poel
On Sat, Dec 7, 2019 at 4:00 AM Barry Scott wrote: > > > > On 6 Dec 2019, at 18:17, Bob van der Poel wrote: > > > > I have some files which came off the net with, I'm assuming, unicode > > characters in the names. I have a very short program which takes the

Re: Unicode filenames

2019-12-07 Thread Barry Scott
> On 6 Dec 2019, at 18:17, Bob van der Poel wrote: > > I have some files which came off the net with, I'm assuming, unicode > characters in the names. I have a very short program which takes the > filename and puts into an emacs buffer, and then lets me add informatio

Re: Unicode filenames

2019-12-07 Thread Peter Otten
Bob van der Poel wrote: > I have some files which came off the net with, I'm assuming, unicode > characters in the names. I have a very short program which takes the > filename and puts into an emacs buffer, and then lets me add information > to that new file (it's a poor m

Re: Unicode filenames

2019-12-06 Thread Terry Reedy
On 12/6/2019 1:17 PM, Bob van der Poel wrote: I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the filename and puts into an emacs buffer, and then lets me add information to that new file (it's a poo

Re: Unicode filenames

2019-12-06 Thread DL Neil via Python-list
On 7/12/19 7:17 AM, Bob van der Poel wrote: I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the filename and puts into an emacs buffer, and then lets me add information to that new file (it's a poo

Unicode filenames

2019-12-06 Thread Bob van der Poel
I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the filename and puts into an emacs buffer, and then lets me add information to that new file (it's a poor man's DB). Next, I can look up text in th

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-19 Thread MRAB
On 2019-09-19 09:55, Gregory Ewing wrote: Eli the Bearded wrote: There isn't anything called UCS1. Apparently there is, but it's not a character set, it's a loudspeaker. https://www.bhphotovideo.com/c/product/1205978-REG/yorkville_sound_ucs1_1200w_15_horn_loaded.html The OP might mean Py_UCS

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-19 Thread Gregory Ewing
Eli the Bearded wrote: There isn't anything called UCS1. Apparently there is, but it's not a character set, it's a loudspeaker. https://www.bhphotovideo.com/c/product/1205978-REG/yorkville_sound_ucs1_1200w_15_horn_loaded.html -- Greg -- https://mail.python.org/mailman/listinfo/python-list

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-17 Thread Chris Angelico
On Wed, Sep 18, 2019 at 6:51 AM Eli the Bearded <*@eli.users.panix.com> wrote: > > In comp.lang.python, moi wrote: > > I hope, one day, for those who are interested in Unicode, > > they find a book, publication, ... which will explain > > what is UCS1. > > The

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-17 Thread Eli the Bearded
In comp.lang.python, moi wrote: > I hope, one day, for those who are interested in Unicode, > they find a book, publication, ... which will explain > what is UCS1. There isn't anything called UCS1. There is a UTF-1, but don't use it. UTF-8 is better in every way. https://en

Re: unicode mail list archeology

2019-04-20 Thread Luuk
On 20-4-2019 12:47, Luuk wrote: On 20-4-2019 11:26, [email protected] wrote: http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML018/0594.html [quoot] > It is simple to make a compacter version of UTF-8 using the base > 256 character codes were possible (comacter for many lan

Re: unicode mail list archeology

2019-04-20 Thread Luuk
On 20-4-2019 11:26, [email protected] wrote: http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML018/0594.html [quoot] > It is simple to make a compacter version of UTF-8 using the base > 256 character codes were possible (comacter for many languages). No. If you think otherwis

Re: Python2.7 unicode conundrum

2018-11-26 Thread Robert Latest via Python-list
Richard Damon wrote: > Why do you say it has been convert to 'Latin'. The string prints as > being Unicode. Internally Python doesn't store strings as UTF-8, but as > plain Unicode (UCS-2 or UCS-4 as needed), and code-point E4 is the > character you want. You'r

Re: Python2.7 unicode conundrum

2018-11-25 Thread Richard Damon
8 20 2d 2a 2d 0a 0a 73 20 3d 20 75 27 |utf8 -*-..s = u'| > 0020 c3 a4 27 0a 0a 70 72 69 6e 74 28 73 29 0a 70 72 |..'..print(s).pr| > 0030 69 6e 74 28 28 73 2c 20 29 29 0a 0a |int((s,))..| > 003c > dh@jenna:~/python$ python unicode.py > ä

Re: Python2.7 unicode conundrum

2018-11-25 Thread Thomas Jollans
4' in the > third line of the hexdump). When just printed, the string "s" is > displayed correctly as 'ä' (a umlaut), but the string representation > shows that it seems to have been converted to latin-1 'e4' somewhere on > the way. It's not being con

Python2.7 unicode conundrum

2018-11-25 Thread Robert Latest via Python-list
Hi folks, what seemingly started out as a weird database character encoding mix-up could be boiled down to a few lines of pure Python. The source-code below is real utf8 (as evidenced by the UTF code point 'c3 a4' in the third line of the hexdump). When just printed, the string "s" is displayed cor

Re: Email parsing and unicode/utf8

2018-10-15 Thread dieter
Thomas Jollans writes: > I just stumbled over some curious behaviour of the stdlib email parsing > APIs which accept strings rather than bytes. It appears that you can't > parse an 8-bit UTF-8 message you have as a str without first encoding it. The primary purpose of an email parser is likely th

Email parsing and unicode/utf8

2018-10-15 Thread Thomas Jollans
Hi, I just stumbled over some curious behaviour of the stdlib email parsing APIs which accept strings rather than bytes. It appears that you can't parse an 8-bit UTF-8 message you have as a str without first encoding it. The docs

Re: Non-unicode file names

2018-08-09 Thread Thomas Jollans
On 09/08/18 05:13, INADA Naoki wrote: > Please use Python 3.7. > > Python 3.7 has several improvements on this area. Thanks! Darkly remembering something about UTF-8 mode, I suspected it might... > > * When PEP 538 or 540 is used, default error handler for stdio is > surrogateescape > * You can

Re: Non-unicode file names

2018-08-08 Thread Marko Rauhamaa
INADA Naoki : > For Python 3.6, I think best way to allow arbitrary bytes on stdout is > using `PYTHONIOENCODING=utf-8:surrogateescape` environment variable. Good info! Marko -- https://mail.python.org/mailman/listinfo/python-list

Re: Non-unicode file names

2018-08-08 Thread INADA Naoki
Please use Python 3.7. Python 3.7 has several improvements on this area. * When PEP 538 or 540 is used, default error handler for stdio is surrogateescape * You can sys.stdout.reconfigure(errors='surrogateescape') For Python 3.6, I think best way to allow arbitrary bytes on stdout is using `PYTH
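The surrogateescape error handler mentioned above can be tried on plain bytes, independent of any file system or locale. This is a sketch under that assumption; the file name is made up for illustration.

```python
# A hypothetical file name in Latin-1 bytes; 0xE9 alone is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = text_name.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, which is why printing such
# names to a strict UTF-8 stdout fails.
try:
    text_name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ascii(text_name), round_trip, strict_failed)
```

os.fsdecode() and os.fsencode() apply the same idea using the file system encoding, which on most current Unix systems is UTF-8 with surrogateescape.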

Re: Non-unicode file names

2018-08-08 Thread Cameron Simpson
On 09Aug2018 03:14, MRAB wrote: [...] Is it true that Unix filenames can contain control characters, e.g. \x07? Yep. They're just byte strings. You can't have \0 (NUL) because the API uses NUL terminated strings, and you can't use slash '/' in the filename components because that is the comp

Re: Non-unicode file names

2018-08-08 Thread MRAB
On 2018-08-09 01:14, Thomas Jollans wrote: On 09/08/18 01:48, MRAB wrote: On 2018-08-08 23:16, Thomas Jollans wrote: On *nix, file names are bytes. In real life, we prefer to think of file names as strings. How non-ASCII file names are created is determined by the locale, and on most systems th

Re: Non-unicode file names

2018-08-08 Thread Thomas Jollans
On 09/08/18 01:48, MRAB wrote: > On 2018-08-08 23:16, Thomas Jollans wrote: >> On *nix, file names are bytes. In real life, we prefer to think of file >> names as strings. How non-ASCII file names are created is determined by >> the locale, and on most systems these days, every locale uses UTF-8 an

Re: Non-unicode file names

2018-08-08 Thread MRAB
On 2018-08-08 23:16, Thomas Jollans wrote: On *nix, file names are bytes. In real life, we prefer to think of file names as strings. How non-ASCII file names are created is determined by the locale, and on most systems these days, every locale uses UTF-8 and everybody's happy. Of course this does

Non-unicode file names

2018-08-08 Thread Thomas Jollans
On *nix, file names are bytes. In real life, we prefer to think of file names as strings. How non-ASCII file names are created is determined by the locale, and on most systems these days, every locale uses UTF-8 and everybody's happy. Of course this doesn't mean you'll never run into an old direct

Re: Unicode [was Re: Cult-like behaviour]

2018-07-17 Thread Tim Chase
On 2018-07-17 08:37, Marko Rauhamaa wrote: > Tim Chase : > > Wait, but now you're talking about vendors. Much of the crux of > > this discussion has been about personal scripts that don't need to > > marshal Unicode strings in and out of various functions/objec

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Marko Rauhamaa
Unless a consortium is >> erected to support Python2, no vendor will be able to use it in the >> medium term. > > Wait, but now you're talking about vendors. Much of the crux of this > discussion has been about personal scripts that don't need to > marshal Unicode strin

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Tim Chase
rt Python2, no vendor will be able to use it in the > medium term. Wait, but now you're talking about vendors. Much of the crux of this discussion has been about personal scripts that don't need to marshal Unicode strings in and out of various functions/objects. If you have a py2 scri

Unicode is not UTF-32 [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
UTF-32 is implementation, not semantics: it specifies how to represent Unicode code points as bytes in memory, not what Unicode code points are. Python 3 strings are sequences of abstract characters ("code points") with no mandatory implementation. In CPython, some string objects are
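The point that a str is a sequence of code points, with UTF-32 being only one possible byte encoding of it, can be sketched as follows. Python 3 is assumed; the sample string is chosen for illustration and is not from the post.

```python
# Four code points: ASCII, Latin-1 range, BMP, and beyond the BMP.
sample = "a\u00e9\u20ac\U0001F600"

# len() counts code points, whatever the interpreter stores internally.
code_points = len(sample)

# The same four code points need different numbers of bytes per encoding.
utf8_len = len(sample.encode("utf-8"))       # 1 + 2 + 3 + 4 bytes
utf16_len = len(sample.encode("utf-16-le"))  # 2 + 2 + 2 + 4 bytes
utf32_len = len(sample.encode("utf-32-le"))  # 4 bytes per code point

print(code_points, utf8_len, utf16_len, utf32_len)
```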

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
y read my words with *intent* rather than *reaction*, you would notice that I suggested the *option* of turning off Unicode.  I didn't say get *rid* of Unicode.  I didn't say make it *harder* to use Unicode.  Once again - reaction rather than reading. Obviously, the most vocal represent

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread MRAB
On 2018-07-16 21:59, Marko Rauhamaa wrote: Tim Chase : While the python world has moved its efforts into improving Python3, Python2 hasn't suddenly stopped working. The sword of Damocles is hanging on its head. Unless a consortium is erected to support Python2, no vendor will be able to use it

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 6:32 AM, Tim Chase wrote: > On 2018-07-16 18:31, Steven D'Aprano wrote: >> You say that all you want is a switch to turn off Unicode (and >> replace it with what? Kanji strings? Cyrillic? Shift_JS? no of >> course not, I'm being absurd -- r

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Marko Rauhamaa
Tim Chase : > While the python world has moved its efforts into improving Python3, > Python2 hasn't suddenly stopped working. The sword of Damocles is hanging on its head. Unless a consortium is erected to support Python2, no vendor will be able to use it in the medium term. Given the recent even

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Tim Chase
On 2018-07-16 18:31, Steven D'Aprano wrote: > You say that all you want is a switch to turn off Unicode (and > replace it with what? Kanji strings? Cyrillic? Shift_JS? no of > course not, I'm being absurd -- replace it with ASCII, what else > could any right-thinking person
