date:20140502

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano

On Thu, 01 May 2014 21:42:21 -0700, Rustom Mody wrote:

> Whats the best cure for headache?
> 
> Cut off the head

o_O

I don't think so.

> Whats the best cure for Unicode?
> 
> Ascii

Unicode is not a problem to be solved.

The inability to write standard human text in ASCII is a problem, e.g. 
one cannot write

“ASCII For Dummies” © 2014 by Zöe Smith, now on sale 99¢

so even *Americans* cannot represent all their common characters in 
ASCII, let alone specialised characters from mathematics, science, the 
printing industry, and law. And even Americans sometimes need to write 
text in Foreign. Where is your ASCII now?
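
A minimal sketch of the point above, assuming Python 3 (where str is Unicode
text). The sample string is the one from this post; the variable names are
only illustrative.

```python
# The example title from above: curly quotes, ©, ö and ¢ are all non-ASCII.
text = "“ASCII For Dummies” © 2014 by Zöe Smith, now on sale 99¢"

# Collect the characters that ASCII (code points 0..127) cannot represent.
non_ascii = sorted({c for c in text if ord(c) > 127})
print(non_ascii)

try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    # Encoding fails at the first non-ASCII character (the opening quote).
    print("ASCII cannot encode this:", err.reason)

# A Unicode encoding such as UTF-8 round-trips the text unchanged.
assert text.encode("utf-8").decode("utf-8") == text
```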

The solution is to have at least one encoding which contains the 
additional characters needed.

The plethora of such additional encodings is a problem. The solution is a 
single encoding that covers all needed characters, like Unicode, so that 
there is no need to handle multiple encodings.

The inability of plain text files to record metadata about what encoding 
they use is a problem. The solution is to standardize on a single, world-
wide encoding, like Unicode.

> Saying however that there is no headache in unicode does not make the
> headache go away:
> 
> http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/
> 
> No I am not saying that the contents/style/tone are right. However
> people are evidently suffering the transition. Denying it is not a help.

Transitions are always more painful than after the transition has settled 
down. As I have said repeatedly, I look forward to the day when nobody 
but document archivists and academics need care about legacy encodings. 
But we're not there yet.

> And unicode consortium's ways are not exactly helpful to its own cause:
> Imagine the C standard committee deciding that adding mandatory garbage
> collection to C is a neat idea
> 
> Unicode consortium's going from old BMP to current (6.0) SMPs to
> who-knows-what in the future is similar.

I don't see the connection.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano

On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:

> I dont know how one causally connects the 'headaches' but Ive seen -
> mojibake

Mojibake is certainly more common with multiple encodings, but the 
solution to that is Unicode, not ASCII.

In fact, in your blog post you even link to a post of mine where I 
explain that ASCII has gone through multiple backwards incompatible 
changes over the decades, which means you can have a limited form of 
mojibake even in pure ASCII. Between changes over various versions of 
ASCII, and ambiguous characters allowed by the standard, you needed some 
sort of out-of-band metadata to tell you whether they intended an @ or a 
`, a | or a ¬, a £ or a #, to mention only a few.

It's only since the 1980s that ASCII, actual 7-bit US ASCII, has become 
an unambiguous standard. But that's okay, because that merely allowed 
people to create dozens of 7-bit and 8-bit variations on ASCII, all 
incompatible with each other, and *call them ASCII* regardless of the 
actual standard name.

Between ambiguities in actual ASCII, and common practice to label non-
ASCII as ASCII, I can categorically say that mojibake has always been 
possible in so-called "plain text". If you haven't noticed it, it was 
because you were only exchanging documents with people who happened to 
use the same set of characters as you.
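
One modern instance of the same effect can be sketched in Python 3. The
choice of UTF-8 for the sender and Latin-1 for the receiver is an assumption
made for illustration; any mismatched pair of encodings behaves similarly.

```python
# Text containing characters outside 7-bit ASCII.
original = "Zoë paid £5"

# The sender writes UTF-8 bytes...
data = original.encode("utf-8")

# ...but the receiver, lacking out-of-band metadata, guesses Latin-1.
garbled = data.decode("latin-1")
print(garbled)  # mojibake: each non-ASCII character becomes two wrong ones

# Decoding with the encoding that was actually used recovers the text.
assert data.decode("utf-8") == original
```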

> - unicode 'number-boxes' (what are these called?) 

They are missing character glyphs, and they have nothing to do with 
Unicode. They are due to deficiencies in the text font you are using.

Admittedly with Unicode's 0x10FFFF possible characters (actually more, 
since a single code point can have multiple glyphs) it isn't surprising 
that most font designers have neither the time, skill or desire to create 
a glyph for every single code point. But then the same applies even for 
more restrictive 8-bit encodings -- sometimes font designers don't even 
bother providing glyphs for *ASCII* characters.

(E.g. they may only provide glyphs for uppercase A...Z, not lowercase.)

> - Worst of all what we
> *dont* see -- how many others dont see what we see?

Again, this is a deficiency of the font. There are very few code points in 
Unicode which are intended to be invisible, e.g. space, newline, zero-
width joiner, control characters, etc., but they ought to be equally 
invisible to everyone. No printable character should ever be invisible in 
any decent font.

> I never knew of any of this in the good ol days of ASCII

You must have been happy with a very impoverished set of symbols, then.

> ¶ Passive voice is often the best choice in the interests of political
> correctness
> 
> It would be a pleasant surprise if everyone sees a pilcrow at start of
> line above

I do.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

Re: Off-topic circumnavigating the earth in a mile or less

2014-05-02 Thread alister

On Thu, 01 May 2014 21:57:57 +0100, Adam Funk wrote:

> On 2014-05-01, Terry Reedy wrote:
> 
>> On 4/30/2014 7:46 PM, Ian Kelly wrote:
>>
>>> It also works if your starting point is (precisely) the north pole.  I
>>> believe that's the canonical answer to the riddle, since there are no
>>> bears in Antarctica.
>>
>> For the most part, there are no bears within a mile of the North Pole
>> either. "they are rare north of 88°" (ie, 140 miles from pole).
>> https://en.wikipedia.org/wiki/Polar_bears They mostly hunt in or near
>> open water, near the coastlines.
>>
>> I find it amusing that someone noticed and posted an alternate,
>> non-canonical  solution. How might a bear be near the south pole? As
>> long as we are being creative, suppose some jokester mounts a near
>> life-size stuffed black bear, made of cold-tolerant artificial
>> materials, near but not at the South Pole. The intent is to give fright
>> to naive newcomers. Someone walking in a radius 1/2pi circle about the
>> pole might easily see it.
> 
> OK, change bear to bird & the question to "What kind of bird is it?"


Arctic Turn is a valid answer for all locations :-)


-- 
Pardon me, but do you know what it means to be TRULY ONE with your BOOTH!

Re: Unicode 7

2014-05-02 Thread Chris Angelico

On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano wrote:
> ... even *Americans* cannot represent all their common characters in
> ASCII, let alone specialised characters from mathematics, science, the
> printing industry, and law.

Aside: What additional characters does law use that aren't in ASCII?
Section § and paragraph ¶ are used frequently, but you already
mentioned the printing industry. Are there other symbols?

ChrisA

Re: Unicode 7

2014-05-02 Thread Chris Angelico

On Fri, May 2, 2014 at 6:45 PM, Steven D'Aprano wrote:
>> - unicode 'number-boxes' (what are these called?)
>
> They are missing character glyphs, and they have nothing to do with
> Unicode. They are due to deficiencies in the text font you are using.
>
> Admittedly with Unicode's 0x10FFFF possible characters (actually more,
> since a single code point can have multiple glyphs) it isn't surprising
> that most font designers have neither the time, skill or desire to create
> a glyph for every single code point. But then the same applies even for
> more restrictive 8-bit encodings -- sometimes font designers don't even
> bother providing glyphs for *ASCII* characters.
>
> (E.g. they may only provide glyphs for uppercase A...Z, not lowercase.)

This is another area where Unicode has given us "a great improvement
over the old method of giving satisfaction". Back in the 1990s on
OS/2, DOS, and Windows, a missing glyph might be (a) blank, (b) a
simple square with no information, or (c) copied from some other font
(common with dingbats fonts). With Unicode, the standard is to show a
little box *with the hex digits in it*. Granted, those boxes are a LOT
more readable for BMP characters than SMP (unless your text is huge,
six digits in the space of one character will make them pretty tiny),
and a "Unicode" font will generally include all (or at least most) of
the BMP, but it's still better than having no information at all.
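
As a rough sketch of what goes inside such a box, assuming Python 3: the
helper name box_label and the padding rule (four hex digits for the BMP, six
beyond it) are assumptions that follow the description above, not any
particular renderer's behaviour.

```python
def box_label(ch):
    """Hex digits a fallback renderer might draw for a missing glyph."""
    cp = ord(ch)
    # Four digits are enough for the BMP; supplementary planes get six.
    width = 4 if cp <= 0xFFFF else 6
    return "%0*X" % (width, cp)

for ch in ("¶", "⟮", "𝄞"):
    label = box_label(ch)
    print(repr(ch), label, len(label), "digits")
```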

ChrisA

Re: Unicode 7

2014-05-02 Thread Ben Finney

Chris Angelico  writes:

> On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano wrote:
> > ... even *Americans* cannot represent all their common characters in
> > ASCII, let alone specialised characters from mathematics, science,
> > the printing industry, and law.
>
> Aside: What additional characters does law use that aren't in ASCII?
> Section § and paragraph ¶ are used frequently, but you already
> mentioned the printing industry. Are there other symbols?

ASCII does not contain “©” (U+00A9 COPYRIGHT SIGN) nor “®” (U+00AE
REGISTERED SIGN), for instance.

-- 
 \ “I got some new underwear the other day. Well, new to me.” —Emo |
  `\   Philips |
_o__)  |
Ben Finney


Re: Unicode 7

2014-05-02 Thread Chris Angelico

On Fri, May 2, 2014 at 7:16 PM, Ben Finney  wrote:
> Chris Angelico  writes:
>
>> On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano wrote:
>> > ... even *Americans* cannot represent all their common characters in
>> > ASCII, let alone specialised characters from mathematics, science,
>> > the printing industry, and law.
>>
>> Aside: What additional characters does law use that aren't in ASCII?
>> Section § and paragraph ¶ are used frequently, but you already
>> mentioned the printing industry. Are there other symbols?
>
> ASCII does not contain “©” (U+00A9 COPYRIGHT SIGN) nor “®” (U+00AE
> REGISTERED SIGN), for instance.

Heh! I forgot about those. U+00A9 in particular has gone so mainstream
that it's easy to think of it not as "I'm going to switch to my
'British English + Legal' dictionary now" and just as "This is a
critical part of the basic dictionary".

ChrisA

Re: Designing a network in Python

2014-05-02 Thread varun7rs

On Wednesday, 30 April 2014 20:38:07 UTC+2, Joseph L. Casale  wrote:
> > I don't know how to do that stuff in python. Basically, I'm trying to pull 
> > certain data from the
> > xml file like the node-name, source, destination and the capacity. Since, I 
> > am done with that
> > part, I now want to have a link between source and destination and assign 
> > capacity to it.
> 
> I dont mind writing you an SQLite schema and accessor class, can you define 
> your data in a tabular
> format and mail it to me offline, we add relationships etc as we go.
> 
> Hopefully it inspires you to adopt this approach in the future as it often 
> proves powerful.
> 
> jlc

Thanks a lot for your help. But how do I mail you? I can't find your email
address here.

Re: Unicode 7

2014-05-02 Thread Jussi Piitulainen

Chris Angelico writes:

> (common with dingbats fonts). With Unicode, the standard is to show
> a little box *with the hex digits in it*. Granted, those boxes are a
> LOT more readable for BMP characters than SMP (unless your text is
> huge, six digits in the space of one character will make them pretty
> tiny), and a "Unicode" font will generally include all (or at least
> most) of the BMP, but it's still better than having no information

I needed to see such tiny numbers just today, just the four of them in
the tiny box. So I pressed C-+ a few times to _make_ the text huge,
obtained my information, and returned to my normal text size with C--.

Perfect. Usually all I need to know is that I have a character for
which I don't have a glyph, but this time I wanted to record the
number because I was testing things rather than reading the text.

Re: Unicode 7

2014-05-02 Thread Marko Rauhamaa

Ben Finney:

>> Aside: What additional characters does law use that aren't in ASCII?
>> Section § and paragraph ¶ are used frequently, but you already
>> mentioned the printing industry. Are there other symbols?
>
> ASCII does not contain “©” (U+00A9 COPYRIGHT SIGN) nor “®” (U+00AE
> REGISTERED SIGN), for instance.

The em-dash is mapped on my keyboard — I use it quite often.


Marko

Re: Unicode 7

2014-05-02 Thread Rustom Mody

On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> > - Worst of all what we
> > *dont* see -- how many others dont see what we see?

> Again, this a deficiency of the font. There are very few code points in 
> Unicode which are intended to be invisible, e.g. space, newline, zero-
> width joiner, control characters, etc., but they ought to be equally 
> invisible to everyone. No printable character should ever be invisible in 
> any decent font.

That's not what I meant.

I wrote http://blog.languager.org/2014/04/unicoded-python.html
 – mostly on a debian box.
Later, on seeing it on a less heavily set-up ubuntu box, I see
 ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
have become 'missing-glyph' boxes.

It leads me to ask: how much else of what I am writing has some random
reader simply not seen?
Quite simply we can never know – because most are going to go away saying
"mojibaked/garbled rubbish"

Speaking of what you understood of what I said:
Yes, invisible chars are another problem I was recently bitten by.
I pasted something from google into emacs' org mode.
Following that link again I kept getting a broken link.

Until I found that the link had an invisible char

The problem was that emacs was faithfully rendering that char according
to standard, ie invisibly!
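
A sketch of how such a character could be hunted down, assuming Python 3.
The function name find_invisible and the sample link are hypothetical; the
character that actually caused the broken link is not known.

```python
import unicodedata

def find_invisible(s):
    """Return (index, code point, name) for format/control characters in s."""
    hits = []
    for i, ch in enumerate(s):
        # Cf = format characters (e.g. zero-width space or joiner),
        # Cc = control characters.
        if unicodedata.category(ch) in ("Cf", "Cc"):
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, "U+%04X" % ord(ch), name))
    return hits

# Hypothetical pasted link with a zero-width space hidden before the path.
link = "http://example.com/\u200bpage"
for hit in find_invisible(link):
    print(hit)
```
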

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano

On Fri, 02 May 2014 19:01:44 +1000, Chris Angelico wrote:

> On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano wrote:
>> ... even *Americans* cannot represent all their common characters in
>> ASCII, let alone specialised characters from mathematics, science, the
>> printing industry, and law.
> 
> Aside: What additional characters does law use that aren't in ASCII?
> Section § and paragraph ¶ are used frequently, but you already mentioned
> the printing industry. Are there other symbols?

I was thinking of copyright, trademark, registered mark, and similar. I 
think these are all of the relevant characters:

py> import unicodedata
py> for c in '©®℗™':
...     unicodedata.name(c)
...
'COPYRIGHT SIGN'
'REGISTERED SIGN'
'SOUND RECORDING COPYRIGHT'
'TRADE MARK SIGN'



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano

On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:

> On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
>> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
>> > - Worst of all what we
>> > *dont* see -- how many others dont see what we see?
> 
>> Again, this a deficiency of the font. There are very few code points in
>> Unicode which are intended to be invisible, e.g. space, newline, zero-
>> width joiner, control characters, etc., but they ought to be equally
>> invisible to everyone. No printable character should ever be invisible
>> in any decent font.
> 
> Thats not what I meant.
> 
> I wrote http://blog.languager.org/2014/04/unicoded-python.html
>  – mostly on a debian box.
> Later on seeing it on a less heavily setup ubuntu box, I see
>  ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> have become 'missing-glyph' boxes.
> 
> It leads me ask, how much else of what I am writing, some random reader
> has simply not seen?
> Quite simply we can never know – because most are going to go away
> saying "mojibaked/garbled rubbish"
> 
> Speaking of what you understood of what I said: Yes invisible chars is
> another problem I was recently bitten by. I pasted something from google
> into emacs' org mode. Following that link again I kept getting a broken
> link.
> 
> Until I found that the link had an invisible char
> 
> The problem was that emacs was faithfully rendering that char according
> to standard, ie invisibly!

And you've never been bitten by an invisible control character in ASCII 
text? You've lived a sheltered life!

Nothing you are describing is unique to Unicode.


-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

Re: Unicode 7

2014-05-02 Thread Marko Rauhamaa

Steven D'Aprano:

> And you've never been bitten by an invisible control character in
> ASCII text? You've lived a sheltered life!

That reminds me: " " (nonbreakable space) is often used between numbers
and units, for example.


Marko

Re: Unicode 7

2014-05-02 Thread Tim Chase

On 2014-05-02 19:08, Chris Angelico wrote:
> This is another area where Unicode has given us "a great improvement
> over the old method of giving satisfaction". Back in the 1990s on
> OS/2, DOS, and Windows, a missing glyph might be (a) blank, (b) a
> simple square with no information, or (c) copied from some other
> font (common with dingbats fonts). With Unicode, the standard is to
> show a little box *with the hex digits in it*. Granted, those boxes
> are a LOT more readable for BMP characters than SMP (unless your
> text is huge, six digits in the space of one character will make
> them pretty tiny), and a "Unicode" font will generally include all
> (or at least most) of the BMP, but it's still better than having no
> information at all.

I'm pleased when applications & fonts work properly, using both the
placeholder fonts for "this character is legitimate but I can't
display it with a font, so here, have a box with the codepoint
numbers in it until I'm directed to use a more appropriate font at
which point you'll see it correctly" and the "somebody crammed garbage
in here, so I'll display it with "�" (U+FFFD) which is designated for
exactly this purpose".
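
The second case can be sketched in Python 3 with the decoder's "replace"
error handler; the sample bytes are made up for illustration.

```python
# Bytes that are not valid UTF-8: 0xFF can never appear in UTF-8.
garbage = b"caf\xff"

# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for bad bytes.
decoded = garbage.decode("utf-8", errors="replace")
print(decoded)
assert decoded == "caf\ufffd"
```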

-tkc


Re: Unicode 7

2014-05-02 Thread Rustom Mody

On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:

> > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> >> > - Worst of all what we
> >> > *dont* see -- how many others dont see what we see?
> >> Again, this a deficiency of the font. There are very few code points in
> >> Unicode which are intended to be invisible, e.g. space, newline, zero-
> >> width joiner, control characters, etc., but they ought to be equally
> >> invisible to everyone. No printable character should ever be invisible
> >> in any decent font.
> > Thats not what I meant.
> > I wrote http://blog.languager.org/2014/04/unicoded-python.html
> >  – mostly on a debian box.
> > Later on seeing it on a less heavily setup ubuntu box, I see
> >  ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> > have become 'missing-glyph' boxes.
> > It leads me ask, how much else of what I am writing, some random reader
> > has simply not seen?
> > Quite simply we can never know – because most are going to go away
> > saying "mojibaked/garbled rubbish"
> > Speaking of what you understood of what I said: Yes invisible chars is
> > another problem I was recently bitten by. I pasted something from google
> > into emacs' org mode. Following that link again I kept getting a broken
> > link.
> > Until I found that the link had an invisible char
> > The problem was that emacs was faithfully rendering that char according
> > to standard, ie invisibly!

> And you've never been bitten by an invisible control character in ASCII 
> text? You've lived a sheltered life!

For control characters I've seen:
- garbage (the ASCII equiv of mojibake)
- Straight ^A^B^C
- Maybe their names NUL,SOH,STX,ETX,EOT,ENQ,ACK…
- Or maybe just a little dot .
- More pathological behavior: a control sequence putting the
  terminal into some other mode

But I don't ever remember seeing a control character become
invisible (except [ \t\n\f])
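
The "^A^B^C" style listed above is caret notation. A small sketch in
Python 3, where the helper name caret is hypothetical (note that it also
renders tab and newline visibly, as ^I and ^J):

```python
def caret(s):
    """Render ASCII control characters in caret notation, e.g. SOH -> ^A."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x20:
            out.append("^" + chr(cp + 0x40))  # 0x01 -> 'A', 0x00 -> '@'
        elif cp == 0x7F:
            out.append("^?")                  # DEL
        else:
            out.append(ch)
    return "".join(out)

print(caret("a\x01b\x02c\x03"))
```
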

Re: Unicode 7

2014-05-02 Thread MRAB


On 2014-05-02 03:39, Ben Finney wrote:
> Rustom Mody writes:
>
>> Yes, the headaches go a little further back than Unicode.
>
> Okay, so can you change your article to reflect the fact that the
> headaches both pre-date Unicode, and are made much easier by Unicode?
>
>> There is a certain large old book...
>
> Ah yes, the neo-Sumerian story “Enmerkar_and_the_Lord_of_Aratta”
> <https://en.wikipedia.org/wiki/Enmerkar_and_the_Lord_of_Aratta>.
> Probably inspired by stories older than that, of course.
>
>> In which is described the building of a 'tower that reached up to heaven'...
>> At which point 'it was decided'¶ to do something to prevent that.
>> And our headaches started.
>
> And other myths with fantastic reasons for the diversity of language
> <https://en.wikipedia.org/wiki/Mythical_origins_of_language>.
>
>> I never knew of any of this in the good ol days of ASCII
>
> Yes, by ignoring all other writing systems except one's own – and
> thereby excluding most of the world's people – the system can be made
> simpler.

ASCII lacked even £. I can remember assembly listings in magazines
containing lines such as:

LDA £0

I even (vaguely) remember an advert with a character that looked like
Ł, presumably because they didn't have £. In a UK magazine? Very
strange!

> Hopefully the proportion of programmers who still feel they can make
> such a parochial choice is rapidly shrinking.



Re: Unicode 7

2014-05-02 Thread Rustom Mody

On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote:

> > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote:
> >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote:
> >> > - Worst of all what we
> >> > *dont* see -- how many others dont see what we see?
> >> Again, this a deficiency of the font. There are very few code points in
> >> Unicode which are intended to be invisible, e.g. space, newline, zero-
> >> width joiner, control characters, etc., but they ought to be equally
> >> invisible to everyone. No printable character should ever be invisible
> >> in any decent font.
> > Thats not what I meant.
> > I wrote http://blog.languager.org/2014/04/unicoded-python.html
> >  – mostly on a debian box.
> > Later on seeing it on a less heavily setup ubuntu box, I see
> >  ⟮ ⟯ ⟬ ⟭ ⦇ ⦈ ⦉ ⦊
> > have become 'missing-glyph' boxes.
> > It leads me ask, how much else of what I am writing, some random reader
> > has simply not seen?
> > Quite simply we can never know – because most are going to go away
> > saying "mojibaked/garbled rubbish"
> > Speaking of what you understood of what I said: Yes invisible chars is
> > another problem I was recently bitten by. I pasted something from google
> > into emacs' org mode. Following that link again I kept getting a broken
> > link.
> > Until I found that the link had an invisible char
> > The problem was that emacs was faithfully rendering that char according
> > to standard, ie invisibly!

> And you've never been bitten by an invisible control character in ASCII 
> text? You've lived a sheltered life!

> Nothing you are describing is unique to Unicode.

Just noticed a small thing in which python does a bit better than haskell:
$ ghci
let (ﬁne, fine) = (1,2)
Prelude> (ﬁne, fine)
(1,2)
Prelude> 

In case it's not apparent, the fi in the first fine is a ligature.

Python just barfs:

>>> ﬁne = 1
  File "<stdin>", line 1
ﬁne = 1
^
SyntaxError: invalid syntax
>>> 

The point of that example is to show that unicode gives all kinds of 
"Aaah! Gotcha!!" opportunities that just don't exist in the old world.
Python may have got this one right but there are surely dozens of others.

On the other hand I see more eagerness for unicode source-text there
eg.

https://github.com/i-tu/Hasklig
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#unicode-syntax
http://www.haskell.org/haskellwiki/Unicode-symbols
http://hackage.haskell.org/package/base-unicode-symbols

Some music 𝄞 𝄢 ♭ 𝄱 to appease the utf-8 gods 




Re: Unicode 7

2014-05-02 Thread MRAB


On 2014-05-02 09:08, Steven D'Aprano wrote:
> On Thu, 01 May 2014 21:42:21 -0700, Rustom Mody wrote:
>
>> Whats the best cure for headache?
>>
>> Cut off the head
>
> o_O
>
> I don't think so.
>
>> Whats the best cure for Unicode?
>>
>> Ascii
>
> Unicode is not a problem to be solved.
>
> The inability to write standard human text in ASCII is a problem, e.g.
> one cannot write
>
> “ASCII For Dummies” © 2014 by Zöe Smith, now on sale 99¢

[snip]

Shouldn't that be "Zoë"?


Re: Unicode 7

2014-05-02 Thread Michael Torrie

On 05/02/2014 10:50 AM, Rustom Mody wrote:
> Python just barfs:
> 
> >>> ﬁne = 1
>   File "<stdin>", line 1
> ﬁne = 1
> ^
> SyntaxError: invalid syntax
>
> The point of that example is to show that unicode gives all kind of 
> "Aaah! Gotcha!!" opportunities that just dont exist in the old world.
> Python may have got this one right but there are surely dozens of others.

Except that it doesn't.  This has nothing to do with unicode handling.
It has everything to do with what defines an identifier in Python.  This
is no different than someone wondering why they can't start an
identifier in Python 1.x with a number or punctuation mark.


Re: Unicode 7

2014-05-02 Thread Ned Batchelder

On 5/2/14 12:50 PM, Rustom Mody wrote:

> Just noticed a small thing in which python does a bit better than haskell:
> $ ghci
> let (ﬁne, fine) = (1,2)
> Prelude> (ﬁne, fine)
> (1,2)
> Prelude>
>
> In case its not apparent, the fi in the first fine is a ligature.
>
> Python just barfs:
>
> >>> ﬁne = 1
>   File "<stdin>", line 1
> ﬁne = 1
> ^
> SyntaxError: invalid syntax
> >>>

Surely by now we could at least be explicit about which version of 
Python we are talking about?

  $ python2.7
  Python 2.7.2 (default, Oct 11 2012, 20:14:37)
  [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> ﬁne = 1
    File "<stdin>", line 1
      ﬁne = 1
      ^
  SyntaxError: invalid syntax
  >>> ^D
  $ python3.4
  Python 3.4.0b1 (default, Dec 16 2013, 21:05:22)
  [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> ﬁne = 1
  >>> ﬁne
  1

In Python 2 identifiers must be ASCII.  Python 3 allows many Unicode 
characters in identifiers (see PEP 3131 for details: 
http://legacy.python.org/dev/peps/pep-3131/)

--
Ned Batchelder, http://nedbatchelder.com


Re: Unicode 7

2014-05-02 Thread Peter Otten

Rustom Mody wrote:

> Just noticed a small thing in which python does a bit better than haskell:
> $ ghci
> let (ﬁne, fine) = (1,2)
> Prelude> (ﬁne, fine)
> (1,2)
> Prelude>
> 
> In case its not apparent, the fi in the first fine is a ligature.
> 
> Python just barfs:

Not Python 3:

Python 3.3.2+ (default, Feb 28 2014, 00:52:16) 
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (ﬁne, fine) = (1,2)
>>> (ﬁne, fine)
(2, 2)

No copy-and-paste errors involved:

>>> eval("\ufb01ne")
2
>>> eval(b"fine".decode("ascii"))
2
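
The (2, 2) result follows from PEP 3131, under which Python 3 converts
identifiers to normal form NFKC while parsing. A minimal sketch, assuming
Python 3; the variable names are only illustrative:

```python
import unicodedata

ligature_name = "\ufb01ne"   # starts with U+FB01 LATIN SMALL LIGATURE FI
plain_name = "fine"

# The two strings differ as raw text...
assert ligature_name != plain_name

# ...but NFKC normalization folds the ligature into the letters "f" + "i".
assert unicodedata.normalize("NFKC", ligature_name) == plain_name

# Both assignments therefore bind the same name; the second one wins.
namespace = {}
exec("\ufb01ne = 1\nfine = 2", namespace)
print(namespace["fine"])
print("\ufb01ne" in namespace)   # only the normalized spelling is stored
```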



Hi. I want to create a script to read a file placed in a remote linux server using python..need help..?

2014-05-02 Thread Bhawani Singh

I have created the script till here ..

import os
import re

os.chdir("/var/log")
fd = open("t1.txt", "r")
for line in fd:
    if re.match("(.*)(file1)(.*)", line):
        print line,

Output :

file1


I ran this script on the Linux server itself, but now I want to run it from
another Linux server and have the output displayed there. How can I do that?

I tried to use pexpect, but got nowhere.



Re: Cookie not retrieving as it should in some cases

2014-05-02 Thread alister

On Fri, 02 May 2014 01:11:05 -0700, Ferrous Cranus wrote:

> # retrieve cookie from client's browser otherwise set it
> try:
>   cookie = cookies.SimpleCookie( os.environ.get('HTTP_COOKIE', '') )
>   cookieID = cookie['ID'].value
> except:
>   cookieID = str( time.time() )
>   cookieID = cookieID[-3:]
> 
>   cookie['ID'] = cookieID
> 
> 
> Many times I noticed that the script, instead of retrieving the cookie ID
> value so as to identify each visitor uniquely, sets it again.
> The same thing also happens when someone comes to superhost.gr via a
> link from another webpage
> 
> can somebody tell me why this is happening?
> is there some flaw in my code? Perhaps it can be written more
> efficiently?

I had a similar issue when using Beaker middleware for WSGI which was 
caused by me not specifying a location for the storage of the cookie 
database.



-- 
There is a multi-legged creature crawling on your shoulder.
-- Spock, "A Taste of Armageddon", stardate 3193.9

Re: Unicode 7

2014-05-02 Thread Ben Finney

Marko Rauhamaa  writes:

> That reminds me: " " [U+00A0 NON-BREAKING SPACE] is often used between
> numbers and units, for example.

The non-breaking space (“ ” U+00A0) is frequently used in text to keep
conceptually inseparable text such as “100 km” from automatic word
breaks <https://en.wikipedia.org/wiki/Non-breaking_space>.

Because of established, conflicting conventions for separating groups of
digits (“1,234.00” in many countries; “1.234,00” in many others)
<https://en.wikipedia.org/wiki/Thousands_separator#Digit_grouping>,
the “ ” U+2009 THIN SPACE <https://en.wikipedia.org/wiki/Thin_Space>
is recommended for separating digit groups (e.g. “1 234 567 m”)
<https://en.wikipedia.org/wiki/SI_units#General_rules>.
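
A sketch of both conventions in Python 3. The helper names group_digits and
with_unit are hypothetical, and going through the comma format first is just
one convenient way to get groups of three.

```python
THIN_SPACE = "\u2009"
NBSP = "\u00a0"

def group_digits(n):
    """Group an integer's digits in threes, separated by thin spaces."""
    # Format with commas first, then swap the separator.
    return format(n, ",").replace(",", THIN_SPACE)

def with_unit(n, unit):
    """Join a grouped number and its unit with a non-breaking space."""
    return group_digits(n) + NBSP + unit

print(with_unit(1234567, "m"))
```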

-- 
 \   “We spend the first twelve months of our children's lives |
  `\  teaching them to walk and talk and the next twelve years |
_o__)   telling them to sit down and shut up.” —Phyllis Diller |
Ben Finney


Re: Unicode 7

2014-05-02 Thread Roy Smith

Ben Finney wrote:

> The non-breaking space (“ ” U+00A0) is frequently used in text to keep
> conceptually inseparable text such as “100 km” from automatic word
> breaks <https://en.wikipedia.org/wiki/Non-breaking_space>.

Which, by the way, argparse doesn't honor...

http://bugs.python.org/issue16623

Re: Hi. I want to create a script to read a file placed in a remote linux server using python..need help..?

2014-05-02 Thread Denis McMahon

On Fri, 02 May 2014 12:55:18 -0700, Bhawani Singh wrote:

> I have created the script till here ..
> 
> import os
> import re
> 
> os.chdir("/var/log")
> fd = open("t1.txt", "r")
> for line in fd:
>     if re.match("(.*)(file1)(.*)", line):
>         print line,
> 
> Output :
> 
> file1
> 
> 
> this script i ran on the linux server, but now i want to run this script
> from another linux server and get the output displayed there..how can i
> do that...
> 
> i tried to use : pexpect but getting no help..

Method a:

Go and sit in front of the keyboard on the other linux server, run the 
script and read the screen.

Method b:

Use telnet to login to your account on the other server, run the script.

To run your script on someone elses machine usually needs you to be able 
to access their machine somehow. Either you are permitted to do it, in 
which case you should already know how to do it, or you're not permitted 
to do it, in which case we're not going to teach you how to do it here.

-- 
Denis McMahon, [email protected]

Re: Hi. I want to create a script to read a file placed in a remote linux server using python..need help..?

2014-05-02 Thread Roy Smith

In article ,
 Denis McMahon  wrote:

> Method b:
> 
> Use telnet to login to your account on the other server, run the script.

Ugh.  I hope nobody is using telnet anymore.  Passwords are sent in plain 
text over the network.  Bad.  All uses of telnet should have long since 
been replaced with ssh.

One of the cool things about ssh is that not only does it give you 
remote shell connectivity, but it can be used to execute commands 
remotely, over the same secure channel.  There is an awesome python 
package called fabric (http://www.fabfile.org/) which makes it trivial 
to do this inside of a python program.  You can use it as a command-line 
tool, or as a library embedded in another python script.
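
As a rough illustration of the "execute commands remotely" part, here is a sketch that does not use fabric at all. It just shells out to the ordinary OpenSSH client with `subprocess`. The function names and the host in the usage comment are hypothetical, and it assumes key-based login is already set up (hence `BatchMode=yes`, which makes ssh fail rather than prompt for a password):

```python
import subprocess


def build_ssh_command(host, remote_command):
    """Return the argv list for running remote_command on host via ssh."""
    # BatchMode=yes: never prompt interactively; fail if keys are not set up.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_remote(host, remote_command, timeout=30):
    """Run remote_command on host over ssh and return its stdout as text."""
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Hypothetical usage, reading the file from the original question:
#     print(run_remote("user@logserver.example", "grep file1 /var/log/t1.txt"))
```

The filtering runs on the remote side here, so only the matching lines cross the network. A library such as fabric wraps this kind of thing more conveniently.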

Re: Unicode 7

2014-05-02 Thread Rustom Mody

On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
> Rustom Mody wrote:

> > Just noticed a small thing in which python does a bit better than haskell:
> > $ ghci
> > let (ﬁne, fine) = (1,2)
> > Prelude> (ﬁne, fine)
> > (1,2)
> > In case its not apparent, the fi in the first fine is a ligature.
> > Python just barfs:

> Not Python 3:

> Python 3.3.2+ (default, Feb 28 2014, 00:52:16) 
> [GCC 4.8.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> (ﬁne, fine) = (1,2)
> >>> (ﬁne, fine)
> (2, 2)

> No copy-and-paste errors involved:

> >>> eval("\ufb01ne")
> 2
> >>> eval(b"fine".decode("ascii"))
> 2

Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.

I am confused about the tone however:
You think this

>>> (ﬁne, fine) = (1,2) # and no issue about it

is fine?



Re: Unicode 7

2014-05-02 Thread Chris Angelico

On Sat, May 3, 2014 at 10:58 AM, Rustom Mody  wrote:
> You think this
>
> >>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?

Not sure which part you're objecting to. Are you saying that this
should be an error:

>>> a, a = 1, 2 # simple ASCII identifier used twice

or that Python should take the exact sequence of codepoints, rather
than normalizing?

Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
>>> vars()
{'__package__': None, '__spec__': None, '__doc__': None, 'fine': 1,
'__loader__': ,
'__builtins__': , '__name__':
'__main__'}

As regards normalization, I would be happy with either "keep it
exactly as you provided" or "normalize according to ", as long as it's consistent. It's like
what happens with SQL identifiers: according to the standard, an
unquoted name should be uppercased, but some databases instead
lowercase them. It doesn't break code (modulo quoted names, not
applicable here), as long as it's consistent.

(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)
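
One way to probe that question empirically is sketched below, assuming Python 3 and the standard `unicodedata` module. It applies all four normal forms to the ligature spelling and compares each result with the name the interpreter actually stores. The variable names are only for illustration:

```python
import unicodedata

ligature_name = "\ufb01ne"  # U+FB01 LATIN SMALL LIGATURE FI, then "ne"

# Apply each of the four Unicode normal forms to the same spelling.
forms = {
    form: unicodedata.normalize(form, ligature_name)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Compile an assignment and see which spelling the interpreter stored.
namespace = {}
exec(ligature_name + " = 1", namespace)
stored = [key for key in namespace if key != "__builtins__"]

for form, text in forms.items():
    print(form, ascii(text), text == stored[0])
```

The canonical forms (NFC, NFD) leave the ligature alone, while both compatibility forms fold it to plain "fi". So this particular name shows that a compatibility form is in play, but it cannot by itself tell NFKC from NFKD.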

ChrisA

Re: Unicode 7

2014-05-02 Thread Ned Batchelder


On 5/2/14 8:58 PM, Rustom Mody wrote:
> On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
>> Rustom Mody wrote:
>>> Just noticed a small thing in which python does a bit better than haskell:
>>> $ ghci
>>> let (ﬁne, fine) = (1,2)
>>> Prelude> (ﬁne, fine)
>>> (1,2)
>>> In case its not apparent, the fi in the first fine is a ligature.
>>> Python just barfs:
>>
>> Not Python 3:
>>
>> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
>> [GCC 4.8.1] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> (ﬁne, fine) = (1,2)
>> >>> (ﬁne, fine)
>> (2, 2)
>>
>> No copy-and-paste errors involved:
>>
>> >>> eval("\ufb01ne")
>> 2
>> >>> eval(b"fine".decode("ascii"))
>> 2
>
> Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
>
> I am confused about the tone however:
> You think this
>
> >>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?


Can you be more explicit?  It seems like you think it isn't fine.  Why 
not?  What bothers you about it?  Should there be an issue?


--
Ned Batchelder, http://nedbatchelder.com


Re: Unicode 7

2014-05-02 Thread Rustom Mody

On Saturday, May 3, 2014 6:48:21 AM UTC+5:30, Ned Batchelder wrote:
> On 5/2/14 8:58 PM, Rustom Mody wrote:
> > On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote:
> >> Rustom Mody wrote:
> >>> Just noticed a small thing in which python does a bit better than haskell:
> >>> $ ghci
> >>> let (ﬁne, fine) = (1,2)
> >>> Prelude> (ﬁne, fine)
> >>> (1,2)
> >>> In case its not apparent, the fi in the first fine is a ligature.
> >>> Python just barfs:
> >> Not Python 3:
> >> Python 3.3.2+ (default, Feb 28 2014, 00:52:16)
> >> [GCC 4.8.1] on linux
> >> Type "help", "copyright", "credits" or "license" for more information.
> >> >>> (ﬁne, fine) = (1,2)
> >> >>> (ﬁne, fine)
> >> (2, 2)
> >> No copy-and-paste errors involved:
> >> >>> eval("\ufb01ne")
> >> 2
> >> >>> eval(b"fine".decode("ascii"))
> >> 2
> > Aah! Thanks Peter (and Ned and Michael) — 2-3 confusion — my bad.
> > I am confused about the tone however:
> > You think this
> > >>> (ﬁne, fine) = (1,2) # and no issue about it
> > is fine?

> Can you be more explicit?  It seems like you think it isn't fine.  Why 
> not?  What bothers you about it?  Should there be an issue?

Two identifiers that to some programmers
- can look the same
- and not to others
- and that the language treats as different

is not fine (or ﬁne) to me.

Putting them together as I did is summarizing the problem.

Think of them as widely separated in the text, with the code
(un)serendipitously 'working' (i.e. not giving NameErrors).



Re: Unicode 7

2014-05-02 Thread Chris Angelico

On Sat, May 3, 2014 at 11:42 AM, Rustom Mody  wrote:
> Two identifiers that to some programmers
> - can look the same
> - and not to others
> - and that the language treats as different
>
> is not fine (or ﬁne) to me.

The language treats them as the same, though.

ChrisA

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano

On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote:

> I am confused about the tone however: You think this
> 
> >>> (ﬁne, fine) = (1,2) # and no issue about it
> 
> is fine?

It's no worse than any other obfuscated variable name:

MOOSE, MO0SE, M0OSE = 1, 2, 3
xl, x1 = 1, 2

If you know your victim is reading source code in Ariel font, "rn" and 
"m" are virtually indistinguishable except at very large sizes.
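
A small sketch of why such look-alikes really are separate names, assuming Python 3 and the standard `unicodedata` module. NFKC is the form the Python 3 documentation names for identifiers, and it has nothing to fold in plain ASCII text:

```python
import unicodedata

names = ["MOOSE", "MO0SE", "M0OSE", "xl", "x1"]

# NFKC leaves plain ASCII untouched, so each spelling stays a separate name.
normalized = {name: unicodedata.normalize("NFKC", name) for name in names}
distinct = set(normalized.values())

print(len(names), len(distinct))
```

Both counts come out equal, so the interpreter sees five different identifiers however similar they look on screen.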

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

Re: Unicode 7

2014-05-02 Thread Rustom Mody

On Saturday, May 3, 2014 7:24:08 AM UTC+5:30, Chris Angelico wrote:
> On Sat, May 3, 2014 at 11:42 AM, Rustom Mody wrote:
> > Two identifiers that to some programmers
> > - can look the same
> > - and not to others
> > - and that the language treats as different
> > is not fine (or ﬁne) to me.

> The language treats them as the same, though.

Whoops! I seem to be goofing a lot today

Saw Peter's

>>> (ﬁne, fine) = (1,2) 

Didn't notice his next line
>>> (ﬁne, fine)
(2, 2) 

So then I am back to my original point:

Python is giving better behavior than Haskell in this regard!

[Earlier reached this conclusion via a wrong path]

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano

On Sat, 03 May 2014 02:02:32 +, Steven D'Aprano wrote:

> On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote:
> 
>> I am confused about the tone however: You think this
>> 
>> >>> (ﬁne, fine) = (1,2) # and no issue about it
>> 
>> is fine?
> 
> 
> It's no worse than any other obfuscated variable name:
> 
> MOOSE, MO0SE, M0OSE = 1, 2, 3
> xl, x1 = 1, 2
> 
> If you know your victim is reading source code in Ariel font, "rn" and
> "m" are virtually indistinguishable except at very large sizes.


Ooops! I too missed that Python normalises the name ﬁne to fine, so in 
fact this is not a case of obfuscation. 



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

Re: Unicode 7

2014-05-02 Thread Chris Angelico

On Sat, May 3, 2014 at 12:02 PM, Steven D'Aprano
 wrote:
> If you know your victim is reading source code in Ariel font, "rn" and
> "m" are virtually indistinguishable except at very large sizes.

I kinda like the idea of naming it after a bratty teenager who rebels
against her father and runs away from home, but normally the font's
called Arial. :)

ChrisA

Re: Unicode 7

2014-05-02 Thread Terry Reedy


On 5/2/2014 9:15 PM, Chris Angelico wrote:

> (My reading of PEP 3131 is that NFKC is used; is that what's
> implemented, or was that a temporary measure and/or something for Py2
> to consider?)


The 3.4 docs say "The syntax of identifiers in Python is based on the 
Unicode standard annex UAX-31, with elaboration and changes as defined 
below; see also PEP 3131 for further details."

...
"All identifiers are converted into the normal form NFKC while parsing; 
comparison of identifiers is based on NFKC."


Without reading UAX-31, I don't know how much was changed, but I suspect 
not much. In any case, the current rules are intended and very unlikely 
to change as that would break code going either forward or back for 
little purpose.
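
Note the "while parsing" in that sentence. The sketch below (Python 3; the names `namespace` and `Box` are only for illustration) suggests that the conversion applies to identifiers in source text, not to strings used as dictionary keys or handed to `setattr()`:

```python
namespace = {}
exec("\ufb01ne = 1", namespace)  # source text spelled with U+FB01

# The parser stored the normalized spelling, so only "fine" is a key.
has_plain = "fine" in namespace
has_ligature = "\ufb01ne" in namespace


class Box:
    pass


box = Box()
setattr(box, "\ufb01ne", 2)  # setattr() takes the string exactly as given
via_setattr = "\ufb01ne" in vars(box)
via_plain = hasattr(box, "fine")

print(has_plain, has_ligature, via_setattr, via_plain)
```

So code that builds names dynamically from strings may need to call `unicodedata.normalize("NFKC", ...)` itself if it wants to match what the parser does.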


--
Terry Jan Reedy
