date:20090311

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Scott David Daniels


A.M. Kuchling wrote:

> With zipfile, you could at least access the .fp attribute
> to sync it (though is the .fp documented as part of the interface?).


For this one, I'd like to add the sync as a method (so that Zip-inside-
Zip is eventually possible).  In fact, a sync on an exposed writable
for a single file should probably push back out to a full sync.  This
would be trickier to accomplish if the using code had to suss out how
to get to the fp.  Clearly I have plans for a ZipFile expansion, but
this could only conceivably hit 2.7, and 2.8 / 3.2 is a lot more likely.

--Scott David Daniels
scott.dani...@acm.org

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Martin v. Löwis

>> We already have os.fsync() and os.fdatasync(). Should the sync() (and
>> datasync()?) method be added as an object-oriented convenience?
> 
> It's more than an object oriented convenience. fsync() takes a file
> descriptor as argument. Therefore I assume fsync() only syncs the data
> to disk that was written to the file descriptor. [*] 
[...]
> [*] Is my assumption correct, anybody?

Not necessarily. In Linux, for many releases, fsync() was really
equivalent to sync() (i.e. flushing all data for all files on all
file systems to disk). It may be that some systems still implement
it that way today.

However, even if it were true, I don't see why a .sync method would
be more than a convenience. An application wishing to sync a file
before close can do

f.flush()
os.fsync(f.fileno())
f.close()

With a sync method, it would become

f.flush()
f.sync()
f.close()

which is *really* nothing more than convenience.
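
The flush-then-fsync sequence above can be wrapped in a small helper. This is only a sketch: the function name write_and_sync is made up for illustration, and the only real calls are flush(), fileno() and os.fsync(). Note that fileno is a method and has to be called.

```python
import os


def write_and_sync(path, data):
    """Write bytes to path and ask the OS to push them to the device.

    Hypothetical helper, not part of the standard library.
    """
    f = open(path, "wb")
    try:
        f.write(data)
        f.flush()              # Python's buffer -> operating system
        os.fsync(f.fileno())   # operating system cache -> storage (can be slow)
    finally:
        f.close()
```

The try/finally only guarantees that the file object is closed; if fsync() raises, the error still propagates to the caller.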

I'd also like to point to the O_SYNC/O_DSYNC/O_RSYNC open(2)
flags. Applications that require durable writes can also choose
to set those on open, and be done.
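
As a rough illustration of setting the flag at open time, the sketch below goes through os.open(). The helper name open_synchronous is an assumption made for this example. os.O_SYNC is not defined on every platform, so the sketch falls back to 0, in which case the file is opened without synchronous writes.

```python
import os


def open_synchronous(path):
    """Open path for writing with O_SYNC where the platform offers it.

    Hypothetical helper for illustration only.
    """
    # getattr() guards against platforms whose os module lacks O_SYNC.
    sync_flag = getattr(os, "O_SYNC", 0)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag
    fd = os.open(path, flags, 0o644)
    # buffering=0 keeps Python from holding data back in its own buffer,
    # so each write() call maps onto one write on the descriptor.
    return os.fdopen(fd, "wb", buffering=0)
```

With this approach no explicit fsync() call is needed, at the cost of every single write waiting for the device.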

Regards,
Martin

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Joachim König


Guido van Rossum wrote:
> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes wrote:
>> [...]
>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>> [...]
>
> If I understand the post properly, it's up to the app to call fsync(),
> and it's only necessary when you're doing one of the rename dances, or
> updating a file in place. Basically, as he explains, fsync() is a very
> heavyweight operation; I'm against calling it by default anywhere.

  
To me, the flaw seems to be in the close() call (of the operating
system). I'd expect the data to be in a persistent state once the
close() returns. So there would be no need to fsync if the file gets
closed anyway.

Of course the close() call could take a while (up to 30 seconds in
laptop mode), but if one does not want to wait that long, then one can
continue without calling close() and take the risk.

Of course, if the data should be on persistent storage without closing
the file (e.g. for database applications), then one has to carefully
call the different sync methods, but that's another story.


Why has this ext4 problem not come up for other filesystems?




Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Neil Hodgson

Antoine Pitrou:

> How about shutil.copystat()?

   shutil.copystat does not copy over the owner, group or ACLs.

   Modeling a copymetadata method on copystat would provide an easy to
understand API and should be implementable on Windows and POSIX.
Reading the OS X documentation shows a set of low-level POSIX
functions for ACLs. Since there are multiple pieces of metadata and an
application may not want to copy all pieces there could be multiple
methods (copygroup ...) or one method with options
shutil.copymetadata(src, dst, group=True, resource_fork=False)
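
A minimal sketch of the "one method with options" variant could look like the following. shutil.copymetadata does not exist; copy_metadata below is a hypothetical stand-in that covers only the group option (resource forks and ACLs are left out), and it relies on the real shutil.copystat() and os.chown() calls.

```python
import os
import shutil


def copy_metadata(src, dst, group=True):
    """Copy permission bits and timestamps, and optionally the group.

    Hypothetical helper modelled on shutil.copystat(); not a stdlib API.
    """
    shutil.copystat(src, dst)            # mode bits, atime/mtime, flags
    if group and hasattr(os, "chown"):   # os.chown is missing on Windows
        gid = os.stat(src).st_gid
        # -1 leaves the owner untouched; a failure raises OSError
        # instead of being silently ignored.
        os.chown(dst, -1, gid)
```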

   Neil

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Hrvoje Niksic


Joachim König wrote:
> To me, the flaw seems to be in the close() call (of the operating
> system). I'd expect the data to be in a persistent state once the
> close() returns.


I wouldn't, because that would mean that every cp -r would effectively 
do an fsync() for each individual file it copies, which would bog down 
in the case of copying many small files.  Operating systems aggressively 
buffer file systems for good reason: performance of the common case.



> Why has this ext4 problem not come up for other filesystems?


It has come up for XFS many many times, for example 
https://launchpad.net/ubuntu/+bug/37435


ext3 was resilient to the problem because of its default allocation 
policy; now that ext4 has implemented the same optimization XFS had 
before, it shares the problems.


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Neil Hodgson writes:
> 
> shutil.copystat does not copy over the owner, group or ACLs.

It depends on what you call "ACLs". It does copy the chmod permission bits.
As for owner and group, I think there is a very good reason that it doesn't copy
them: under Linux, only root can change these properties.


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Oleg Broytmann

On Wed, Mar 11, 2009 at 11:43:33AM +, Antoine Pitrou wrote:
> As for owner and group, I think there is a very good reason that it
> doesn't copy them: under Linux, only root can change these properties.

   Only root can change file ownership - and yes, there are scripts that
run with root privileges, so why not copy? As for group ownership - any
user can change group if [s]he belongs to the group.

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Christian Heimes writes:
> 
> It's more than an object oriented convenience. fsync() takes a file
> descriptor as argument. Therefore I assume fsync() only syncs the data
> to disk that was written to the file descriptor.

Ok, I agree that a .sync() method makes sense.



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Oleg Broytmann writes:
> 
> Only root can change file ownership - and yes, there are scripts that
> run with root privileges, so why not copy?

Because the new function would then be useless for non-root scripts, and
encouraging people to run their scripts as root would be rather bad.



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Oleg Broytmann

On Wed, Mar 11, 2009 at 11:56:13AM +, Antoine Pitrou wrote:
> Oleg Broytmann writes:
> > Only root can change file ownership - and yes, there are scripts that
> > run with root privileges, so why not copy?
> 
> Because the new function would then be useless for non-root scripts

   That's easy to fix - only copy ownership if the effective user id == 0.

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            p...@phd.pp.ru
   Programmers don't die, they just GOSUB without RETURN.

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Mark Hammond


On 11/03/2009 1:55 PM, Guido van Rossum wrote:
> On Tue, Mar 10, 2009 at 7:45 PM, Christian Heimes wrote:
>> Antoine Pitrou wrote:
>>> Christian Heimes writes:
> ...
>
> Let's not think too Unix-specific. If we add such an API it should do
> something on Windows too -- the app shouldn't have to test for the
> presence of the API. (And thus the API probably shouldn't be called
> fsync.)


This is especially true given Windows has recently introduced a 
transactional API for NTFS.  Although the tone is - err - gushing - it 
(a) should give some information on what is available, and (b) was high 
on my google search list 


http://msdn.microsoft.com/en-us/magazine/cc163388.aspx
http://msdn.microsoft.com/en-us/library/aa363764(VS.85).aspx

Cheers,

Mark

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Oleg Broytmann writes:
> 
> That's easy to fix - only copy ownership if the effective user id == 0.

But errors should not pass silently. If the user intended the function to
copy ownership information and the function fails to do so, it should raise
an exception. Having implicit special cases in an API is usually bad,
especially when it has an impact on security.


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Christian Heimes

Guido van Rossum wrote:
> Let's not think too Unix-specific. If we add such an API it should do
> something on Windows too -- the app shouldn't have to test for the
> presence of the API. (And thus the API probably shouldn't be called
> fsync.)

In my initial proposal one and a half hour earlier I suggested 'sync()'
as the name of the method and 'synced' as the name of the flag that
forces a fsync() call during the close operation.

Christian

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Hrvoje Niksic


Christian Heimes wrote:
> Guido van Rossum wrote:
>> Let's not think too Unix-specific. If we add such an API it should do
>> something on Windows too -- the app shouldn't have to test for the
>> presence of the API. (And thus the API probably shouldn't be called
>> fsync.)
>
> In my initial proposal one and a half hour earlier I suggested 'sync()'
> as the name of the method and 'synced' as the name of the flag that
> forces a fsync() call during the close operation.


Maybe it would make more sense for "synced" to force fsync() on each 
flush, not only on close.  I'm not sure how useful it is, but that's 
what "synced" would imply to me.  Maybe it would be best to avoid having 
such a variable, and expose a close_sync() method instead?


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Christian Heimes writes:
> 
> In my initial proposal one and a half hour earlier I suggested 'sync()'
> as the name of the method and 'synced' as the name of the flag that
> forces a fsync() call during the close operation.

I think your "synced" flag is too vague. Some applications may need the file to
be synced on close(), but some others may need it to be synced at regular
intervals, or after each write(), etc.

Calling the flag "sync_on_close" would be much more explicit. Also, given the
current API I think it should be an argument to open() rather than a writable
attribute.


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

After Hrvoje's message, let me rephrase my suggestion. Let's instead allow:
   open(..., sync_on="close")
   open(..., sync_on="flush")

with a default of None meaning no implicit syncs.
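
To make the proposed semantics concrete, here is a rough sketch written as a wrapper class rather than a change to open() itself. The class name SyncingFile is invented for this example and sync_on is the proposed, not an existing, argument. The sketch assumes that closing a file opened with sync_on="flush" also syncs, since closing flushes.

```python
import os


class SyncingFile:
    """Hypothetical wrapper illustrating sync_on=None / "close" / "flush"."""

    def __init__(self, path, mode="wb", sync_on=None):
        if sync_on not in (None, "close", "flush"):
            raise ValueError("sync_on must be None, 'close' or 'flush'")
        self._f = open(path, mode)
        self._sync_on = sync_on

    def write(self, data):
        return self._f.write(data)

    def flush(self):
        self._f.flush()
        if self._sync_on == "flush":
            os.fsync(self._f.fileno())

    def close(self):
        if self._f.closed:
            return
        self._f.flush()
        if self._sync_on is not None:   # both "close" and "flush" sync here
            os.fsync(self._f.fileno())
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


# Usage: the sync happens when the with block ends.
# with SyncingFile("journal.dat", "wb", sync_on="close") as f:
#     f.write(b"record")
```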

Regards

Antoine.



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Christian Heimes

Antoine Pitrou wrote:
> After Hrvoje's message, let me rephrase my suggestion. Let's instead allow:
>    open(..., sync_on="close")
>    open(..., sync_on="flush")
> 
> with a default of None meaning no implicit syncs.

And sync_on="flush" implies sync_on="close"? Your suggestion sounds like
the right way to me!

Christian

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Christian Heimes writes:
> 
> And sync_on="flush" implies sync_on="close"?

close() implies flush(), so by construction yes.

> Your suggestion sounds like
> the right way to me!

I'm glad I brought something constructive to the discussion :-))



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Aahz

On Wed, Mar 11, 2009, Antoine Pitrou wrote:
>
> After Hrvoje's message, let me rephrase my suggestion. Let's instead allow:
>    open(..., sync_on="close")
>    open(..., sync_on="flush")
> 
> with a default of None meaning no implicit syncs.

That looks good, though I'd prefer using named constants rather than
strings.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"All problems in computer science can be solved by another level of 
indirection."  --Butler Lampson

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Scott Dial

Aahz wrote:
> On Wed, Mar 11, 2009, Antoine Pitrou wrote:
>> After Hrvoje's message, let me rephrase my suggestion. Let's instead allow:
>>    open(..., sync_on="close")
>>    open(..., sync_on="flush")
>>
>> with a default of None meaning no implicit syncs.
> 
> That looks good, though I'd prefer using named constants rather than
> strings.

I would agree, but where do you put them? Since open is a built-in,
where would you suggest placing such constants (assuming we don't want
to pollute the built-in namespace)?

-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Aahz

On Wed, Mar 11, 2009, Scott Dial wrote:
> Aahz wrote:
>> On Wed, Mar 11, 2009, Antoine Pitrou wrote:
>>> After Hrvoje's message, let me rephrase my suggestion. Let's instead allow:
>>>    open(..., sync_on="close")
>>>    open(..., sync_on="flush")
>>>
>>> with a default of None meaning no implicit syncs.
>> 
>> That looks good, though I'd prefer using named constants rather than
>> strings.
> 
> I would agree, but where do you put them? Since open is a built-in,
> where would you suggest placing such constants (assuming we don't want
> to pollute the built-in namespace)?

The os module, of course, like the existing O_* constants.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"All problems in computer science can be solved by another level of 
indirection."  --Butler Lampson

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Eric Smith


Antoine Pitrou wrote:
> I think your "synced" flag is too vague. Some applications may need the file to
> be synced on close(), but some others may need it to be synced at regular
> intervals, or after each write(), etc.


Why wouldn't sync just be an optional argument to close(), at least for 
the "sync_on_close" case?


Eric.

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Eric Smith writes:
> 
> Why wouldn't sync just be an optional argument to close(), at least for 
> the "sync_on_close" case?

It wouldn't work with the "with" statement.



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Lie Ryan


Scott Dial wrote:
> Aahz wrote:
>> On Wed, Mar 11, 2009, Antoine Pitrou wrote:
>>> After Hrvoje's message, let me rephrase my suggestion. Let's instead allow:
>>>    open(..., sync_on="close")
>>>    open(..., sync_on="flush")
>>>
>>> with a default of None meaning no implicit syncs.
>>
>> That looks good, though I'd prefer using named constants rather than
>> strings.
>
> I would agree, but where do you put them? Since open is a built-in,
> where would you suggest placing such constants (assuming we don't want
> to pollute the built-in namespace)?



I actually prefer strings. Just like 'w' or 'r' in open().

Or why not add "f" "c" as modes?

open('file.txt', 'wf')

open for writing, sync on flush


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Eric Smith


Antoine Pitrou wrote:
> Eric Smith writes:
>> Why wouldn't sync just be an optional argument to close(), at least for
>> the "sync_on_close" case?
>
> It wouldn't work with the "with" statement.



Well, that is a good reason, then!

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Martin v. Löwis

> This is especially true given Windows has recently introduced a
> transactional API for NTFS.  Although the tone is - err - gushing - it
> (a) should give some information on what is available, and (b) was high
> on my google search list 
> 
> http://msdn.microsoft.com/en-us/magazine/cc163388.aspx
> http://msdn.microsoft.com/en-us/library/aa363764(VS.85).aspx

Of course, we don't have to go to transactional NTFS to find an
equivalent to fsync: applications can call FlushFileBuffers.
Likewise, if applications want every write call to be synchronized,
they can pass FILE_FLAG_WRITE_THROUGH to CreateFile (similar to
what O_SYNC does on POSIX).

Regards,
Martin

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Martin v. Löwis

> Maybe it would make more sense for "synced" to force fsync() on each
> flush, not only on close.  I'm not sure how useful it is, but that's
> what "synced" would imply to me.

That should be implemented by passing O_SYNC on open, rather than
explicitly calling fsync.

Regards,
Martin

[Python-Dev] Can I modify string.Formatter._vformat?

2009-03-11 Thread Eric Smith

I'm implementing support for auto-numbering of str.format() fields (see 
http://bugs.python.org/issue5237). I'm reasonably sure that when I'm 
done modifying the C implementation I'll need to change the signatures 
of string.Formatter._vformat, str._formatter_parser, and/or 
str._formatter_field_name_split. (They need to support the state needed 
to track the auto-number field counter.)
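
For readers unfamiliar with the feature, the sketch below shows what auto-numbering means from the caller's side, assuming an interpreter in which the change from the issue is available. It does not touch the private helpers named above. The function name render is made up for the example.

```python
def render(name, score):
    """Format the same message with explicit and with automatic field numbers."""
    explicit = "{0} scored {1}".format(name, score)   # indices written out
    automatic = "{} scored {}".format(name, score)    # indices filled in left to right
    assert explicit == automatic
    return automatic


print(render("Ann", 7))

# Mixing the two styles in one format string is rejected.
try:
    "{} and {0}".format("a", "b")
except ValueError as exc:
    print("mixing styles is rejected:", exc)
```

The counter that assigns the implicit indices is the extra state the internal functions would have to carry.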


I've always considered these internal implementation details of 
str.format and string.Formatter. They begin with underscores and are not 
documented.


Is there any problem with modifying these in 2.7 and 3.1? I assume not, 
but I want to make sure it doesn't give anyone heartburn.


Eric.

Re: [Python-Dev] Can I modify string.Formatter._vformat?

2009-03-11 Thread Benjamin Peterson

2009/3/11 Eric Smith :
> I'm implementing support for auto-numbering of str.format() fields (see
> http://bugs.python.org/issue5237). I'm reasonably sure that when I'm done
> modifying the C implementation I'll need to change the signatures of
> string.Formatter._vformat, str._formatter_parser, and/or
> str._formatter_field_name_split. (They need to support the state needed to
> track the auto-number field counter.)
>
> I've always considered these internal implementation details of str.format
> and string.Formatter. They begin with underscores and are not documented.
>
> Is there any problem with modifying these in 2.7 and 3.1? I assume not, but
> I want to make sure it doesn't give anyone heartburn.

Certainly sounds fine with me.


-- 
Regards,
Benjamin

Re: [Python-Dev] Can I modify string.Formatter._vformat?

2009-03-11 Thread Brett Cannon

On Wed, Mar 11, 2009 at 13:20, Benjamin Peterson wrote:

> 2009/3/11 Eric Smith :
> > I'm implementing support for auto-numbering of str.format() fields (see
> > http://bugs.python.org/issue5237). I'm reasonably sure that when I'm done
> > modifying the C implementation I'll need to change the signatures of
> > string.Formatter._vformat, str._formatter_parser, and/or
> > str._formatter_field_name_split. (They need to support the state needed to
> > track the auto-number field counter.)
> >
> > I've always considered these internal implementation details of str.format
> > and string.Formatter. They begin with underscores and are not documented.
> >
> > Is there any problem with modifying these in 2.7 and 3.1? I assume not, but
> > I want to make sure it doesn't give anyone heartburn.
>
> Certainly sounds fine with me.


Even though the Great Release Manager of 3.1 said it was fine, I will toss
in my support with it being okay to modify them.

-Brett

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Neil Hodgson

Antoine Pitrou:

> It depends on what you call "ACLs". It does copy the chmod permission bits.

Access Control Lists are fine grained permissions. Perhaps you
want to allow Sam to read a file and for Ted to both read and write
it. These permissions should not need to be reset every time you
modify the file.

> As for owner and group, I think there is a very good reason that it
> doesn't copy them: under Linux, only root can change these properties.

   Since I am a member of both "staff" and "everyone", I can set group
on one of my files from "staff" to "everyone" or back again:

$ chown :everyone x.pl
$ ls -la x.pl
-rwxrwxrwx  1 nyamatongwe  everyone  269 Mar 11  2008 x.pl
$ chown :staff x.pl
$ ls -la x.pl
-rwxrwxrwx  1 nyamatongwe  staff  269 Mar 11  2008 x.pl

   Neil

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Greg Ewing


Barry Warsaw wrote:
> Of course, a careful *nix application can ensure that the file owners
> and mod bits are set the way it needs them to be set.  A convenience
> function might be useful though.


A specialised function would also provide a place for
dealing with platform-specific extensions, such as
MacOSX Finder attributes.

--
Greg

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Cameron Simpson

On 11Mar2009 10:09, Joachim König wrote:
> Guido van Rossum wrote:
>> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes  wrote:
>>> [...]
>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>>> [...]
>> If I understand the post properly, it's up to the app to call fsync(),
>> and it's only necessary when you're doing one of the rename dances, or
>> updating a file in place. Basically, as he explains, fsync() is a very
>> heavyweight operation; I'm against calling it by default anywhere.
>>   
> To me, the flaw seems to be in the close() call (of the operating
> system). I'd expect the data to be in a persistent state once the
> close() returns. So there would be no need to fsync if the file gets
> closed anyway.

Not really. On the whole, flush() means "the object has handed all data
to the OS".  close() means "the object has handed all data to the OS
and released the control data structures" (OS file descriptor release;
like the OS, the python interpreter may release python stuff later too).

By contrast, fsync() means "the OS has handed filesystem changes to the
disc itself". Really really slow, by comparison with memory. It is Very
Expensive, and a very different operation to close().

[...snip...]
> Why has this ext4 problem not come up for other filesystems?

The same problems exist for all disc based filesystems to a greater or
lesser degree; the OS always does some buffering and therefore there
is a gap between what the OS has accepted from you (and thus made
visible to other apps using the OS) and the physical data structures
on disc. Ext2/3/4 tend to do whole disc sync when just asked to fsync,
probably because it really is only feasible to say "get to a particular
checkpoint in the journal". Many other filesystems will have similar
degrees of granularity, perhaps not all.

Anyway, fsync is a much bigger ask than close, and should be used very
sparingly.

Cheers,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

If I repent anything, it is very likely to be my good behavior.
What demon possessed me that I behaved so well? - Henry David Thoreau

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Steven D'Aprano

On Thu, 12 Mar 2009 01:21:25 am Antoine Pitrou wrote:
> Christian Heimes writes:
> > In my initial proposal one and a half hour earlier I suggested
> > 'sync()' as the name of the method and 'synced' as the name of the
> > flag that forces a fsync() call during the close operation.
>
> I think your "synced" flag is too vague. Some applications may need
> the file to be synced on close(), but some others may need it to be
> synced at regular intervals, or after each write(), etc.
>
> Calling the flag "sync_on_close" would be much more explicit. Also,
> given the current API I think it should be an argument to open()
> rather than a writable attribute.

Perhaps we should have a module containing rich file tools, e.g. classes 
FileSyncOnWrite, FileSyncOnClose, functions for common file-related 
operations, etc. This will make it easy for conscientious programmers 
to do the right thing for their app without needing to re-invent the 
wheel all the time, but without handcuffing them into a single "one 
size fits all" solution.

File operations are *hard*, because many error conditions are uncommon, 
and consequently many (possibly even the majority) of programmers never 
learn that something like this:

f = open('myfile', 'w')
f.write(data)
f.close()

(or the equivalent in whatever language they use) may cause data loss. 
Worse, we train users to accept that data loss as normal instead of 
reporting it as a bug -- possibly because it is unclear whether it is a 
bug in the application, the OS, the file system, or all three. (It's 
impossible to avoid *all* risk of data loss, of course -- what if the 
computer loses power in the middle of a write? But we can minimize that 
risk significantly.)
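
One way to reduce that risk when replacing an existing file is a form of the "rename dance" mentioned earlier in the thread: write a temporary file, sync it, then rename it over the target. The sketch below is an illustration under assumptions, not a complete recipe. The function name replace_file_contents is invented, os.replace() requires Python 3.3 or later, and the containing directory is not synced.

```python
import os
import tempfile


def replace_file_contents(path, data):
    """Replace the contents of path so readers see the old or the new data.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # data reaches the device before the rename
        os.replace(tmp_path, path)   # overwrite the target in one step
    except BaseException:
        # Best-effort cleanup of the temporary file, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

The fsync() call here is the expensive part, which is why it is confined to the one place where the rename depends on it.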

Even when programmers try to do the right thing, it is hard to know what 
the right thing is: there are trade-offs to be made, and having made a 
trade-off, the programmer then has to re-invent what usually turns out 
to be a quite complicated wheel. To do the right thing in Python often 
means delving into the world of os.O_* constants and file descriptors, 
which is intimidating and unpythonic. They're great for those who 
want/need them, but perhaps we should expose a Python interface to the 
more common operations? To my mind, that means classes instead of magic 
constants.
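
As a sketch of what such classes might look like, the two names from the message above can be written as thin subclasses of io.FileIO. This is an assumption about one possible shape, not a design anyone in the thread has agreed on, and the classes work on raw bytes only.

```python
import io
import os


class FileSyncOnClose(io.FileIO):
    """Hypothetical raw file that calls fsync() once, just before closing."""

    def close(self):
        if not self.closed:
            self.flush()
            os.fsync(self.fileno())
        super().close()


class FileSyncOnWrite(io.FileIO):
    """Hypothetical raw file that calls fsync() after every write()."""

    def write(self, data):
        written = super().write(data)
        os.fsync(self.fileno())
        return written


# Usage: both classes take the same arguments as io.FileIO.
# with FileSyncOnClose("report.bin", "w") as f:
#     f.write(b"payload")
```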

Would there be interest in a filetools module? Replies and discussion to 
python-ideas please.

-- 
Steven D'Aprano

Re: [Python-Dev] [Python-ideas] Adding a test discovery into Python

2009-03-11 Thread Guilherme Polo

On Wed, Mar 11, 2009 at 7:37 PM, Raymond Hettinger  wrote:
> [Christian Heimes]
>> I'm +1 for a simple (!) test discovery system. I'm emphasizing on simple
>> because there are enough frameworks for elaborate unit testing.
>
> Test discovery is not the interesting part of the problem.

Interesting or not, it is a problem that is asking for a solution:
this kind of code is being duplicated in several places for no good
reason.

>
> Axiom:  The more work involved in writing tests, the fewer
> tests that will get written.

At some point you will have to run them too, I don't think you want to
reimplement the discovery part yet another time.


-- 
-- Guilherme H. Polo Goncalves

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Greg Ewing


Lie Ryan wrote:

> I actually prefer strings. Just like 'w' or 'r' in open().
>
> Or why not add "f" "c" as modes?
>
> open('file.txt', 'wf')


I like this, because it doesn't expand the signature that
file-like objects need to support. If you're wrapping
another file object you just need to pass on the mode
string and it will all work.

--
Greg

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Greg Ewing


Martin v. Löwis wrote:

> That should be implemented by passing O_SYNC on open, rather than
> explicitly calling fsync.


On platforms which have it (MacOSX doesn't seem to,
according to the man page).

This is another good reason to put these things in the
mode string.
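
For what it's worth, the "on platforms which have it" caveat can already be handled at the os level today. A minimal sketch, assuming a hypothetical helper name open_synced (not an existing API) and a throwaway temporary path for the demo:

```python
import os
import tempfile


def open_synced(path):
    """Hypothetical helper: open path for writing with O_SYNC if available."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # os.O_SYNC is missing on some platforms; fall back to 0 (no extra flag).
    flags |= getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    # buffering=0 so each write() goes straight to the file descriptor.
    return os.fdopen(fd, "wb", buffering=0)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open_synced(demo_path) as f:
    f.write(b"durable?")
```

Where O_SYNC is missing this silently degrades to an ordinary open, so a caller that really needs durability would still have to fall back to os.fsync().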

--
Greg

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Paul Moore

2009/3/11 Greg Ewing:
> Lie Ryan wrote:
>
>> I actually prefer strings. Just like 'w' or 'r' in open().
>>
>> Or why not add "f" "c" as modes?
>>
>> open('file.txt', 'wf')
>
> I like this, because it doesn't expand the signature that
> file-like objects need to support. If you're wrapping
> another file object you just need to pass on the mode
> string and it will all work.

Of course, a file opened for write, in text mode, with auto-sync on
flush, has mode "wtf". I'm in favour just for the chance to use that
mode :-)

Paul.

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Greg Ewing writes:
> 
> I like this, because it doesn't expand the signature that
> file-like objects need to support. If you're wrapping
> another file object you just need to pass on the mode
> string and it will all work.

What do you mean? open() doesn't allow you to wrap other file objects.

As for adding options to the mode string, I think it will only make things
unreadable. Better make the option explicit, like others already are (buffering,
newline, encoding).

Besides, file objects still have to support a sync() method, since sync-on-close
doesn't cater for all uses.


Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Greg Ewing


Antoine Pitrou wrote:

> What do you mean? open() doesn't allow you to wrap other file objects.


I'm talking about things like GzipFile that take a
filename and mode, open the file and then wrap the
file object.

--
Greg

[Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Raymond Hettinger

The current formatting mini-language provisions left/right/center alignment, prefixes for 0b 0x 0o, and rules on when to show the 
plus-sign.  I think it would be far more useful to provision a simple way of specifying a thousands separator.


Financial users in particular find the locale approach to be frustrating and non-obvious.  Putting in a thousands separator is a 
common task for output destined to be read by non-programmers.



Raymond 



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Nick Coghlan

Greg Ewing wrote:
> Antoine Pitrou wrote:
> 
>> What do you mean? open() doesn't allow you to wrap other file objects.
> 
> I'm talking about things like GzipFile that take a
> filename and mode, open the file and then wrap the
> file object.

The tempfile module would be another example.

For that reason, I think Steven's idea of a filetools module which
provided context managers and the like that wrapped *existing* file-like
objects might be preferable.

Otherwise it may be a while before sync-aware code is able to deal with
anything other than basic files.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Antoine Pitrou

Raymond Hettinger writes:
> 
> Financial users in particular find the locale approach to be frustrating
> and non-obvious.  Putting in a thousands separator is a common task for
> output destined to be read by non-programmers.

Please note that for it to be useful in all parts of the world, it must also
allow changing the decimal point.




Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Nick Coghlan

Raymond Hettinger wrote:
> The current formatting mini-language provisions left/right/center
> alignment, prefixes for 0b 0x 0o, and rules on when to show the
> plus-sign.  I think it would be far more useful to provision a simple
> way of specifying a thousands separator.
> 
> Financial users in particular find the locale approach to be frustrating
> and non-obvious.  Putting in a thousands separator is a common task for
> output destined to be read by non-programmers.

+1 for the general idea.

A specific syntax proposal:

  [[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]

'sep' is the new field that defines the thousands separator. It appears
immediately before the precision specifier and starts with a leading comma.

I believe this syntax is unambiguous and backwards compatible because
the only other place a comma might appear (the fill field) is required
to be followed by an alignment character.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread James Y Knight



On Mar 11, 2009, at 9:06 PM, Nick Coghlan wrote:

> Raymond Hettinger wrote:
>> The current formatting mini-language provisions left/right/center
>> alignment, prefixes for 0b 0x 0o, and rules on when to show the
>> plus-sign.  I think it would be far more useful to provision a simple
>> way of specifying a thousands separator.
>>
>> Financial users in particular find the locale approach to be frustrating
>> and non-obvious.  Putting in a thousands separator is a common task for
>> output destined to be read by non-programmers.
>
> +1 for the general idea.
>
> A specific syntax proposal:
>
>   [[fill]align][sign][#][0][minimumwidth][,sep][.precision][type]
>
> 'sep' is the new field that defines the thousands separator. It appears
> immediately before the precision specifier and starts with a leading comma.
>
> I believe this syntax is unambiguous and backwards compatible because
> the only other place a comma might appear (the fill field) is required
> to be followed by an alignment character.


You might be interested to know that in India, the commas don't come
every 3 digits. In India, they come every two digits, after the first
three. Thus one billion = 1,00,00,00,000. How are you gonna represent
*that* in a formatting mini-language? :)


See also http://en.wikipedia.org/wiki/Indian_numbering_system
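
To make the grouping rule concrete, here is a rough sketch of it as plain Python. The function name group_indian is made up for illustration, and it only handles integers:

```python
def group_indian(n):
    """Group digits Indian-style: the last three together, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    # Peel two digits at a time off the right end of what is left.
    while head:
        pairs.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


one_billion = group_indian(1000000000)
```

That is a dozen lines for one convention, which is roughly the point: it does not reduce to a single "separator every N digits" parameter.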

James

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Antoine Pitrou

Nick Coghlan writes:
> 
> The tempfile module would be another example.

Do you really need your temporary files to survive system crashes? ;)

> For that reason, I think Steven's idea of a filetools module which
> provided context managers and the like that wrapped *existing* file-like
> objects might be preferable.

Well, well, let's clarify things a bit.
If we want to help users with this problem, we can provide two things:
1. a new sync() method on the standard objects provided by the IO lib
2. a facility to automatically call sync() on flush() and/or close() calls

Step 1 may be done with a generic implementation in the IO ABCs calling
self.flush() and then os.fsync(self.fileno()). IMO it is important that it is a
method of IO objects because implementations may want to override it. An
external facility would be too inflexible.
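
A minimal sketch of what Step 1 could look like, assuming it is bolted onto io.FileIO by subclassing. The class name SyncableFileIO and the sync() method are hypothetical; nothing like this exists in the io module:

```python
import io
import os
import tempfile


class SyncableFileIO(io.FileIO):
    """Hypothetical subclass; sync() is not an existing io method."""

    def sync(self):
        # Push Python-level buffers to the OS, then ask the OS to push
        # its own buffers to the storage device.
        self.flush()
        os.fsync(self.fileno())


demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
f = SyncableFileIO(demo_path, "w")
f.write(b"payload")
f.sync()
f.close()
```

Because sync() is an ordinary method, a subclass that has no file descriptor (or that needs to sync more than one) can override it, which is the flexibility argued for above.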

Step 2 may be done with a generic wrapper. However, we could also provide an
open() flag which transparently invokes the wrapper. After all, open() is
already a convenience function creating a raw file object and wrapping it in two
optional layers.

(as a side note, wrappers have a non-zero performance impact, especially on
small ops - e.g. reading or writing a few bytes)
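
For comparison, a rough sketch of the generic wrapper from Step 2, here only for the sync-on-close case. SyncOnClose is a made-up name, and it assumes the wrapped object has flush(), fileno(), close() and a closed attribute:

```python
import os
import tempfile


class SyncOnClose:
    """Hypothetical wrapper: fsync the wrapped file just before closing it."""

    def __init__(self, fileobj):
        self._file = fileobj

    def __getattr__(self, name):
        # Delegate everything this wrapper does not define itself.
        return getattr(self._file, name)

    def close(self):
        if not self._file.closed:
            self._file.flush()
            os.fsync(self._file.fileno())
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False


demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with SyncOnClose(open(demo_path, "w")) as f:
    f.write("hello\n")
```

Every delegated call goes through __getattr__, which is where the per-call overhead mentioned above comes from.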


Re: [Python-Dev] [Python-ideas] Ext4 data loss

2009-03-11 Thread zooko

> Would there be interest in a filetools module? Replies and
> discussion to python-ideas please.



I've been using and maintaining a few filesystem hacks for, let's  
see, almost nine years now:


http://allmydata.org/trac/pyutil/browser/pyutil/pyutil/fileutil.py

(The first version of that was probably written by Greg Smith in  
about 1999.)


I'm sure there are many other such packages.  A couple of quick  
searches of pypi turned up these two:


http://pypi.python.org/pypi/Pythonutils
http://pypi.python.org/pypi/fs

I wonder if any of them have the sort of functionality you're  
thinking of.


Regards,

Zooko

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Raymond Hettinger



[James Y Knight]
> You might be interested to know that in India, the commas don't come
> every 3 digits. In India, they come every two digits, after the first
> three. Thus one billion = 1,00,00,00,000. How are you gonna represent
> *that* in a formatting mini-language? :)

It is not the goal to replace locale or to accommodate every
possible convention.  The goal is to make a common task easier
for many users.  The current, default use of the period as a decimal
point has not proven to be a problem even though that convention is
not universal.   For a thousands separator, a comma is a decent choice
that makes it easy to follow on with s.replace(',', '_') or somesuch.


This simple utility could help a lot of programmers make their output
look more professional and readable.  I hope the idea doesn't get
sunk by a desire to over-parameterize and cover every possible use case.

My pocket calculators all support thousands separators but in Python,
we have to do a funky dance for even this most basic bit of formatting.

I'd like to think that in 2009 we could show a little progress beyond
C's printf() or Fortran's write() formats.


Raymond




>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
'English_United States.1252'
>>> conv = locale.localeconv()  # get a mapping of conventions
>>> x = 1234567.8
>>> locale.format("%d", x, grouping=True)
'1,234,567'


Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Ben Finney

James Y Knight writes:

> You might be interested to know that in India, the commas don't come
> every 3 digits. In India, they come every two digits, after the
> first three. Thus one billion = 1,00,00,00,000. How are you gonna
> represent *that* in a formatting mini-language? :)

Likewise, China uses four-digit groupings (per “myriad”)
<http://en.wikipedia.org/wiki/Chinese_numerals#Reading_and_transcribing_numbers>.

-- 
 \   “Self-respect: The secure feeling that no one, as yet, is |
  `\suspicious.” —Henry L. Mencken |
_o__)  |
Ben Finney


Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Guido van Rossum

On Wed, Mar 11, 2009 at 6:01 PM, Antoine Pitrou wrote:
> Raymond Hettinger writes:
>>
>> Financial users in particular find the locale approach to be frustrating
>> and non-obvious.  Putting in a thousands separator is a common task for
>> output destined to be read by non-programmers.
>
> Please note that for it to be useful in all parts of the world, it must also
> allow changing the decimal point.

Now that this cat is out of the bag (or should I say now that this can
of worms is opened :-) I suggest moving this to python-ideas and
writing a proper PEP. I expect that nobody likes that idea, but it
seems better than the alternative, which is to let the programmer who
gets to implement it design it...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Nick Coghlan

Antoine Pitrou wrote:
> Nick Coghlan writes:
>> The tempfile module would be another example.
> 
> Do you really need your temporary files to survive system crashes? ;)

No, but they need to provide the full file API. If we add a sync()
method to file objects, that becomes part of the "file-like" API.

On the performance side... the overhead from fsync() itself is going to
dwarf the CPU overhead of going through a wrapper class.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

[Python-Dev] Approaches to argument type-checking

2009-03-11 Thread Tennessee Leeuwenburg

Hi all,

I am currently looking at issue 5236. This issue regards the exception
raised when a bytes string is passed into time.strptime. In addition to the
specific question I have regarding this issue, I wasn't sure if this was
something for python-dev or for the issue comment. However, it does concern
general Python coding approach, so just give me a pointer over whether this
is better kept on the tracker or whether posting to the list was a good idea
(I'm slowly learning!)

EXAMPLE OF PROBLEM:

>>> time.strptime(b'2009', "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tjl/python3/lib/python3.1/_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/home/tjl/python3/lib/python3.1/_strptime.py", line 322, in _strptime
    found = format_regex.match(data_string)
TypeError: can't use a string pattern on a bytes-like object


WHEREAS:

>>> time.strftime(b"%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: strftime() argument 1 must be str, not bytes


What is occurring here is that the arguments to strftime are being
type-checked up-front, whereas in strptime they are not. Further, strptime
is implemented in a Python file, _strptime.py, while strftime is implemented
in timemodule.c.

It appears as though it is generally the case (or at least often the case)
that C functions are making use of the vgetargs function which performs a
goodly bit of type checking. However, the same does not seem to hold for the
Python interface functions.

From the Python interpreter perspective, though, both are in the time module
(time.strftime and time.strptime) so the inconsistency is a bit jarring. I
can see that I could solve this a few ways:
  * Do a false-parse of the arguments using the same vgetargs1 method, but
not do anything with the return value
  * Perform a type-check on the relevant argument, data_string, in Python
and raise a more specific Exception
  * Write some kind of generic type-checking helper method which could be
re-used
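
As a rough illustration of the second and third options, a Python-level check might look like the sketch below. The helper name require_str is hypothetical, and the message format simply imitates the one strftime produces above:

```python
def require_str(func_name, position, value):
    """Hypothetical helper: raise a C-style TypeError for non-str arguments."""
    if not isinstance(value, str):
        raise TypeError("%s() argument %d must be str, not %s"
                        % (func_name, position, type(value).__name__))
    return value


try:
    require_str("strptime", 1, b"2009")
except TypeError as exc:
    message = str(exc)
```

Called at the top of _strptime(), something like this would make the two functions fail in the same way, at the cost of an isinstance() call per argument.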

Is there a general strategy used in Python development which I should be
aware of? i.e. is it customary to type-check every argument of an interface
method? Or is it customary not to perform type checking up-front and simply
allow the exception to occur deeper in the code? Are there performance
issues surrounding defensive programming?

Regards,
-Tennessee


-- 
--
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"

Re: [Python-Dev] Approaches to argument type-checking

2009-03-11 Thread Benjamin Peterson

2009/3/11 Tennessee Leeuwenburg:
> Is there a general strategy used in Python development which I should be
> aware of? i.e. is it customary to type-check every argument of an interface
> method? Or is it customary not to perform type checking up-front and simply
> allow the exception to occur deeper in the code? Are there performance
> issues surrounding defensive programming?

Generally we avoid checking types at all in Python because of duck
typing. The C interface must check types because it has to
translate them to their C-level equivalents.

If tests are failing from a C implementation on a Python
implementation because of extensive type checking, I would be tempted
to mark those tests as implementation details.

However, in the case of this specific issue, I think rejecting bytes
purposefully is good because it avoids programming errors.



-- 
Regards,
Benjamin

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Nick Coghlan

Raymond Hettinger wrote:
> 
> [James Y Knight]
>> You might be interested to know that in India, the commas don't come
>> every 3 digits. In India, they come every two digits, after the first
>> three. Thus one billion = 1,00,00,00,000. How are you gonna represent
>> *that* in a formatting mini-language? :)
> 
> It is not the goal to replace locale or to accommodate every
> possible convention.  The goal is to make a common task easier
> for many users.  The current, default use of the period as a decimal
> point has not proven to be a problem even though that convention is
> not universal.   For a thousands separator, a comma is a decent choice
> that makes it easy to follow on with s.replace(',', '_') or somesuch.

In that case, I would simplify my suggestion to:

  [[fill]align][sign][#][0][minimumwidth][,][.precision][type]

Addition to mini language documentation:
  The ',' option indicates that commas should be included in the
 output as a thousands separator. As with locales which do not use a
 period as the decimal point, locales which use a different convention
 for digit separation will need to use the locale module to obtain
 appropriate formatting.
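
As a sketch of how this would read in practice, assuming an interpreter whose format() accepts the ',' option in this position (later Python releases do; at the time of writing it is only a proposal):

```python
# Plain integer with a thousands separator.
as_int = format(1234567, ",")

# The ',' option sits before the precision, as in the proposed grammar.
as_money = format(1234567.8, ",.2f")

# Follow-on replacement for anyone who wants a different separator.
with_underscores = format(1234567, ",").replace(",", "_")

print(as_int, as_money, with_underscores)
```

The last line is the s.replace(',', '_') follow-on Raymond mentions, which keeps other separators out of the format string itself.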

Guido has asked for a PEP to be developed on python-ideas to define the
deliberately limited scope though, so I'm going to bow out of the
conversation now...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Raymond Hettinger



[Guido van Rossum]
> I suggest moving this to python-ideas and
> writing a proper PEP.

Okay, it's moved.

Will write up a PEP, do research on what other languages
do and collect everyone's ideas on what to put in the shed
(hundreds and ten thousands grouping, various choices of
decimal points, Mayan number systems and whatnot).  Will
start with Nick's simple proposal as a starting point.

[Nick Coghlan]
>  [[fill]align][sign][#][0][minimumwidth][,][.precision][type]


Other suggestions and comments welcome.


Raymond

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Guido van Rossum

On Wed, Mar 11, 2009 at 8:34 PM, Raymond Hettinger  wrote:
>>  I expect that nobody likes that idea,
>
> Do you mean the idea of a thousands separator
> or the idea of also parameterizing the decimal point
> or both?

Sorry, neither. I meant the idea of having to write a PEP. :-)

(Added back python-dev to clarify for all.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread James Y Knight


On Mar 11, 2009, at 11:40 PM, Nick Coghlan wrote:

> Raymond Hettinger wrote:
>> It is not the goal to replace locale or to accommodate every
>> possible convention.  The goal is to make a common task easier
>> for many users.  The current, default use of the period as a decimal
>> point has not proven to be a problem even though that convention is
>> not universal.   For a thousands separator, a comma is a decent choice
>> that makes it easy to follow on with s.replace(',', '_') or somesuch.
>
> In that case, I would simplify my suggestion to:
>
>   [[fill]align][sign][#][0][minimumwidth][,][.precision][type]
>
> Addition to mini language documentation:
>   The ',' option indicates that commas should be included in the
>   output as a thousands separator. As with locales which do not use a
>   period as the decimal point, locales which use a different convention
>   for digit separation will need to use the locale module to obtain
>   appropriate formatting.



This proposal has the advantage that you're not overly specifying the  
behavior in the format string itself.


That is: the "," option is really just indicating "please insert  
separators". With the current locale-ignorant implementation, that'd  
just mean "a comma every 3 digits". But it leaves the door open for a  
locale-sensitive variant of the format to be added in the future  
without conflicting with the instructions in the format string. (as  
the ability to specify an arbitrary character, or the ability to  
specify a comma instead of a period for the decimal point would).


I'm not against Raymond's proposal, just against doing a *bad* job of  
making it work in multiple locales. Locale conventions can be complex,  
and are going to be best represented outside the format string.


(BTW: single quote is used by printf for the grouping flag rather than  
comma)


James

Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Lie Ryan


James Y Knight wrote:
> On Mar 11, 2009, at 11:40 PM, Nick Coghlan wrote:
>> Raymond Hettinger wrote:
>>> It is not the goal to replace locale or to accommodate every
>>> possible convention.  The goal is to make a common task easier
>>> for many users.  The current, default use of the period as a decimal
>>> point has not proven to be a problem even though that convention is
>>> not universal.   For a thousands separator, a comma is a decent choice
>>> that makes it easy to follow on with s.replace(',', '_') or somesuch.
>>
>> In that case, I would simplify my suggestion to:
>>
>>   [[fill]align][sign][#][0][minimumwidth][,][.precision][type]
>>
>> Addition to mini language documentation:
>>   The ',' option indicates that commas should be included in the
>>   output as a thousands separator. As with locales which do not use a
>>   period as the decimal point, locales which use a different convention
>>   for digit separation will need to use the locale module to obtain
>>   appropriate formatting.
>
> This proposal has the advantage that you're not overly specifying the
> behavior in the format string itself.
>
> That is: the "," option is really just indicating "please insert
> separators". With the current locale-ignorant implementation, that'd
> just mean "a comma every 3 digits". But it leaves the door open for a
> locale-sensitive variant of the format to be added in the future without
> conflicting with the instructions in the format string. (as the ability
> to specify an arbitrary character, or the ability to specify a comma
> instead of a period for the decimal point would).
>
> I'm not against Raymond's proposal, just against doing a *bad* job of
> making it work in multiple locales. Locale conventions can be complex,
> and are going to be best represented outside the format string.


How about having a country code field, e.g. en-us would format according 
to US locale, in to India, ch to China, etc... that way the format 
string would become very simple (although the lib maintainer would need 
to know customs from all over the world). Then have a special country 
code that is a placeholder for whatever the locale the machine is set to.



Re: [Python-Dev] Formatting mini-language suggestion

2009-03-11 Thread Raymond Hettinger



[Lie Ryan]
> How about having a country code field, e.g. en-us would format according
> to US locale, in to India, ch to China, etc... that way the format
> string would become very simple (although the lib maintainer would need
> to know customs from all over the world). Then have a special country
> code that is a placeholder for whatever the locale the machine is set to.

Am moving the discussion to the python-ideas list (at Guido's request).
My proposal is strictly limited to the builtin, non-locale dependent formatting.
Improvements to the locale module are probably a subject for another day.


Raymond



