Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-28 Thread Akira Li
Ben Hoyt writes:

> Hi Python dev folks,
>
> I've written a PEP proposing a specific os.scandir() API for a
> directory iterator that returns the stat-like info from the OS, *the
> main advantage of which is to speed up os.walk() and similar
> operations between 4-20x, depending on your OS and file system.*
> ...
> http://legacy.python.org/dev/peps/pep-0471/
> ...
> Specifically, this PEP proposes adding a single function to the ``os``
> module in the standard library, ``scandir``, that takes a single,
> optional string as its argument::
>
> scandir(path='.') -> generator of DirEntry objects
>

Have you considered adding support for paths relative to directory
descriptors [1] via a keyword-only dir_fd=None parameter, if it may lead
to more efficient implementations on some platforms?

[1]: https://docs.python.org/3.4/library/os.html#dir-fd
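For readers new to the dir_fd convention [1]: functions listed in os.supports_dir_fd resolve a relative path against an open directory descriptor instead of the current working directory. A minimal sketch with os.stat() (POSIX only; stat_relative is a made-up name, not an existing API):

```python
import os
import tempfile

def stat_relative(dirpath, name):
    """Hypothetical helper: stat `name` relative to an open descriptor of `dirpath`."""
    if os.stat not in os.supports_dir_fd:
        raise NotImplementedError("os.stat() has no dir_fd support on this platform")
    dir_fd = os.open(dirpath, os.O_RDONLY)
    try:
        # `name` is resolved against dir_fd (fstatat() on POSIX), not the cwd
        return os.stat(name, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "hello.txt"), "w") as f:
        f.write("hi")
    print(stat_relative(d, "hello.txt").st_size)  # 2
```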


--
akira

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

2014-06-29 Thread Akira Li
Chris Angelico writes:

> On Sat, Jun 28, 2014 at 11:05 PM, Akira Li <4kir4...@gmail.com> wrote:
>> Have you considered adding support for paths relative to directory
>> descriptors [1] via keyword only dir_fd=None parameter if it may lead to
>> more efficient implementations on some platforms?
>>
>> [1]: https://docs.python.org/3.4/library/os.html#dir-fd
>
> Potentially more efficient and also potentially safer (see 'man
> openat')... but an enhancement that can wait, if necessary.
>

Introducing the feature later creates unnecessary incompatibilities.
Either it should be supported now, or it should be explicitly rejected in
PEP 471 and something like `os.scandir(os.open(relative_path, dir_fd=fd))`
recommended instead (assuming `os.scandir in os.supports_fd`, like
`os.listdir()`).
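A minimal sketch of that recommended workaround, assuming os.scandir() accepts a directory descriptor (it does on POSIX since Python 3.7, which postdates this message). scandir_at is a hypothetical name, not a proposed API. Calls such as entry.stat() need the descriptor to stay open, which is why it is only closed when the generator finishes:

```python
import os

def scandir_at(path='.', *, dir_fd=None):
    """Hypothetical emulation of the proposed os.scandir(path, dir_fd=...).

    Opens `path` relative to `dir_fd` and scans the resulting descriptor;
    requires os.scandir in os.supports_fd (POSIX, Python 3.7+).
    """
    if dir_fd is None:
        with os.scandir(path) as it:
            yield from it
        return
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags, dir_fd=dir_fd)   # dir-fd: relative to dir_fd
    try:
        with os.scandir(fd) as it:             # path-fd: scan the descriptor
            yield from it
    finally:
        os.close(fd)                           # scandir() does not close it
```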

At the C level it could be implemented using fdopendir/openat or scandirat.

Here's the function description using the Argument Clinic DSL:

/*[clinic input]

os.scandir

    path : path_t(allow_fd=True, nullable=True) = '.'
        *path* can be specified as either str or bytes.  On some
        platforms, *path* may also be specified as an open file
        descriptor; the file descriptor must refer to a directory.  If
        this functionality is unavailable, using it raises
        NotImplementedError.

    *

    dir_fd : dir_fd = None
        If not None, it should be a file descriptor open to a
        directory, and *path* should be a relative string; path will
        then be relative to that directory.  If *dir_fd* is
        unavailable, using it raises NotImplementedError.

Yield a DirEntry object for each file and directory in *path*.

Just like os.listdir, the '.' and '..' pseudo-directories are skipped,
and the entries are yielded in system-dependent order.

{parameters}
It's an error to use *dir_fd* when specifying *path* as an open file
descriptor.

[clinic start generated code]*/


And the corresponding tests (for test_posix.PosixTester) that show, in
detail, compatibility with os.listdir() argument parsing:

  def test_scandir_default(self):
      # When scandir is called without an argument,
      # it's the same as scandir(os.curdir).
      self.assertIn(support.TESTFN, [e.name for e in posix.scandir()])

  def _test_scandir(self, curdir):
      filenames = sorted(e.name for e in posix.scandir(curdir))
      self.assertIn(support.TESTFN, filenames)
      #NOTE: assume listdir, scandir accept the same types on the platform
      self.assertEqual(sorted(posix.listdir(curdir)), filenames)

  def test_scandir(self):
      self._test_scandir(os.curdir)

  def test_scandir_none(self):
      # it's the same as scandir(os.curdir).
      self._test_scandir(None)

  def test_scandir_bytes(self):
      # When scandir is called with a bytes object,
      # the returned entries' names are still of type str.
      # Call `os.fsencode(entry.name)` to get bytes
      self.assertIn('a', {'a'})
      self.assertNotIn(b'a', {'a'})
      self._test_scandir(b'.')

  @unittest.skipUnless(posix.scandir in os.supports_fd,
                       "test needs fd support for posix.scandir()")
  def test_scandir_fd_minus_one(self):
      # it's the same as scandir(os.curdir).
      self._test_scandir(-1)

  def test_scandir_float(self):
      # invalid args
      self.assertRaises(TypeError, posix.scandir, -1.0)

  @unittest.skipUnless(posix.scandir in os.supports_fd,
                       "test needs fd support for posix.scandir()")
  def test_scandir_fd(self):
      fd = posix.open(posix.getcwd(), posix.O_RDONLY)
      self.addCleanup(posix.close, fd)
      self._test_scandir(fd)
      # DirEntry objects are not orderable: compare the names
      self.assertEqual(
          sorted(e.name for e in posix.scandir('.')),
          sorted(e.name for e in posix.scandir(fd)))
      # call a 2nd time to test rewind
      self.assertEqual(
          sorted(e.name for e in posix.scandir('.')),
          sorted(e.name for e in posix.scandir(fd)))

  @unittest.skipUnless(posix.scandir in os.supports_dir_fd,
                       "test needs dir_fd support for os.scandir()")
  def test_scandir_dir_fd(self):
      relpath = 'relative_path'
      with support.temp_dir() as parent:
          fullpath = os.path.join(parent, relpath)
          with support.temp_dir(path=fullpath):
              support.create_empty_file(os.path.join(parent, 'a'))
              support.create_empty_file(os.path.join(fullpath, 'b'))
              fd = posix.open(parent, posix.O_RDONLY)
              self.addCleanup(posix.close, fd)
              self.assertEqual(
                  sorted(e.name for e in posix.scandir(relpath, dir_fd=fd)),
                  sorted(e.name for e in posix.scandir(fullpath)))
              # check that fd is still useful
              self.assertEqual(
                  sorted(

Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

2014-07-01 Thread Akira Li
Ben Hoyt writes:

> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support. We can always add it later.
>
> -Ben

FYI, os.listdir() does support file descriptors in Python 3.3+. Try:

  >>> import os
  >>> os.listdir(os.open('.', os.O_RDONLY))

NOTE: os.supports_fd and os.supports_dir_fd are different sets.
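The interactive example above never closes the descriptor returned by os.open(). A sketch that does (listdir_via_fd is a made-up name):

```python
import os

def listdir_via_fd(path):
    """Hypothetical helper: list `path` through a directory descriptor (path-fd)."""
    if os.listdir not in os.supports_fd:
        return os.listdir(path)        # no fd support here: fall back to the path
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.listdir(fd)          # listdir() does not close the descriptor
    finally:
        os.close(fd)                   # ... so close it explicitly

# listdir() accepts a descriptor but has no dir_fd parameter:
print("supports_fd:", os.listdir in os.supports_fd)
print("supports_dir_fd:", os.listdir in os.supports_dir_fd)
```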

See also,
https://mail.python.org/pipermail/python-dev/2014-June/135265.html


--
Akira


P.S. Please don't put your answer on top of the message you are
replying to.
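For comparison, the os.scandir() that eventually shipped (fd support on POSIX since Python 3.7) follows the ownership rule Victor proposes in the quoted message: CPython duplicates the descriptor internally, so the caller's descriptor stays open and the caller must close it. A small sketch that checks this, including a second scan of the same descriptor (scan_twice is a made-up name):

```python
import os

def scan_twice(dir_fd):
    """Scan one directory descriptor twice; the caller keeps ownership of it."""
    first = sorted(e.name for e in os.scandir(dir_fd))
    second = sorted(e.name for e in os.scandir(dir_fd))
    os.fstat(dir_fd)  # raises OSError if scandir() had closed our descriptor
    return first, second
```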

>
> On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner wrote:
>> Hi,
>>
>> IMO we must decide if scandir() must support or not file descriptor.
>> It's an important decision which has an important impact on the API.
>>
>>
>> To support scandir(fd), the minimum is to store dir_fd in DirEntry:
>> dir_fd would be None for scandir(str).
>>
>>
>> scandir(fd) must not close the file descriptor, it should be done by
>> the caller. Handling the lifetime of the file descriptor is a
>> difficult problem, it's better to let the user decide how to handle
>> it.
>>
>> There is the problem of the limit of open file descriptors, usually
>> 1024 but it can be lower. It *can* be an issue for very deep file
>> hierarchy.
>>
>> If we choose to support scandir(fd), it's probably safer to not use
>> scandir(fd) by default in os.walk() (use scandir(str) instead), wait
>> until the feature is well tested, corner cases are well known, etc.
>>
>>
>> The second step is to enhance pathlib.Path to support an optional file
>> descriptor. Path already has methods on filenames like chmod(),
>> exists(), rename(), etc.
>>
>>
>> Example:
>>
>> fd = os.open(path, os.O_DIRECTORY)
>> try:
>>     for entry in os.scandir(fd):
>>         # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
>>         path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
>>         # ... use path which uses dir_fd ...
>> finally:
>>     os.close(fd)
>>
>> Problem: if the path object is stored somewhere and use after the
>> loop, Path methods will fail because dir_fd was closed. It's even
>> worse if a new directory uses the same file descriptor :-/ (security
>> issue, or at least tricky bugs!)
>>
>> Victor



Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-09 Thread Akira Li
Ben Hoyt writes:
...
> ``scandir()`` yields a ``DirEntry`` object for each file and directory
> in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
> pseudo-directories are skipped, and the entries are yielded in
> system-dependent order. Each ``DirEntry`` object has the following
> attributes and methods:
>
> * ``name``: the entry's filename, relative to the ``path`` argument
>   (corresponds to the return values of ``os.listdir``)
>
> * ``full_name``: the entry's full path name -- the equivalent of
>   ``os.path.join(path, entry.name)``

I suggest renaming .full_name -> .path

.full_name might be misleading, e.g., it implies that .full_name ==
abspath(.full_name), which might be false. The .path name has no such
associations.

The semantics of the .path attribute are defined by these assertions::

  for entry in os.scandir(topdir):
      #NOTE: assume os.path.normpath(topdir) is not called to create .path
      assert entry.path == os.path.join(topdir, entry.name)
      assert entry.name == os.path.basename(entry.path)
      assert entry.name == os.path.relpath(entry.path, start=topdir)
      assert os.path.dirname(entry.path) == topdir
      assert (entry.path != os.path.abspath(entry.path) or
              os.path.isabs(topdir)) # it is absolute only if topdir is
      assert (entry.path != os.path.realpath(entry.path) or
              topdir == os.path.realpath(topdir)) # symlinks are not resolved
      assert (entry.path != os.path.normcase(entry.path) or
              topdir == os.path.normcase(topdir)) # no case-folding,
                                                  # unlike PureWindowsPath
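The attribute did end up being called DirEntry.path in the os.scandir() that shipped with Python 3.5. A runnable subset of the assertions above, assuming `topdir` has no trailing separator (otherwise os.path.dirname() does not round-trip); check_path_semantics is an illustrative name:

```python
import os

def check_path_semantics(topdir):
    """Check a subset of the proposed .path invariants; return the paths."""
    paths = []
    with os.scandir(topdir) as it:
        for entry in it:
            assert entry.path == os.path.join(topdir, entry.name)
            assert entry.name == os.path.basename(entry.path)
            assert os.path.dirname(entry.path) == topdir
            paths.append(entry.path)
    return sorted(paths)
```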


...
> * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
>   requires a system call on Windows, and usually doesn't on POSIX
>   systems

I suggest documenting the implicit follow_symlinks parameter for .is_X methods.

Note: lstat == partial(stat, follow_symlinks=False).

In particular, .is_dir() should probably use follow_symlinks=True by
default, as suggested by Victor Stinner, *if .is_dir() does it on Windows*.

MSDN says: GetFileAttributes() does not follow symlinks.

os.path.isdir docs imply follow_symlinks=True: "both islink() and
isdir() can be true for the same path."
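In the os.scandir() that shipped (Python 3.5+), DirEntry.is_dir() does take follow_symlinks=True by default. A sketch contrasting both settings on a symlink to a directory, together with the os.path and lstat() points above (POSIX assumed, since creating symlinks on Windows may need extra privileges; is_dir_variants is a made-up name):

```python
import os
import tempfile

def is_dir_variants(entry):
    """(is_dir(), is_dir(follow_symlinks=False), is_symlink()) for a DirEntry."""
    return (entry.is_dir(),                       # default: follow symlinks
            entry.is_dir(follow_symlinks=False),  # look at the link itself
            entry.is_symlink())

with tempfile.TemporaryDirectory() as d:
    os.mkdir(os.path.join(d, "real"))
    link = os.path.join(d, "link")
    os.symlink("real", link)                      # relative target: d/real
    with os.scandir(d) as it:
        variants = {entry.name: is_dir_variants(entry) for entry in it}
    # both islink() and isdir() are true for the same path
    both = os.path.islink(link) and os.path.isdir(link)
    # lstat(p) behaves like stat(p, follow_symlinks=False)
    same = os.lstat(link).st_ino == os.stat(link, follow_symlinks=False).st_ino

print(variants["link"], variants["real"], both, same)
# (True, False, True) (True, True, False) True True
```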


...
> Like the other functions in the ``os`` module, ``scandir()`` accepts
> either a bytes or str object for the ``path`` parameter, and returns
> the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
> same type as ``path``. However, it is *strongly recommended* to use
> the str type, as this ensures cross-platform support for Unicode
> filenames.

Document when {e.name for e in os.scandir(path)} != set(os.listdir(path))
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

e.g., path can be an open file descriptor in os.listdir(path) since
Python 3.3, but the PEP doesn't mention it explicitly.

It has been discussed already e.g.,
https://mail.python.org/pipermail/python-dev/2014-July/135296.html

PEP 471 should explicitly reject the support for specifying a file
descriptor so that code that uses os.scandir() may assume that the
entry.path (.full_name) attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ).

Explicitly reject support for the dir_fd parameter in PEP 471
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

a.k.a. support for paths relative to directory descriptors.

Note: it is a *different* (but related) issue.


...
> Notes on exception handling
> ---------------------------
>
> ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
> rather than attributes or properties, to make it clear that they may
> not be cheap operations, and they may do a system call. As a result,
> these methods may raise ``OSError``.
>
> For example, ``DirEntry.lstat()`` will always make a system call on
> POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
> ``stat()`` system call on such systems if ``readdir()`` returns a
> ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
> certain conditions or on certain file systems.
>
> For this reason, when a user requires fine-grained error handling,
> it's good to catch ``OSError`` around these method calls and then
> handle as appropriate.
>

I suggest documenting that next(os.scandir()) may raise OSError

e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir

Also, document whether os.scandir() itself may raise OSError (whether
opendir or other OS functions may be called before the first yield).
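In the os.scandir() that shipped, the directory is opened by the os.scandir() call itself, so a missing path raises there, while errors from readdir() would surface from next(). A sketch of fine-grained handling under that behaviour (list_names is a hypothetical helper):

```python
import os

def list_names(path):
    """Return (names, errors); OSError may come from os.scandir() or next()."""
    names, errors = [], []
    try:
        it = os.scandir(path)      # may raise OSError right away (e.g. missing path)
    except OSError as exc:
        return names, [exc]
    with it:
        while True:
            try:
                entry = next(it)   # errors while reading entries surface here
            except StopIteration:
                break
            except OSError as exc:
                errors.append(exc)
                break
            names.append(entry.name)
    return names, errors
```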


...
os.scandir() should allow explicit cleanup
++++++++++++++++++++++++++++++++++++++++++

::

  import os
  from contextlib import closing

  with closing(os.scandir()) as entries:
      for _ in entries:
          break

entries.close() is called, freeing the resources if necessary, to
*avoid relying on garbage-collecti
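This did make it into the standard library: since Python 3.6 the iterator returned by os.scandir() has a close() method and is a context manager, so contextlib.closing() is no longer needed. A sketch of the early-exit case (first_entry_name is an illustrative name):

```python
import os

def first_entry_name(path):
    """Return the first entry's name, releasing the directory handle early."""
    # Leaving the with block calls entries.close() even though we stop
    # iterating before the end, so nothing waits for garbage collection.
    with os.scandir(path) as entries:
        for entry in entries:
            return entry.name
    return None
```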

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Akira Li
Nick Coghlan writes:
...
> definition of floats and the definition of container invariants like
> "assert x in [x]")
>
> The current approach means that the lack of reflexivity of NaN's stays
> confined to floats and similar types - it doesn't leak out and infect
> the behaviour of the container types.
>
> What we've never figured out is a good place to *document* it. I
> thought there was an open bug for that, but I can't find it right now.

There was a related issue, "Tuple comparisons with NaNs are broken"
(http://bugs.python.org/issue21873), but it was closed as "not a bug"
even though the corresponding behavior is *not documented* anywhere.
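For reference, a short demonstration of the behaviour being discussed, as CPython exhibits it (container comparisons check identity before calling == on the items):

```python
nan = float("nan")

assert nan != nan                # NaN is not reflexive under ==
assert nan in [nan]              # containers check identity first...
assert [nan] == [nan]            # ...so the same object counts as equal

# The tuple case from issue21873: the same NaN object is skipped as "equal",
# so the comparison is decided by the second items.
assert (nan, 1) < (nan, 2)

# Two distinct NaN objects are not identical, so the first items decide,
# and float("nan") < float("nan") is False.
assert not ((float("nan"), 1) < (float("nan"), 2))
```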


--
Akira



Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-14 Thread Akira Li
Nick Coghlan writes:

> On 13 Jul 2014 20:54, "Tim Delaney" wrote:
>>
>> On 14 July 2014 10:33, Ben Hoyt wrote:
>>>
>>>
>>>
>>> If we go with Victor's link-following .is_dir() and .is_file(), then
>>> we probably need to add his suggestion of a follow_symlinks=False
>>> parameter (defaults to True). Either that or you have to say
>>> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
>>> less nice.
>>
>>
>> Absolutely agreed that follow_symlinks is the way to go, disagree on the
>> default value.
>>
>>>
>>> Given the above arguments for symlink-following is_dir()/is_file()
>>> methods (have I missed any, Victor?), what do others think?
>>
>>
>> I would say whichever way you go, someone will assume the opposite. IMO
>> not following symlinks by default is safer. If you follow symlinks by
>> default then everyone has the following issues:
>>
>> 1. Crossing filesystems (including onto network filesystems);
>>
>> 2. Recursive directory structures (symlink to a parent directory);
>>
>> 3. Symlinks to non-existent files/directories;
>>
>> 4. Symlink to an absolutely huge directory somewhere else (very annoying
>> if you just wanted to do a directory sizer ...).
>>
>> If follow_symlinks=False by default, only those who opt-in have to deal
>> with the above.
>
> Or the ever popular symlink to "." (or a directory higher in the tree).
>
> I think os.walk() is a good source of inspiration here: call the flag
> "followlink" and default it to False.
>

Let's not multiply entities beyond necessity.

There is a well-defined *follow_symlinks* parameter
https://docs.python.org/3/library/os.html#follow-symlinks
e.g., os.access, os.chown, os.link, os.stat, os.utime and many other
functions in the os module support the follow_symlinks parameter; see
os.supports_follow_symlinks.

os.walk is an exception that uses *followlinks*. It might be because it
is an older function; e.g., the newer os.fwalk uses follow_symlinks.



As has been said: os.path.isdir and pathlib.Path.is_dir in Python,
File.directory? in Ruby, System.Directory.doesDirectoryExist in Haskell,
and `test -d` in shell all follow symlinks, i.e., follow_symlinks=True
as the default is more familiar for the .is_dir method.

`cd path` in shell, os.chdir(path), `ls path`, os.listdir(path), and
os.scandir(path) itself follow symlinks (even on Windows:
http://bugs.python.org/issue13772 ). GUI file managers such as
`nautilus` also treat symlinks to directories as directories -- you may
click on them to open corresponding directories.

Only *recursive* functions such as os.walk and os.fwalk do not follow
symlinks by default, to avoid symlink loops. Note: the behavior is
consistent with coreutils commands: e.g., `cp` follows symlinks for
non-recursive actions, but the inherently recursive `du` utility doesn't
follow symlinks by default.
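A sketch of that default with os.walk(): a symlink to a directory is still reported among the dirnames, but it is only descended into when followlinks=True (walked_roots is a made-up helper; creating the symlink assumes POSIX):

```python
import os
import tempfile

def walked_roots(top, followlinks=False):
    """Directories os.walk() actually visits, relative to `top`."""
    return sorted(os.path.relpath(root, top)
                  for root, dirs, files in os.walk(top, followlinks=followlinks))

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    os.symlink("sub", os.path.join(top, "link"))   # link -> sub
    print(sorted(next(os.walk(top))[1]))           # ['link', 'sub']
    print(walked_roots(top))                       # ['.', 'sub']
    print(walked_roots(top, followlinks=True))     # ['.', 'link', 'sub']
```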

Making follow_symlinks=True the default for the DirEntry.is_dir method
helps avoid easy-to-introduce bugs when replacing old
os.listdir/os.path.isdir code, or when writing new code using the same
mental model.
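The kind of drop-in replacement that this default keeps safe, sketched with the shipped API (both helper names are hypothetical). The two functions agree, including for a symlinked directory and a dangling symlink:

```python
import os

def subdirs_listdir(path):
    """Old style: os.listdir() + os.path.isdir(), which follows symlinks."""
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))

def subdirs_scandir(path):
    """New style: DirEntry.is_dir(), with follow_symlinks=True by default."""
    with os.scandir(path) as it:
        # a dangling symlink gives False here, just like os.path.isdir()
        return sorted(entry.name for entry in it if entry.is_dir())
```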


--
Akira



Re: [Python-Dev] PEP 471 "scandir" accepted

2014-07-22 Thread Akira Li
Ben Hoyt writes:

> I think if I were doing this from scratch I'd reimplement listdir() in
> Python as "return [e.name for e in scandir(path)]".
...
> So my basic plan is to have an internal helper function in
> posixmodule.c that either yields DirEntry objects or strings. And then
> listdir() would simply be defined something like "return
> list(_scandir(path, yield_strings=True))" in C or in Python.
>
> My reasoning is that then there'll be much less (if any) code
> duplication between scandir() and listdir().
>
> Does this sound like a reasonable approach?

Note: listdir() accepts an integer path (an open file descriptor that
refers to a directory) that is passed to fdopendir() on POSIX [4], i.e.,
*you can't use scandir() to replace listdir() in this case* (as I've
already mentioned in [1]). See the corresponding tests from [2].

[1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
[2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html

From the os.listdir() docs [3]:

> This function can also support specifying a file descriptor; the file
> descriptor must refer to a directory.

[3] https://docs.python.org/3.4/library/os.html#os.listdir
[4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736
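For later readers: os.scandir() gained descriptor support on POSIX in Python 3.7, which removes this obstacle there. A sketch of Ben's one-line reimplementation with a guard for platforms without it (listdir_via_scandir is an illustrative name, not how CPython implements listdir()):

```python
import os

def listdir_via_scandir(path='.'):
    """Sketch: listdir() expressed through scandir(), including listdir(fd)."""
    if isinstance(path, int) and os.scandir not in os.supports_fd:
        raise NotImplementedError("scandir(fd) is unavailable on this platform")
    with os.scandir(path) as it:
        # bytes in -> bytes names out, as with os.listdir()
        return [entry.name for entry in it]
```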


--
Akira



Re: [Python-Dev] PEP 471 "scandir" accepted

2014-07-22 Thread Akira Li
Ben Hoyt writes:

>> Note: listdir() accepts an integer path (an open file descriptor that
>> refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
>> *you can't use scandir() to replace listdir() in this case* (as I've
>> already mentioned in [1]). See the corresponding tests from [2].
>>
>> [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
>> [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html
>>
>> From os.listdir() docs [3]:
>>
>>> This function can also support specifying a file descriptor; the file
>>> descriptor must refer to a directory.
>>
>> [3] https://docs.python.org/3.4/library/os.html#os.listdir
>> [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736
>
> Fair point.
>
> Yes, I hadn't realized listdir supported dir_fd (must have been
> looking at 2.x docs), though you've pointed it out at [1] above. and I
> guess I wasn't thinking about implementation at the time.

FYI, dir_fd is related but *different*: compare "specifying a file
descriptor" [1] vs. "paths relative to directory descriptors" [2].

"NOTE: os.supports_fd and os.supports_dir_fd are different sets." [3]:

  >>> import os
  >>> os.listdir in os.supports_fd
  True
  >>> os.listdir in os.supports_dir_fd
  False


[1] https://docs.python.org/3/library/os.html#path-fd
[2] https://docs.python.org/3/library/os.html#dir-fd
[3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html

To be clear: *listdir() does not support dir_fd*, though it can be
emulated using os.open(dir_fd=..).

You can safely ignore the rest of the e-mail until you want to implement
path-fd [1] support for os.scandir() in several months.

Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:

  import contextlib
  import os

  with contextlib.ExitStack() as stack:
      dir_fd = os.open('/etc', os.O_RDONLY)
      stack.callback(os.close, dir_fd)
      fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd) # dir-fd [2]
      stack.callback(os.close, fd)
      print("\n".join(os.listdir(fd))) # path-fd [1]

It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked
to refer to another directory after the first os.open('/etc',..)
call. See also os.fwalk(dir_fd=..) [4].

[4] https://docs.python.org/3/library/os.html#os.fwalk
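A self-contained variant of the example above that uses a temporary directory instead of /etc and actually moves the parent after it has been opened (POSIX only; list_child and the directory names are illustrative):

```python
import os
import tempfile

def list_child(parent_fd, child):
    """Open `child` relative to parent_fd (dir-fd), then list it (path-fd)."""
    fd = os.open(child, os.O_RDONLY, dir_fd=parent_fd)
    try:
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    parent = os.path.join(tmp, "etc")
    os.makedirs(os.path.join(parent, "init.d"))
    open(os.path.join(parent, "init.d", "service"), "w").close()
    parent_fd = os.open(parent, os.O_RDONLY)
    try:
        # Move the parent away *after* opening it: a path-based lookup of
        # <parent>/init.d would now fail, but the descriptor still refers
        # to the original directory.
        os.rename(parent, os.path.join(tmp, "moved"))
        print(list_child(parent_fd, "init.d"))  # ['service']
    finally:
        os.close(parent_fd)
```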

> However, given that we have to support this for listdir() anyway, I
> think it's worth reconsidering whether scandir()'s directory argument
> can be an integer FD.

What is entry.path in this case? If the input directory is a file
descriptor (an integer), then os.path.join(directory, entry.name) won't
work.
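For the record, the os.scandir() that later gained fd support (POSIX, Python 3.7+) answers this by making entry.path equal to entry.name when a descriptor is passed. A sketch (fd_entry_pairs is an illustrative name):

```python
import os

def fd_entry_pairs(dirpath):
    """Return sorted (name, path) pairs, scanning through a descriptor."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        with os.scandir(fd) as it:
            # no directory path is known to join with, so .path == .name
            return sorted((entry.name, entry.path) for entry in it)
    finally:
        os.close(fd)
```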

"PEP 471 should explicitly reject the support for specifying a file
descriptor so that a code that uses os.scandir may assume that
entry.path attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 )." [5]

[5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html

On the other hand, os.fwalk() [4], which supports both path-fd [1] and
dir-fd [2], could be implemented without the entry.path property if
os.scandir() supports just path-fd [1]. os.fwalk() provides a safe way
to traverse a directory tree without symlink races, e.g., [6]:

  def get_tree_size(directory):
      """Return total size of files in directory and subdirs."""
      return sum(entry.lstat().st_size
                 for root, dirs, files, rootfd in fwalk(directory)
                 for entry in files)

[6] http://legacy.python.org/dev/peps/pep-0471/#examples
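With the os.fwalk() that exists today, which yields file names plus a descriptor for each directory rather than DirEntry objects, the same computation can be sketched with dir_fd-relative os.stat() calls (Unix only):

```python
import os

def get_tree_size(directory):
    """Return total size of files in directory and subdirs, via os.fwalk().

    Each name is stat()-ed relative to its directory's descriptor, so the
    full path is never re-resolved; symlinks are not followed.
    """
    return sum(os.stat(name, dir_fd=rootfd, follow_symlinks=False).st_size
               for root, dirs, files, rootfd in os.fwalk(directory)
               for name in files)
```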

where fwalk() is an exact copy of os.fwalk() except that it uses
_fwalk(), which is defined in terms of scandir():

  import os

  # adapt os._fwalk() to use scandir() instead of os.listdir()
  def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
      # Note: This uses O(depth of the directory tree) file descriptors:
      # if necessary, it can be adapted to only require O(1) FDs, see
      # http://bugs.python.org/issue13734

      entries = os.scandir(topfd)
      dirs, nondirs = [], []
      for entry in entries: #XXX call onerror on OSError on next() and return?
          # report symlinks to directories as directories (like os.walk)
          #  but no recursion into symlinked subdirectories unless
          #  follow_symlinks is true

          # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
          #  raise on broken links)
          try:
              (dirs if entry.is_dir() else nondirs).append(entry)
          except FileNotFoundError:
              continue # ignore disappeared files

      if topdown:
          yield toppath, dirs, nondirs, topfd

      for entry in dirs:
          try:
              orig_st = entry.stat(follow_symlinks=follow_symlinks)
              #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
              dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
          except OSError as err:

Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-01 Thread Akira Li
Shiz writes:

> Hi folks,
>
> I’m working on porting CPython to the Android platform, and while
> making decent progress, I’m currently stuck at a higher-level issue
> than adding #ifdefs for __ANDROID__ to C extension modules.
>
> The idea is, not only CPython extension modules have some assumptions
> that don’t seem to fit Android’s mold, some default Python-written
> modules do as well. However, whereas CPython extensions can trivially
> check if we’re building for Android by checking the __ANDROID__
> compiler macro, Python modules can do no such check, and are left
> wondering how to figure out if the platform they are currently running
> on is an Android one. To my knowledge there is no reliable way to
> detect if one is using Android as a vehicle for their journey using
> any other way.
>
> Now, the main question is: what would be the best way to ‘expose’ the
> indication that Android is being ran on to Python-living modules? My
> own thought was to add sys.getlinuxuserland(), or
> platform.linux_userland(), in similar vein to sys.getwindowsversion()
> and platform.linux_distribution(), which could return information
> about the userland of running CPython instance, instead of knowing
> merely the kernel and the distribution.
>
> This way, code could trivially check if it ran on the GNU(+associates)
> userland, or under a BSD-ish userland, or Android… and adjust its
> behaviour accordingly.
>
> I would be delighted to hear comments on this proposal, or better yet,
> alternative solutions. :)
>
> Kind regards,
> Shiz
>
> P.S.: I am well aware that Android might as well never be officially
> supported in CPython. In that case, consider this a thought experiment
> of how it /would/ be handled. :)

Python uses os.name, sys.platform, and various functions from the
`platform` module to provide version info:

- coarse: os.name is 'posix', 'nt', 'ce', or 'java' [1]. It is determined
  by the availability of some builtin modules ('posix' and 'nt' in
  particular) at import time.

- finer: sys.platform may start with freebsd, linux, win, cygwin, or
  darwin (cf. `uname -s`). It is defined at Python build time.

- detailed: the `platform` module. It provides as much info as possible,
  e.g., platform.uname(), platform.platform(). It may run commands at
  runtime to get it.
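The three levels can be queried side by side; a minimal sketch (platform_levels is a made-up name, and the exact strings vary by system):

```python
import os
import platform
import sys

def platform_levels():
    """Return the coarse, finer and detailed platform descriptions."""
    return {
        "coarse": os.name,                # e.g. 'posix' or 'nt'
        "finer": sys.platform,            # e.g. 'linux', 'darwin', 'win32'
        "detailed": platform.platform(),  # human-readable, gathered at runtime
    }

print(platform_levels())
```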

If Android is POSIX-y enough (would the `posix` module work on
Android?), then os.name could be left as 'posix'.

You could set sys.platform to 'android' (just as sys.platform may be
'cygwin' on Windows) if Android is unlike *any other* Linux distribution
from the point of view of writing working Python code on it. In other
words, if Android is further from other Linux distributions than
freebsd, linux, and darwin are from each other, then it might deserve
its own sys.platform slot.

If sys.platform is left as 'linux' (just as sys.platform is 'darwin' on
iOS), then the platform module could be used to detect Android, e.g.,
platform.linux_distribution(), though it is unpredictable [2] (and it
might be removed in Python 3.6) unless you fix it in your Python
distribution. Here's the output on my machine:

  >>> import platform
  >>> platform.linux_distribution()
  ('Ubuntu', '14.04', 'trusty')

For example:

  is_android = (platform.linux_distribution()[0] == 'Android')

You could also define a platform.android_version() function that
provides as many Android-specific version details as you need:

  is_android = bool(platform.android_version().release)

You could provide an alias android_ver (like existing java_ver, libc_ver,
mac_ver, win32_ver).

See also, "When to use os.name, sys.platform, or platform.system?" [3]

Unrelated, TIL [4]:

  Android is a Linux distribution according to the Linux Foundation

[1] https://docs.python.org/3.4/library/os.html#os.name
[2] http://bugs.python.org/issue1322
[3] http://stackoverflow.com/questions/4553129/when-to-use-os-name-sys-platform-or-platform-system
[4] http://en.wikipedia.org/wiki/Android_(operating_system)


BTW, would it help to add an os.get_shell_executable() function [5],
to avoid hacking the subprocess module, so that os.confstr('CS_PATH') or
os.defpath on Android could be defined to include /system/bin instead?

[5] http://bugs.python.org/issue16353


--
Akira



Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-03 Thread Akira Li
Guido van Rossum writes:

> Well, it really does look like checking for the presence of those ANDROID_*
> environment variables it the best way to recognize the Android platform.
> Anyone can do that without waiting for a ruling on whether Android is Linux
> or not (which would be necessary because the docs for sys.platform are
> quite clear about its value on Linux systems). Googling terms like "is
> Android Linux" suggests that there is considerable controversy about the
> issue, so I suggest you don't wait. :-)

I don't see sysconfig mentioned in the discussion (maybe for a
reason). It might provide build-time information, e.g.,

  built_for_android = 'android' in (sysconfig.get_config_var('MULTIARCH') or '')

assuming the complete value is something like 'arm-linux-android'
(get_config_var() returns None where MULTIARCH is undefined). It says
that the Python binary was built for Android (the current platform may
or may not be Android).
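Combining that build-time hint with the runtime check Guido mentions could look like the sketch below. It assumes that the specific variables ANDROID_ROOT and ANDROID_DATA are representative of "those ANDROID_* environment variables"; both helper names are hypothetical:

```python
import os
import sysconfig

def built_for_android():
    """Build-time hint: is 'android' part of the MULTIARCH triplet?"""
    # get_config_var() returns None when MULTIARCH is not defined
    return "android" in (sysconfig.get_config_var("MULTIARCH") or "")

def running_on_android(environ=None):
    """Runtime hint; the exact ANDROID_* variable names are an assumption."""
    environ = os.environ if environ is None else environ
    return "ANDROID_ROOT" in environ and "ANDROID_DATA" in environ
```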


--
Akira



Re: [Python-Dev] Exposing the Android platform existence to Python modules

2014-08-03 Thread Akira Li
Shiz writes:

> The most obvious change would be to subprocess.Popen(). The reason a
> generic approach there won't work is also the reason I expect more
> changes might be needed: the Android file system doesn't abide by any
> POSIX file system standards. Its shell isn't located at /bin/sh, but at
> /system/bin/sh. The only directories it provides that are POSIX-standard
> are /dev and /etc, to my knowledge. You could check to see if
> /system/bin/sh exists and use that first, but that would break the
> preferred shell on POSIX systems that happen to have /system for some
> reason or another. In short: the preferred shell on POSIX systems is
> /bin/sh, but on Android it's /system/bin/sh. Simple existence checking
> might break the preferred shell on either. For more specific stdlib
> examples I'd have to check the test suite again.

FYI, the /bin/sh path is not mandated by POSIX; see
http://bugs.python.org/issue16353#msg224514
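POSIX suggests locating the standard utilities via the value of `getconf PATH` (os.confstr('CS_PATH') in Python) rather than hard-coding /bin/sh. A sketch of that lookup; find_posix_shell is a hypothetical helper, not what subprocess actually does:

```python
import os
import shutil

def find_posix_shell():
    """Look up `sh` on the POSIX default PATH, else on os.defpath."""
    try:
        search_path = os.confstr("CS_PATH")   # e.g. '/bin:/usr/bin' on Linux
    except (AttributeError, ValueError, OSError):
        search_path = None                    # no confstr() or no CS_PATH here
    return shutil.which("sh", path=search_path or os.defpath)

print(find_posix_shell())
```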


--
Akira



Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Akira Li
"Schmitt  Uwe (ID SIS)"  writes:

> I discovered a problem using cPickle.loads from CPython 2.7.6.
>
> The last line in the following code raises an infinite recursion
>
> class T(object):
>
>     def __init__(self):
>         self.item = list()
>
>     def __getattr__(self, name):
>         return getattr(self.item, name)
>
> import cPickle
>
> t = T()
>
> l = cPickle.dumps(t)
> cPickle.loads(l)
...
> Is this a bug or did I miss something ?

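The issue is that cPickle.loads() creates the instance without calling
__init__, so self.item does not exist yet. When pickle then looks up an
attribute such as __setstate__ on it, your __getattr__ accesses
self.item, which calls __getattr__ again, and so on: it raises
RuntimeError (due to infinite recursion) for non-existing attributes
instead of AttributeError. To fix it, you could use
object.__getattribute__:

  class C(object):
      def __init__(self):
          self.item = []
      def __getattr__(self, name):
          return getattr(object.__getattribute__(self, 'item'), name)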

There were issues in the past due to {get,has}attr silencing
non-AttributeError exceptions; therefore it is good that pickle breaks
when it gets RuntimeError instead of AttributeError.
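A Python 3 sketch of the same fix, with pickle in place of cPickle. With the original class T, the pickle.loads() call should instead fail with RecursionError, which is a RuntimeError subclass:

```python
import pickle

class C(object):
    """Delegates unknown attributes to self.item without recursing."""

    def __init__(self):
        self.item = []

    def __getattr__(self, name):
        # object.__getattribute__ raises AttributeError (not recursion) when
        # 'item' is missing, e.g. while pickle probes a fresh instance for
        # __setstate__ before its __dict__ has been restored.
        return getattr(object.__getattribute__(self, 'item'), name)

c = C()
c.append(1)                          # delegated to the list
restored = pickle.loads(pickle.dumps(c))
print(restored.item)                 # [1]
```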


--
Akira



Re: [Python-Dev] os.walk() is going to be *fast* with scandir

2014-08-11 Thread Akira Li
Armin Rigo writes:

> On 10 August 2014 08:11, Larry Hastings wrote:
>>> A small tip from my bzr days - cd into the directory before scanning it
>>
>> I doubt that's permissible for a library function like os.scandir().
>
> Indeed, chdir() is notably not compatible with multithreading.  There
> would be a non-portable but clean way to do that: the functions
> openat() and fstatat().  They only exist on relatively modern Linuxes,
> though.

There is os.fwalk() that could be both safer and faster than
os.walk(). It yields rootdir fd that can be used by functions that
support dir_fd parameter, see os.supports_dir_fd set. They use *at()
functions under the hood.
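
A small sketch of that os.fwalk() pattern (Unix only; `total_size` is a made-up example function). The yielded directory fd is passed as `dir_fd`, so `os.stat()` resolves each name relative to the already open directory instead of re-walking the full path string.

```python
import os

def total_size(top):
    """Sum the sizes of non-directory entries under *top* (symlinks not followed)."""
    total = 0
    for dirpath, dirnames, filenames, rootfd in os.fwalk(top):
        for name in filenames:
            # Resolved relative to rootfd; requires os.stat in os.supports_dir_fd.
            st = os.stat(name, dir_fd=rootfd, follow_symlinks=False)
            total += st.st_size
    return total
```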

os.fwalk() could be implemented in terms of os.scandir() if the latter
supported an fd argument the way os.listdir() does, i.e., if it were in
the os.supports_fd set (note: that set is different from os.supports_dir_fd).
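
For reference, this is what passing a directory fd looks like (Unix only). `os.listdir()` has accepted an fd since Python 3.3, and `os.scandir()` gained the same support on Unix later (Python 3.7), which the `os.supports_fd` check below guards; `list_names` is a hypothetical helper.

```python
import os

def list_names(path):
    """List the entry names of directory *path* through an open directory fd."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if os.scandir in os.supports_fd:
            # scandir() duplicates the fd internally, so ours is closed below.
            with os.scandir(fd) as it:
                return sorted(entry.name for entry in it)
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)
```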

Victor Stinner suggested [1] allowing scandir(fd), but I don't see it
mentioned in PEP 471 [2]: it neither supports nor rejects the
idea.

[1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
[2] http://legacy.python.org/dev/peps/pep-0471/


--
Akira



Re: [Python-Dev] Multiline with statement line continuation

2014-08-13 Thread Akira Li
Nick Coghlan  writes:

> On 12 August 2014 22:15, Steven D'Aprano  wrote:
>> Compare the natural way of writing this:
>>
>> with open("spam") as spam, open("eggs", "w") as eggs, frobulate("cheese") as cheese:
>>     # do stuff with spam, eggs, cheese
>>
>> versus the dynamic way:
>>
>> with ExitStack() as stack:
>>     spam, eggs = [stack.enter_context(open(fname, mode)) for fname, mode in
>>                   zip(("spam", "eggs"), ("r", "w"))]
>>     cheese = stack.enter_context(frobulate("cheese"))
>>     # do stuff with spam, eggs, cheese
>
> You wouldn't necessarily switch at three. At only three, you have lots
> of options, including multiple nested with statements:
>
> with open("spam") as spam:
>     with open("eggs", "w") as eggs:
>         with frobulate("cheese") as cheese:
>             # do stuff with spam, eggs, cheese
>
> The "multiple context managers in one with statement" form is there
> *solely* to save indentation levels, and overuse can often be a sign
> that you may have a custom context manager trying to get out:
>
> @contextlib.contextmanager
> def dish(spam_file, egg_file, topping):
>     with open(spam_file) as spam, open(egg_file, 'w') as eggs, frobulate(topping) as cheese:
>         yield spam, eggs, cheese
>
> with dish("spam", "eggs", "cheese") as (spam, eggs, cheese):
>     # do stuff with spam, eggs & cheese
>
> ExitStack is mostly useful as a tool for writing flexible custom
> context managers, and for dealing with context managers in cases where
> lexical scoping doesn't necessarily work, rather than being something
> you'd regularly use for inline code.
>
> "Why do I have so many contexts open at once in this function?" is a
> question developers should ask themselves in the same way its worth
> asking "why do I have so many local variables in this function?"

A multiline with-statement can be useful even with *two* context
managers. Two is not many.

Saving indentation levels alone is a worthy goal. It can affect
readability and the perceived complexity of the code.

Here's how I'd like the code to look:

  with (open('input filename') as input_file,
        open('output filename', 'w') as output_file):
      # code with list comprehensions to transform input file into output file
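
For readers on a current Python: this parenthesized form became official syntax in Python 3.10. A minimal runnable version follows; the function name `upper_copy` and the upper-casing transform are only placeholders for "code with list comprehensions".

```python
def upper_copy(src, dst):
    """Copy *src* to *dst*, upper-casing each line (Python 3.10+ syntax)."""
    with (open(src) as input_file,
          open(dst, 'w') as output_file):
        output_file.writelines([line.upper() for line in input_file])
```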

Even one additional unnecessary indentation level may force you to split
list comprehensions into several lines (less readable) and/or use
shorter names (less readable). Or it may force you to move the inline
code into a separate named function prematurely, solely to preserve the
indentation level (also may be less readable), i.e.,

  with ... as input_file:
      with ... as output_file:
          ... #XXX indentation level is lost for no reason

  with ... as infile, ... as outfile: #XXX shorter names
      ...

  with ... as input_file:
      with ... as output_file:
          transform(input_file, output_file) #XXX unnecessary function

And with nested() (which can be implemented using ExitStack):

  with nested(open(..),
              open(..)) as (input_file, output_file):
      ... #XXX less readable
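
Python 2's contextlib.nested() was removed in Python 3, but a stand-in is a few lines on top of ExitStack. This is a sketch, not the old implementation; like the original, every manager expression is evaluated before any of them is entered, so a failing second `open()` can leak the first file.

```python
from contextlib import ExitStack, contextmanager

@contextmanager
def nested(*managers):
    """Hypothetical re-creation of contextlib.nested() using ExitStack."""
    with ExitStack() as stack:
        # Enter in order; ExitStack exits them in reverse order.
        yield tuple(stack.enter_context(cm) for cm in managers)
```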

Here's an example where nested() won't help, because the second context
manager needs the object returned by the first:

  def get_integers(filename):
      with (open(filename, 'rb', 0) as file,
            mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file):
          for match in re.finditer(br'\d+', mmapped_file):
              yield int(match.group())
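
Without the parenthesized syntax, the same dependency (the mmap needs `file.fileno()`) can be expressed with contextlib.ExitStack. A sketch; note that `mmap` refuses to map an empty file, so an empty input raises ValueError here.

```python
import mmap
import re
from contextlib import ExitStack

def get_integers(filename):
    with ExitStack() as stack:
        file = stack.enter_context(open(filename, 'rb', 0))
        # The second context manager depends on the first one's result.
        mmapped_file = stack.enter_context(
            mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ))
        for match in re.finditer(br'\d+', mmapped_file):
            yield int(match.group())
```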

Here's another:

  with (open('log'+'some expression that generates filename', 'a') as logfile,
        redirect_stdout(logfile)):
      ...


--
Akira



Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-16 Thread Akira Li
Steven D'Aprano  writes:

> On Wed, Sep 17, 2014 at 11:14:15AM +1000, Chris Angelico wrote:
>> On Wed, Sep 17, 2014 at 5:29 AM, R. David Murray  
>> wrote:
>
>> > Basically, we are pretending that the each smuggled
>> > byte is single character for string parsing purposes...but they don't
>> > match any of our parsing constants.  They are all "any character" matches
>> > in the regexes and what have you.
>> 
>> This is slightly iffy, as you can't be sure that one byte represents
>> one character, but as long as you don't much care about that, it's not
>> going to be an issue.
>
> This discussion would probably be a lot more easy to follow, with fewer 
> miscommunications, if there were some examples. Here is my example, 
> perhaps someone can tell me if I'm understanding it correctly.
>
> I want to send an email including the header line:
>
> 'Subject: “NOBODY expects the Spanish Inquisition!”'
>

  >>> from email.header import Header
  >>> h = Header('Subject: “NOBODY expects the Spanish Inquisition!”')
  >>> h.encode('utf-8')
  '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
  >>> h.encode()
  '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
  >>> h.encode('ascii')
  '=?utf-8?q?Subject=3A_=E2=80=9CNOBODY_expects_the_Spanish_Inquisition!?=\n =?utf-8?q?=E2=80=9D?='
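
In all three calls the result is plain ASCII made of RFC 2047 encoded words, and it decodes back to the original text. A quick round-trip check with the stdlib `email.header` helpers (a sketch; the variable names are arbitrary):

```python
from email.header import Header, decode_header, make_header

subject = 'Subject: “NOBODY expects the Spanish Inquisition!”'
encoded = Header(subject).encode()
print(encoded)  # ASCII-only encoded words

# decode_header() joins adjacent encoded words; make_header() rebuilds text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == subject)
```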


--
Akira



Re: [Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github

2014-11-30 Thread Akira Li
Larry Hastings  writes:

> On 11/29/2014 04:37 PM, Donald Stufft wrote:
>> On Nov 29, 2014, at 7:15 PM, Alex Gaynor  wrote:
>>> Despite being a regular hg
>>> user for years, I have no idea how to create a local-only branch, or a 
>>> branch
>>> which is pushed to a remote (to use the git term).
>> I also don’t know how to do this.
>
> Instead of collectively scratching your heads, could one of you guys
> do the research and figure out whether or not hg supports this
> workflow?  One of the following two things must be true:
>
> 1. hg supports this workflow (or a reasonable fascimile), which may
>lessen the need for this PEP.
> 2. hg doesn't support this workflow, which may strengthen the need for
>this PEP.
>

Assuming git's "all work is done in a local branch" workflow, you could
use bookmarks with hg:

http://lostechies.com/jimmybogard/2010/06/03/translating-my-git-workflow-with-local-branches-to-mercurial/
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/#branching-with-bookmarks
http://mercurial.selenic.com/wiki/BookmarksExtension#Usage
http://stackoverflow.com/questions/1598759/git-and-mercurial-compare-and-contrast


--
Akira



Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Alexander Belopolsky  writes:

> On Wed, Apr 8, 2015 at 3:57 PM, Isaac Schwabacher 
> wrote:
>>
>> On 15-04-08, Alexander Belopolsky wrote:
>> > With datetime, we also have a problem that POSIX APIs don't have to
>> > deal with: local time arithmetics. What is t + timedelta(1) when t
>> > falls on the day before DST change? How would you set the isdst flag
>> > in the result?
>>
>> It's whatever time comes 60*60*24 seconds after t in the same time zone,
>> because the timedelta class isn't expressive enough to represent anything
>> but absolute time differences (nor should it be, IMO).
>
> This is not what most uses expect.  The expect
>
> datetime(y, m, d, 12, tzinfo=New_York) + timedelta(1)
>
> to be
>
> datetime(y, m, d+1, 12, tzinfo=New_York)

That expectation is incorrect for aware datetimes. If you want d+1 for
+timedelta(1), use a **naive** datetime; otherwise +timedelta(1) is +24h:

  tomorrow = tz.localize(aware_dt.replace(tzinfo=None) + timedelta(1), is_dst=None)
  dt_plus24h = tz.normalize(aware_dt + timedelta(1)) # +24h

*tomorrow* and *aware_dt* have the *same* time, but it is unknown how
many hours have passed if the utc offset has changed in between.
*dt_plus24h* may have a different time, but exactly 24 hours have
passed between *dt_plus24h* and *aware_dt*.
http://stackoverflow.com/questions/441147/how-can-i-subtract-a-day-from-a-python-date



Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Lennart Regebro  writes:

> OK, so I realized another thing today, and that is that arithmetic
> doesn't necessarily round trip.
>
> For example, 2002-10-27 01:00 US/Eastern comes both in DST and STD.
>
> But 2002-10-27 01:00 US/Eastern STD minus two days is 2002-10-25 01:00
> US/Eastern DST

"two days" is ambiguous here. It is incorrect if you mean 48 hours (the
difference is 49 hours):

  #!/usr/bin/env python3
  from datetime import datetime, timedelta
  import pytz

  tz = pytz.timezone('US/Eastern')
  then_isdst = False # STD
  then = tz.localize(datetime(2002, 10, 27, 1), is_dst=then_isdst)
  now =  tz.localize(datetime(2002, 10, 25, 1), is_dst=None) # no utc transition
  print((then - now) // timedelta(hours=1))
  # -> 49

> However, 2002-10-25 01:00 US/Eastern DST plus two days is 2002-10-27
> 01:00 US/Eastern, but it is ambiguous if you want DST or not DST.

It is not ambiguous if you know what "two days" *in your particular
application* should mean (`day+2` vs. +48h exactly):

  print(tz.localize(now.replace(tzinfo=None) + timedelta(2), is_dst=then_isdst))
  # -> 2002-10-27 01:00:00-05:00 # +49h
  print(tz.normalize(now + timedelta(2))) # +48h
  # -> 2002-10-27 01:00:00-04:00

Here's a simple mental model that can be used for date arithmetic:

- naive datetime + timedelta(2) == "same time, elapsed hours unknown"
- aware utc datetime + timedelta(2) == "same time, +48h"
- aware datetime with timezone that may have different utc offsets at
  different times + timedelta(2) == "unknown time, +48h"

"unknown" means that you can't tell without knowing the specific
timezone.

The model ignores leap seconds.

The 3rd case behaves *as if* the calculations are performed using these
steps (the actual implementation may be different):

1. convert an aware datetime object to utc (dt.astimezone(pytz.utc))
2. do the simple arithmetic using utc time
3. convert the result to the original pytz timezone (utc_dt.astimezone(tz))

You don't need `.localize()`, `.normalize()` calls here.
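
The three steps can be cross-checked without pytz. This sketch hard-codes fixed EDT/EST offsets and the 2002 fall-back instant (02:00 EDT on 2002-10-27, i.e. 06:00 UTC) as assumptions; the hypothetical `to_eastern()` stands in for a real tz database lookup and is only valid around that date.

```python
from datetime import datetime, timedelta, timezone

# Assumed US/Eastern offsets around the 2002 fall-back transition.
EDT = timezone(timedelta(hours=-4), 'EDT')
EST = timezone(timedelta(hours=-5), 'EST')
FALL_BACK_UTC = datetime(2002, 10, 27, 6, tzinfo=timezone.utc)

def to_eastern(utc_dt):
    """Hypothetical stand-in for a tz database lookup (autumn 2002 only)."""
    return utc_dt.astimezone(EST if utc_dt >= FALL_BACK_UTC else EDT)

def add_elapsed(dt, delta):
    # 1. convert to UTC, 2. add in UTC, 3. convert back to local time
    return to_eastern(dt.astimezone(timezone.utc) + delta)

now = datetime(2002, 10, 25, 1, tzinfo=EDT)
then = datetime(2002, 10, 27, 1, tzinfo=EST)
print((then - now) // timedelta(hours=1))    # 49
print(add_elapsed(now, timedelta(days=2)))   # 2002-10-27 01:00:00-04:00 (+48h)
print(add_elapsed(now, timedelta(hours=49))) # 2002-10-27 01:00:00-05:00
```

The same helper also round-trips across the transition: subtracting and then adding 420 seconds to `then` gives `then` back, with the EST offset.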

> And you can't pass in a is_dst flag to __add__, so the arithmatic must
> just pick one, and the sensible one is to keep to the same DST.
>
> That means that:
>
> tz = get_timezone('US/Eastern')
> dt = datetimedatetime(2002, 10, 27, 1, 0, tz=tz, is_dst=False)
> dt2 = dt - 420 + 420
> assert dt == dt2
>
> Will fail, which will be unexpected for most people.
>
> I think there is no way around this, but I thought I should flag for
> it. This is a good reason to do all your date time arithmetic in UTC.
>
> //Lennart

It won't fail:

  from datetime import datetime, timedelta
  import pytz

  tz = pytz.timezone('US/Eastern')
  dt = tz.localize(datetime(2002, 10, 27, 1), is_dst=False)
  delta = timedelta(seconds=420)

  assert dt == tz.normalize(tz.normalize(dt - delta) + delta)

`tz.normalize()` is used only so that tzinfo is correct for the
resulting datetime object; it does not affect the comparison otherwise:

  assert dt == (dt - delta + delta) #XXX tzinfo may be incorrect
  assert dt == tz.normalize(dt - delta + delta) # correct tzinfo for the final result



Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Isaac Schwabacher  writes:
> ...
>
> I know that you can do datetime.now(tz), and you can do datetime(2013,
> 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able
> to add a time zone to an existing naive datetime is painful (and
> strptime doesn't even let you pass in a time zone). 

`.now(tz)` is correct. `datetime(..., tzinfo=tz)` is wrong: if tz is a
pytz timezone then you may get a wrong tzinfo (LMT); you should use
`tz.localize(naive_dt, is_dst=False|True|None)` instead.

> ...



Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Alexander Belopolsky  writes:

> Sorry for a truncated message.  Please scroll past the quoted portion.
>
> On Thu, Apr 9, 2015 at 10:21 PM, Alexander Belopolsky <
> alexander.belopol...@gmail.com> wrote:
>
>>
>> On Thu, Apr 9, 2015 at 4:51 PM, Isaac Schwabacher 
>> wrote:
>>
>>> > > > Well, you are right, but at least we do have a localtime utility
>>> hidden in the email package:
>>> > > >
>>> > > > >>> from datetime import *
>>> > > > >>> from email.utils import localtime
>>> > > > >>> print(localtime(datetime.now()))
>>> > > > 2015-04-09 15:19:12.84-04:00
>>> > > >
>>> > > > You can read  for the reasons it
>>> did not make into datetime.
>>> > >
>>> > > But that's restricted to the system time zone. Nothing good ever
>>> comes from the system time zone...
>>> >
>>> > Let's solve one problem at a time. ...
>>>
>>> PEP 431 proposes to import zoneinfo into the stdlib, ...
>>
>>
>> I am changing the subject so that we can focus on one question without
>> diverting to PEP-size issues that are better suited for python ideas.
>>
>> I would like to add a functionality to the datetime module that would
>> solve a seemingly simple problem: given a naive datetime instance assumed
>> to be in local time, construct the corresponding aware datetime object with
>> tzinfo set to an appropriate fixed offset datetime.timezone instance.
>>
>> Python 3 has this functionality implemented in the email package since
>> version 3.3, and it appears to work well even
>> in the ambiguous hour
>>
>> >>> from email.utils import localtime
>> >>> from datetime import datetime
>> >>> localtime(datetime(2014,11,2,1,30)).strftime('%c %z %Z')
>> 'Sun Nov  2 01:30:00 2014 -0400 EDT'
>> >>> localtime(datetime(2014,11,2,1,30), isdst=0).strftime('%c %z %Z')
>> 'Sun Nov  2 01:30:00 2014 -0500 EST'
>>
>> However, in a location with a more interesting history, you can get a
>> situation that
>>
>
> would look like this in the zoneinfo database:
>
> $ zdump -v  -c 1992 Europe/Kiev
> ...
> Europe/Kiev  Sat Mar 24 22:59:59 1990 UTC = Sun Mar 25 01:59:59 1990 MSK isdst=0
> Europe/Kiev  Sat Mar 24 23:00:00 1990 UTC = Sun Mar 25 03:00:00 1990 MSD isdst=1
> Europe/Kiev  Sat Jun 30 21:59:59 1990 UTC = Sun Jul  1 01:59:59 1990 MSD isdst=1
> Europe/Kiev  Sat Jun 30 22:00:00 1990 UTC = Sun Jul  1 01:00:00 1990 EEST isdst=1
> Europe/Kiev  Sat Sep 28 23:59:59 1991 UTC = Sun Sep 29 02:59:59 1991 EEST isdst=1
> Europe/Kiev  Sun Sep 29 00:00:00 1991 UTC = Sun Sep 29 02:00:00 1991 EET isdst=0
> ...
>
> Look what happened on July 1, 1990.  At 2 AM, the clocks in Ukraine were
> moved back one hour.  So times like 01:30 AM happened twice there on that
> day.  Let's see how Python handles this situation
>
> $ TZ=Europe/Kiev python3
> >>> from email.utils import localtime
> >>> from datetime import datetime
> >>> localtime(datetime(1990,7,1,1,30)).strftime('%c %z %Z')
> 'Sun Jul  1 01:30:00 1990 +0400 MSD'
>
> So far so good, I've got the first of the two 01:30AM's.  But what if I
> want the other 01:30AM?  Well,
>
> >>> localtime(datetime(1990,7,1,1,30), isdst=0).strftime('%c %z %Z')
> 'Sun Jul  1 01:30:00 1990 +0300 EEST'
>
> gives me "the other 01:30AM", but it is counter-intuitive: I have to ask
> for the standard (winter)  time to get the daylight savings (summer) time.
>

It looks incorrect. Here's the corresponding pytz code:

  from datetime import datetime
  import pytz

  tz = pytz.timezone('Europe/Kiev')
  print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0300 EEST
  print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0400 MSD
  
See also "Enhance support for end-of-DST-like ambiguous time" [1]

[1] https://bugs.launchpad.net/pytz/+bug/1378150

`email.utils.localtime()` is broken: it returns +0300 EEST regardless of
the isdst argument:

  from datetime import datetime
  from email.utils import localtime

  print(localtime(datetime(1990, 7, 1, 1, 30)).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0300 EEST
  print(localtime(datetime(1990, 7, 1, 1, 30), isdst=0).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0300 EEST
  print(localtime(datetime(1990, 7, 1, 1, 30), isdst=1).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0300 EEST
  print(localtime(datetime(1990, 7, 1, 1, 30), isdst=-1).strftime('%c %z %Z'))
  # -> Sun Jul  1 01:30:00 1990 +0300 EEST
  

Versions:

  $ ./python -V
  Python 3.5.0a3+
  $ dpkg -s tzdata | grep -i version
  Version: 2015b-0ubuntu0.14.04

> The uncertainty about how to deal with the repeated hour was the reason why
> email.utils.localtime-like  interface did not make it to the datetime
> module.

"repeated hour" (time jumps back) can be treated like an end-of-DST
transition, to resolve ambiguities [1].

> The main objection to the isdst flag was that in most situations,
> determining whether DST is in effect is as hard as finding the UTC offset,
> so reducing the problem of finding the 

Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Alexander Belopolsky  writes:

> ...
> For most world locations past discontinuities are fairly well documented
> for at least a century and future changes are published with at least 6
> months lead time.

It is important to note that the different versions of the tz database
may lead to different tzinfo (utc offset, tzname) even for *past* dates.

i.e., (lt, tzid, isdst) is not enough because the result for (lt,
tzid(2015b), isdst) may be different from (lt, tzid(X), isdst)
where

lt = local time e.g., naive datetime
tzid = timezone from the tz database e.g., Europe/Kiev
isdst = a boolean flag for disambiguation
X != 2015b

In other words, a fixed utc offset might not be sufficient even for past
dates.

>...
> Moreover, a program that rejects invalid times on input, but stores them
> for a long time may see its database silently corrupted after a zoneinfo
> update.

> Now it is time to make specific proposal.  I would like to extend
> datetime.astimezone() method to work on naive datetime instances.  Such
> instances will be assumed to be in local time and discontinuities will be
> handled as follows:
>
>
> 1. wall(t) == lt has a single solution.  This is the trivial case and
> lt.astimezone(utc) and lt.astimezone(utc, which=i)  for i=0,1 should return
> that solution.
>
> 2. wall(t) == lt has two solutions t1 and t2 such that t1 < t2. In this
> case lt.astimezone(utc) == lt.astimezone(utc, which=0) == t1 and
>  lt.astimezone(utc, which=1) == t2.

In pytz terms: `which = not isdst` (end-of-DST-like transition: isdst
changes from True to False in the direction of utc time).

It resolves AmbiguousTimeError raised by `tz.localize(naive, is_dst=None)`.

> 3. wall(t) == lt has no solution.  This happens when there is UTC time t0
> such that wall(t0) < lt and wall(t0+epsilon) > lt (a positive discontinuity
> at time t0). In this case lt.astimezone(utc) should return t0 + lt -
> wall(t0).  I.e., we ignore the discontinuity and extend wall(t) linearly
> past t0.  Obviously, in this case the invariant wall(lt.astimezone(utc)) ==
> lt won't hold.   The "which" flag should be handled as follows:
>  lt.astimezone(utc) == lt.astimezone(utc, which=0) and lt.astimezone(utc,
> which=0) == t0 + lt - wall(t0+eps).

It is inconsistent with the previous case: here `which = isdst` but
`which = not isdst` above.

`lt.astimezone(utc, which=0) == t0 + lt - wall(t0+eps)` corresponds to:

  result = tz.normalize(tz.localize(lt, is_dst=False))

i.e., `which = isdst` (t0 is at the start of DST and therefore isdst
changes from False to True).

It resolves NonExistentTimeError raised by `tz.localize(naive,
is_dst=None)` at a start-of-DST-like transition ("spring forward").

For example,

  from datetime import datetime, timedelta
  import pytz
  
  tz = pytz.timezone('America/New_York')
  # 2am -- non-existent time
  print(tz.normalize(tz.localize(datetime(2015, 3, 8, 2), is_dst=False)))
  # -> 2015-03-08 03:00:00-04:00 # after the jump (wall(t0+eps))
  print(tz.localize(datetime(2015, 3, 8, 3), is_dst=None))
  # -> 2015-03-08 03:00:00-04:00 # same time, unambiguous
  # 2:01am -- non-existent time
  print(tz.normalize(tz.localize(datetime(2015, 3, 8, 2, 1), is_dst=False)))
  # -> 2015-03-08 03:01:00-04:00
  print(tz.localize(datetime(2015, 3, 8, 3, 1), is_dst=None))
  # -> 2015-03-08 03:01:00-04:00 # same time, unambiguous
  # 2:59am non-existent time
  dt = tz.normalize(tz.localize(datetime(2015, 3, 8, 2, 59), is_dst=True))
  print(dt)
  # -> 2015-03-08 01:59:00-05:00 # before the jump (wall(t0-eps))
  print(tz.normalize(dt + timedelta(minutes=1)))
  # -> 2015-03-08 03:00:00-04:00
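
The three cases of the proposal can be checked without pytz by counting the solutions of wall(t) == lt. This sketch models a hypothetical zone with a single transition at naive-UTC instant `t0` (offset `before` for t < t0, `after` for t >= t0); any solution must equal lt minus one of the two offsets, so checking those two candidates is enough. The offsets used below are taken from the zdump output and the New York example above.

```python
from datetime import datetime, timedelta

def solutions(lt, t0, before, after):
    """Sorted naive-UTC instants t with wall(t) == lt for a one-transition zone."""
    def wall(t):
        return t + (before if t < t0 else after)
    return sorted({t for t in (lt - before, lt - after) if wall(t) == lt})

# Kiev, 1990-07-01: MSD (+4) -> EEST (+3) at 22:00 UTC: repeated hour.
print(len(solutions(datetime(1990, 7, 1, 1, 30), datetime(1990, 6, 30, 22),
                    timedelta(hours=4), timedelta(hours=3))))    # 2
# New York, 2015-03-08: EST (-5) -> EDT (-4) at 07:00 UTC: 02:30 is skipped.
print(len(solutions(datetime(2015, 3, 8, 2, 30), datetime(2015, 3, 8, 7),
                    timedelta(hours=-5), timedelta(hours=-4))))  # 0
# Same zone, noon: the trivial case.
print(len(solutions(datetime(2015, 3, 8, 12), datetime(2015, 3, 8, 7),
                    timedelta(hours=-5), timedelta(hours=-4))))  # 1
```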


> With the proposed features in place, one can use the naive code
>
> t =  lt.astimezone(utc)
>
> and get predictable behavior in all cases and no crashes.
>
> A more sophisticated program can be written like this:
>
> t1 = lt.astimezone(utc, which=0)
> t2 = lt.astimezone(utc, which=1)
> if t1 == t2:
>     t = t1
> elif t2 > t1:
>     # ask the user to pick between t1 and t2 or raise AmbiguousLocalTimeError
> else:
>     t = t1
>     # warn the user that time was invalid and changed or raise InvalidLocalTimeError



Re: [Python-Dev] Status on PEP-431 Timezones

2015-04-15 Thread Akira Li
Isaac Schwabacher  writes:

> On 15-04-15, Akira Li <4kir4...@gmail.com> wrote:
>> Isaac Schwabacher  writes:
>> > ...
>> >
>> > I know that you can do datetime.now(tz), and you can do datetime(2013,
>> > 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able
>> > to add a time zone to an existing naive datetime is painful (and
>> > strptime doesn't even let you pass in a time zone). 
>> 
>> `.now(tz)` is correct. `datetime(..., tzinfo=tz)`) is wrong: if tz is a
>> pytz timezone then you may get a wrong tzinfo (LMT), you should use
>> `tz.localize(naive_dt, is_dst=False|True|None)` instead.
>
> The whole point of this thread is to finalize PEP 431, which fixes the
> problem for which `localize()` and `normalize()` are workarounds. When
> this is done, `datetime(..., tzinfo=tz)` will be correct.
>
> ijs

The input time is ambiguous. Even if we assume PEP 431 is implemented in
some form, your code is still missing the isdst parameter (or its
analog). PEP 431 won't fix it; it can't resolve the ambiguity by
itself. Notice the is_dst parameter in the `tz.localize()` call (current
API).

`.now(tz)` works even during end-of-DST transitions (current API), when
the local time is ambiguous.



Re: [Python-Dev] Aware datetime from naive local time Was: Status on PEP-431 Timezones

2015-04-17 Thread Akira Li
On Thu, Apr 16, 2015 at 1:14 AM, Alexander Belopolsky <
alexander.belopol...@gmail.com> wrote:

>
> On Wed, Apr 15, 2015 at 4:46 PM, Akira Li <4kir4...@gmail.com> wrote:
>
>> > Look what happened on July 1, 1990.  At 2 AM, the clocks in Ukraine were
>> > moved back one hour.  So times like 01:30 AM happened twice there on
>> > that day.  Let's see how Python handles this situation
>> >
>> > $ TZ=Europe/Kiev python3
>> >>>> from email.utils import localtime
>> >>>> from datetime import datetime
>> >>>> localtime(datetime(1990,7,1,1,30)).strftime('%c %z %Z')
>> > 'Sun Jul  1 01:30:00 1990 +0400 MSD'
>> >
>> > So far so good, I've got the first of the two 01:30AM's.  But what if I
>> > want the other 01:30AM?  Well,
>> >
>> >>>> localtime(datetime(1990,7,1,1,30), isdst=0).strftime('%c %z %Z')
>> > 'Sun Jul  1 01:30:00 1990 +0300 EEST'
>> >
>> > gives me "the other 01:30AM", but it is counter-intuitive: I have to ask
>> > for the standard (winter)  time to get the daylight savings (summer) time.
>> >
>>
>> It looks incorrect. Here's the corresponding pytz code:
>>
>>   from datetime import datetime
>>   import pytz
>>
>>   tz = pytz.timezone('Europe/Kiev')
>>   print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=False).strftime('%c %z %Z'))
>>   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
>>   print(tz.localize(datetime(1990, 7, 1, 1, 30), is_dst=True).strftime('%c %z %Z'))
>>   # -> Sun Jul  1 01:30:00 1990 +0400 MSD
>>
>> See also "Enhance support for end-of-DST-like ambiguous time" [1]
>>
>> [1] https://bugs.launchpad.net/pytz/+bug/1378150
>>
>> `email.utils.localtime()` is broken:
>>
>
> If you think there is a bug in email.utils.localtime - please open an
> issue at .
>
>

Your question below suggests that you believe it is not a bug, i.e.,
`email.utils.localtime()` is broken *by design*, unless you think it is ok
to ignore `+0400 MSD`.

pytz works for me (I can get both `+0300 EEST` and `+0400 MSD`).  I don't
think `localtime()` can be fixed without the tz database. I don't know
whether it should be fixed; let somebody who can't use pytz pioneer
the issue. The purpose of the code example is to **inform** that
`email.utils.localtime()` fails (it returns only +0300 EEST) in this case:


>>   from datetime import datetime
>>   from email.utils import localtime
>>
>>   print(localtime(datetime(1990, 7, 1, 1, 30)).strftime('%c %z %Z'))
>>   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
>>   print(localtime(datetime(1990, 7, 1, 1, 30), isdst=0).strftime('%c %z %Z'))
>>   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
>>   print(localtime(datetime(1990, 7, 1, 1, 30), isdst=1).strftime('%c %z %Z'))
>>   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
>>   print(localtime(datetime(1990, 7, 1, 1, 30), isdst=-1).strftime('%c %z %Z'))
>>   # -> Sun Jul  1 01:30:00 1990 +0300 EEST
>>
>>
>> Versions:
>>
>>   $ ./python -V
>>   Python 3.5.0a3+
>>   $ dpkg -s tzdata | grep -i version
>>   Version: 2015b-0ubuntu0.14.04
>>
>> > The uncertainty about how to deal with the repeated hour was the reason why
>> > email.utils.localtime-like  interface did not make it to the datetime
>> > module.
>>
>> "repeated hour" (time jumps back) can be treated like a end-of-DST
>> transition, to resolve ambiguities [1].
>
>
> I don't understand what you are complaining about.  It is quite possible
> that pytz uses is_dst flag differently from the way email.utils.localtime
> uses isdst.
>
> I was not able to find a good description of what is_dst means in pytz,
> but localtime's isdst is documented as follows:
>
> a positive or zero value for *isdst* causes localtime to
> presume initially that summer time (for example, Daylight Saving Time)
> is or is not (respectively) in effect for the specified time.
>
> Can you demonstrate that email.utils.localtime does not behave as
> documented?
>


No need to be so defensive about it. *"'Repeated hour' (time jumps back)
can be treated like an end-of-DST transition, to resolve ambiguities [1]"*
is just *an example* of how to fix the problem, the same way it is
done in pytz:

  >>> from datetime import datetime
  >>> import pytz

Re: [Python-Dev] should tests be thread-safe?

2014-05-11 Thread Akira Li
Victor Stinner  writes:

> If you need a well defined environement, run your test in a subprocess.
> Depending on the random function, your test may be run with more threads.
> On BSD, it changes for example which thread receives a signal. Importing
> the tkinter module creates a "hidden" C thread for the Tk loop.

Does it mean that non-thread-safe tests can't be run using a GUI test
runner that is implemented using tkinter?
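
A minimal sketch of the subprocess approach from the quoted advice: the test body (the hypothetical `TEST_CODE` string) runs in a fresh interpreter started with `-I` (isolated mode), so threads created in the parent process, e.g. by a GUI runner, do not exist in the child. The helper name `run_isolated` is made up.

```python
import subprocess
import sys

TEST_CODE = '''
import threading
# A fresh interpreter starts with only the main thread.
assert threading.active_count() == 1, threading.enumerate()
print("isolated test passed")
'''

def run_isolated(code):
    """Run *code* in a new interpreter; -I ignores PYTHON* env vars and user site."""
    return subprocess.run([sys.executable, '-I', '-c', code],
                          capture_output=True, text=True)

result = run_isolated(TEST_CODE)
print(result.returncode, result.stdout.strip())
```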


--
akira



Re: [Python-Dev] subprocess shell=True on Windows doesn't escape ^ character

2014-06-13 Thread Akira Li
Florian Bruhin  writes:

> * Nikolaus Rath  [2014-06-12 19:11:07 -0700]:
>> "R. David Murray"  writes:
>> > Also notice that using a list with shell=True is using the API
>> > incorrectly.  It wouldn't even work on Linux, so that torpedoes
>> > the cross-platform concern already :)
>> >
>> > This kind of confusion is why I opened http://bugs.python.org/issue7839.
>> 
>> Can someone describe an use case where shell=True actually makes sense
>> at all?
>> 
>> It seems to me that whenever you need a shell, the argument's that you
>> pass to it will be shell specific. So instead of e.g.
>> 
>> Popen('for i in `seq 42`; do echo $i; done', shell=True)
>> 
>> you almost certainly want to do
>> 
>> Popen(['/bin/sh', 'for i in `seq 42`; do echo $i; done'], shell=False)
>> 
>> because if your shell happens to be tcsh or cmd.exe, things are going to
>> break.
>
> My usecase is a spawn-command in a GUI application, which the user can
> use to spawn an executable. I want the user to be able to use the
> usual shell features from there. However, I also pass an argument to
> that command, and that should be escaped.

You should pass the command as a string and use cmd.exe quoting rules [1]
(note: they differ from the rules implemented by
`subprocess.list2cmdline()` [2], which follows the Microsoft C/C++ startup
code rules [3]; e.g., `^` is not special there, unlike in cmd.exe).

[1]: 
http://blogs.msdn.com/b/twistylittlepassagesallalike/archive/2011/04/23/everyone-quotes-arguments-the-wrong-way.aspx

[2]: 
https://docs.python.org/3.4/library/subprocess.html#converting-an-argument-sequence-to-a-string-on-windows

[3]: http://msdn.microsoft.com/en-us/library/17w5ykft%28v=vs.85%29.aspx
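
A sketch of the two-step approach described in [1]: first quote the argument for the C runtime (here via `subprocess.list2cmdline()`), then prefix each cmd.exe metacharacter with `^`. The metacharacter set below is an assumption based on [1], `quote_for_cmd` is a hypothetical helper, and it is not tested against every cmd.exe quirk (e.g. `%` inside batch files or delayed expansion).

```python
import re
import subprocess

# cmd.exe metacharacters as listed in [1] (assumption).
_CMD_META = re.compile(r'([()%!^"<>&|])')

def quote_for_cmd(arg):
    """Quote *arg* per the C runtime rules, then ^-escape it for cmd.exe."""
    return _CMD_META.sub(r'^\1', subprocess.list2cmdline([arg]))

print(quote_for_cmd('a b'))    # ^"a b^"
print(quote_for_cmd('x&y|z'))  # x^&y^|z
print(quote_for_cmd('a^b'))    # a^^b
```

The result would then be appended to the user's command string, e.g. `'some_program ' + quote_for_cmd(user_arg)`, before passing it to Popen with shell=True.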


--
akira
