Re: [Python-Dev] Ext4 data loss
Guido van Rossum wrote:
> On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes wrote:
> [...] https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54. [...]
>
> If I understand the post properly, it's up to the app to call fsync(), and it's only necessary when you're doing one of the rename dances, or updating a file in place. Basically, as he explains, fsync() is a very heavyweight operation; I'm against calling it by default anywhere.

To me, the flaw seems to be in the close() call (of the operating system). I'd expect the data to be in a persistent state once close() returns, so there would be no need to fsync() if the file gets closed anyway. Of course the close() call could take a while (up to 30 seconds in laptop mode), but if one does not want to wait that long, then one can continue without calling close() and take the risk.

Of course, if the data should be on persistent storage without closing the file (e.g. for database applications), then one has to carefully call the different sync methods, but that's another story.

Why has this ext4 problem not come up for other filesystems?
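For reference, a minimal sketch of the "rename dance" the quoted post is talking about, written as a plain Python idiom (this is not code from the thread, and error handling is deliberately minimal): write to a temporary file, fsync() it, then atomically rename it over the old one.

    import os

    def atomic_write(path, data):
        tmp = path + ".tmp"                  # temporary name, illustrative only
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())             # force the data to stable storage
        os.rename(tmp, path)                 # atomic replace on POSIX filesystems

Strictly durable variants also fsync() the containing directory after the rename, which is exactly the kind of heavyweight extra work the quoted post wants to keep out of the default path.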
Re: [Python-Dev] [Distutils] PEP 376 - from PyPM's point of view
Tarek Ziadé wrote:
> So basically "site-packages" is a distribution location that is avoided by everyone because it doesn't know how to handle multiple versions.

I think you overrate the importance of having multiple versions of a package available for the same Python interpreter. If you have m different versions of n packages, then you could have m**n different combinations for an application, so you need a possibility to select one combination from the m**n possible ones at application startup time. Is this really worth it?

> If we had a multi-versions support protocol, that would help os packagers and application developers to be friends again imho ;)

Let's remove site-packages from Python then. The _one_ site-packages folder stands for _one_ Python interpreter. All the clever efforts to provide a specific set of package versions at runtime to an application (which uses the singleton Python interpreter) logically create a new Python interpreter with a site-packages folder that contains just the versions of the packages the application needs, unfortunately by mucking with PYTHONPATH, .pth files, site.py etc., which makes it very difficult for the average Python developer to understand what is happening.
Re: [Python-Dev] [Distutils] PEP 376 - from PyPM's point of view
P.J. Eby wrote:
> At 05:16 PM 7/15/2009 +0200, Joachim König wrote:
>> If you have m different versions of n packages, then you could have m**n different combinations for an application, so you need a possibility to select one combination from the m**n possible ones at application startup time. Is this really worth it?
>
> Obviously yes, as neither buildout nor setuptools would exist otherwise. ;-) Nor would Fedora be packaging certain library versions as eggs specifically to get certain multi-version scenarios to work. The specific solutions for handling n*m problems aren't fantastic, but they are clearly needed.

I still do not see the need. IMO the whole obfuscation comes from the fact that all versions of all packages are installed into one location where Python automatically looks for packages, and then, with a lot of magic, the packages are hidden from the interpreter and only the specific requested versions are made "visible" to the interpreter at runtime.

Why do the packages have to be installed there in the first place? For an application it would be enough to have an additional directory on its PYTHONPATH where the packages required for this application would be installed. So a package could be installed either to the common directory ("site-packages") or to an application-specific directory (e.g. something like "app-packages//"). This approach has been used by Zope2 with its "private" lib/python directory for years.

So one would have to set up the application-specific packages before running the application, but the whole clutter of uncounted versions of the same package in one directory could go away. The "drawback" of this approach would be that the same version of a package would have to be installed multiple times if needed by different applications.
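To make the idea concrete, here is a minimal sketch of such an application-specific directory being put in front of site-packages at startup; the "app-packages" name is only illustrative, not an existing convention:

    import os
    import sys

    # Directory holding exactly the package versions this application needs.
    APP_PACKAGES = os.path.join(os.path.dirname(os.path.abspath(__file__)), "app-packages")
    sys.path.insert(0, APP_PACKAGES)   # packages here shadow anything in site-packages

    # From here on, `import somelib` finds app-packages/somelib first.

Setting PYTHONPATH before launching the interpreter achieves the same thing without touching the application code.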
Re: [Python-Dev] How we can get rid of eggs for 2.6 and beyond
Phillip J. Eby wrote:
> Second, there were no uninstall tools for it, so I'd have had to write one myself. (Zed's "easy_f'ing_uninstall" to the contrary, it ain't easy, and I have an aversion to deleting stuff on people's systems without knowing what will break. There's a big difference between them typing 'rm -rf' themselves, and me doing it.)

I think the uninstall should _not_ 'rm -rf' but only 'rm' the files (and 'rmdir' the directories, but not recursively) that it created, and only those that have not been modified in the meantime (after the installation). This can easily be achieved by recording a checksum (e.g. md5 or sha) upon installation, deleting a file only if its checksum still matches, and deleting directories only when they are empty (after the installed files in them have been deleted). Otherwise, the uninstall should complain and leave the modified files installed.

Joachim
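A hedged sketch of what such a manifest-based uninstall could look like (file format and function names are made up for illustration; this is not an existing tool):

    import hashlib
    import json
    import os

    def record_manifest(installed_files, manifest_path):
        # At install time: remember a checksum for every file we created.
        digests = {}
        for path in installed_files:
            with open(path, "rb") as f:
                digests[path] = hashlib.sha256(f.read()).hexdigest()
        with open(manifest_path, "w") as f:
            json.dump(digests, f)

    def uninstall(manifest_path):
        # At uninstall time: delete only unmodified files, rmdir only empty dirs.
        with open(manifest_path) as f:
            digests = json.load(f)
        kept = []
        for path, digest in digests.items():
            if not os.path.exists(path):
                continue
            with open(path, "rb") as f:
                unchanged = hashlib.sha256(f.read()).hexdigest() == digest
            if unchanged:
                os.remove(path)
            else:
                kept.append(path)            # modified since install: leave it alone
        for d in sorted({os.path.dirname(p) for p in digests}, key=len, reverse=True):
            try:
                os.rmdir(d)                  # only succeeds on empty directories
            except OSError:
                pass
        return kept                          # report what was deliberately left behind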
Re: [Python-Dev] [Distutils] how to easily consume just the parts of eggs that are good for you
Phillip J. Eby wrote:
> It would be, if .eggs were a packaging format, rather than a binary distribution/runtime format.
>
> Remember "eggs are to Python as jars are to Java" -- a Java .jar doesn't contain documentation either, unless it's needed at runtime. Same for configuration files.

But there's generally no need to easily have a look into a .class file with a tool the user is used to, whereas for Python we're often interested in knowing the details. And having a zip file in my way to the source has left me frustrated often enough. If you want to be consistent, leave the .py files out of eggs too; a jar file normally doesn't contain .java source files either.

> They're not system packages, in other words. The assumption that they are is another marketing failure, due to conflation of "package == distribution of python code" and "package == thing you manage with a system packager". People see, "I put my package in an .egg" and think it's the latter definition, when it's barely even the former. :)

I agree that they are not system packages, but I would have preferred to install multiple versions of a package into separate "site-packages" directories, something that is really well understood by most unsophisticated Python programmers. The selection of the version could then be made at runtime by a PYTHONPATH setting and not by fiddling with .pth files (something that could be automated by a tool and persisted in batch files).
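As an illustration of that last point, here is a hedged sketch of a tiny launcher that selects package versions purely through PYTHONPATH before starting the application; the directory layout and names are invented for the example:

    import os
    import subprocess
    import sys

    def launch(app_script, package_dirs):
        # Put the chosen versioned package directories in front of any
        # existing PYTHONPATH, then start the application in a fresh interpreter.
        env = dict(os.environ)
        parts = list(package_dirs)
        if env.get("PYTHONPATH"):
            parts.append(env["PYTHONPATH"])
        env["PYTHONPATH"] = os.pathsep.join(parts)
        subprocess.call([sys.executable, app_script], env=env)

    # e.g. launch("myapp.py", ["/opt/pylibs/somelib-1.2", "/opt/pylibs/otherlib-2.0"])

A tool could generate exactly this kind of call as a batch file or shell script, which is all the "persisted in batch files" remark asks for.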
Re: [Python-Dev] microthreading vs. async io
[EMAIL PROTECTED] wrote:
[...]
> microthreading:
> Exploiting language features to use cooperative multitasking in tasks that "read" like they are single-threaded.
>
> asynchronous IO:
> Performing IO to/from an application in such a way that the application does not wait for any IO operations to complete, but rather polls for or is notified of the readiness of any IO operations.
>
> [...]
>
> Asyncore *only* implements asynchronous IO -- any "tasks" performed in its context are the direct result of an IO operation, so it's hard to say it implements cooperative multitasking (and Josiah can correct me if I'm wrong, but I don't think it intends to).
>
> Much of the discussion here has been about creating a single, unified asynchronous IO mechanism that would support *any* kind of cooperative multitasking library. I have opinions on this ($0.02 each, bulk discounts available), but I'll keep them to myself for now.

Talking only about async IO in order to write cooperative tasks that "smell" single-threaded is too restricted IMO. If there are a number of cooperative tasks that "read" single-threaded (or sequential), then the goal is to avoid a _blocking operation_ in any of them, because the other tasks could do useful things in the meantime. But there are a number of different blocking operations, not only async IO (which is easily handled by select()) but also:

- waiting for a child process to exit
- waiting for a POSIX thread to join()
- waiting for a signal/timer
- ...

Kevent (kernel event) on BSD, for example, tries to provide a common infrastructure: a file descriptor one can push conditions onto and then select() on until one of the conditions is met. Unfortunately, thread joining is not covered by it, so one cannot wait (without some form of busy looping) until one of the conditions is true if thread joining is one of them, but for all the other cases it would be possible. There are many other similar approaches (libevent, notify, to name a few).

So in order to avoid blocking in a task, I'd prefer that the task declaratively specifies what kind of conditions (events) it wants to wait for (the API). If that declaration is a function call, then this function could implicitly yield if the underlying implementation were Stackless or greenlet based. Kevent on BSD systems already has a usable API for defining the conditions via structures, and there is also a Python module for it. The important point IMO is to have an agreed API for declaring the conditions a task wants to wait for. The underlying implementation in a scheduler would be free to use whatever event library it wants. E.g. a wait(events=[], timeout=-1) method would be sufficient for most cases, where an event would specify:

- resource type (file, process, timer, signal, ...)
- resource id (fd, process id, timer id, signal number, ...)
- filter/flags (when to fire, e.g. writable, readable, exception for an fd, ...)
- ...

The result could be a list of events that have "fired", more or less similar to the events in the argument list, but with added information on the exact condition. The task would return from wait(events) when at least one of the conditions is met. The task then knows, e.g., that an fd is readable and can do the read() on its own in the way it likes, without being forced to let some uber-framework do the low-level IO. Just the waiting for conditions without blocking the application is important.

I have implemented something like the above, based on greenlets.
In addition to the event types specified by BSD kevent(2), I've added TASK and CHANNEL resource types for the events, so that I can wait for tasks to complete or send/receive messages to/from other tasks without blocking the application. But the implementation is not the important thing, the API is, and then we can start writing competing implementations.

Joachim
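For illustration only, here is a minimal, runnable sketch of the API shape described above, written with plain generators instead of greenlets (so the yield back to the scheduler is explicit rather than hidden inside wait()). It only handles file-descriptor events, and names such as Event, READABLE and WRITABLE are invented for the example, not taken from the greenlet-based implementation mentioned above:

    import select
    from collections import namedtuple

    READABLE, WRITABLE = "readable", "writable"
    Event = namedtuple("Event", "restype resid filter")   # e.g. Event("fd", sock.fileno(), READABLE)

    def scheduler(tasks):
        # Each task is a generator; `fired = yield [events]` plays the role of wait().
        waiting = {t: t.send(None) for t in tasks}          # prime each task up to its first wait
        while waiting:
            rds = sorted({e.resid for evs in waiting.values() for e in evs if e.filter == READABLE})
            wrs = sorted({e.resid for evs in waiting.values() for e in evs if e.filter == WRITABLE})
            r, w, _ = select.select(rds, wrs, [])            # block only when no task is runnable
            for task, evs in list(waiting.items()):
                fired = [e for e in evs
                         if (e.filter == READABLE and e.resid in r)
                         or (e.filter == WRITABLE and e.resid in w)]
                if fired:
                    try:
                        waiting[task] = task.send(fired)     # resume the task with the fired events
                    except StopIteration:
                        del waiting[task]                    # task finished

Timeouts, child processes, signals and so on are left out here; the point is only the shape of the wait/resume contract, not a complete scheduler.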
Re: [Python-Dev] microthreading vs. async io
Adam Olsen wrote:
> I agree with everything except this. A simple function call would have O(n) cost, thus being unacceptable for servers with many open connections. Instead you need it to maintain a set of events and let you add or remove from that set as needed.

We can learn from kevent here; it already has EV_ADD, EV_DELETE, EV_ENABLE, EV_DISABLE, EV_ONESHOT flags. So the event-conditions would stay "in the scheduler" (per task), so that they can fire multiple times without the need to be handed over again and again. Thanks, that's exactly the discussion I'd like to see: a discussion about a simple API.

>> I have implemented something like the above, based on greenlets.
>
> I assume greenlets would be an internal implementation detail, not exposed to the interface?

Yes. You could use Stackless, perhaps even Twisted, but I'm not sure that would work, because the requirement for "reads single-threaded" is a simple wait(...) function call that does a yield (over multiple stack levels, down to the function that created the task), something that is only provided by greenlet and Stackless to my knowledge.

Joachim
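For readers who want to see those flags in action: newer Pythons ship a kqueue wrapper in the select module (BSD/macOS only), and a hedged sketch of registering a persistent versus a one-shot read filter looks like this:

    import select
    import socket

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)

    kq = select.kqueue()
    ev = select.kevent(srv.fileno(),
                       filter=select.KQ_FILTER_READ,
                       flags=select.KQ_EV_ADD)      # persistent; OR in select.KQ_EV_ONESHOT to fire once
    kq.control([ev], 0)                             # register the condition, fetch nothing yet
    fired = kq.control(None, 1, 5.0)                # wait up to 5 seconds for one event

Because the registration survives the kq.control() call, the scheduler would not need the task to re-submit the same event on every wait(), which is exactly the point made above.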
Re: [Python-Dev] microthreading vs. async io
Adam Olsen wrote:
> I don't think we're on the same page then. The way I see it you want a single async IO implementation shared by everything while having a collection of event loops that cooperate "just enough". The async IO itself would likely end up being done in C.

No, I'd like to have:

- an interface for a task to specify the events it's interested in, and for waiting until at least one of them fires (with a timeout)
- an interface for creating a task (similar to creating a thread)
- an interface for a scheduler to manage the tasks

When a task resumes after a wait(...) it knows which of the events it was waiting for have fired. It can then do whatever it wants: do the low-level non-blocking IO on its own, or use something else. Of course, as these are cooperative tasks, they still must be careful not to block, e.g. not reading too much from a file descriptor that is readable, but these problems have been solved in a lot of libraries, and I would not urge the task to use a specific way to accomplish its "task".

The problem solved by this approach is to allow a number of cooperating threads to wait for an event without the need to busy-loop or block, by delegating the waiting to a central instance, the scheduler. How this efficient waiting is implemented is the responsibility of the scheduler, but the scheduler would not do the (possibly blocking) IO operation itself; it would only guarantee to continue a task when it can do an IO operation without blocking.

Joachim
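Written out as a skeleton, the three interfaces could look roughly like this (names and signatures are illustrative; nothing here prescribes how the scheduler actually blocks):

    class Scheduler:
        def spawn(self, func, *args):
            """Create a new cooperative task running func(*args)."""
            raise NotImplementedError

        def wait(self, events, timeout=None):
            """Suspend the calling task until one of `events` fires or the
            timeout expires; return the list of events that fired."""
            raise NotImplementedError

        def run(self):
            """Resume runnable tasks; block in select/poll/kevent only when
            every task is waiting."""
            raise NotImplementedError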
Re: [Python-Dev] microthreading vs. async io
Joachim König-Baltes wrote:
> The problem solved by this approach is to allow a number of cooperating threads to wait for an event without the need to busy-loop or block, by delegating the waiting to a central instance, the scheduler. How this efficient waiting is implemented is the responsibility of the scheduler, but the scheduler would not do the (possibly blocking) IO operation itself; it would only guarantee to continue a task when it can do an IO operation without blocking.

From the point of view of the task, it only has to sprinkle a number of wait(...) calls before doing blocking calls, so there is no need to use callbacks or to write the inherently sequential code upside down. That is the gain I'm interested in.

The style used in asyncore, inheriting from a class, returning from a method and being called back later at a different location (a different method), interrupts the sequential flow of operations and makes it harder to understand. The same is true for all other strategies using callbacks or similar mechanisms. All this can be achieved with a multilevel yield() that is hidden in a function call. So the task does a small step down (wait) in order to jump up (yield) to the scheduler, without disturbing the eye of the beholder.

Joachim
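A hedged sketch of what such a wrapper looks like from the task's side; wait_readable() here is a degenerate single-task stand-in (it simply blocks in select()), whereas in a real scheduler that call would yield to other tasks instead:

    import os
    import select

    def wait_readable(fd, timeout=None):
        # Stand-in for the cooperative wait(): a real implementation would
        # yield to the scheduler here rather than block the whole process.
        select.select([fd], [], [], timeout)

    def cooperative_read(fd, size):
        # Reads "as if blocking": the wait hides the yield, so the calling
        # code keeps its sequential, single-threaded shape.
        wait_readable(fd)
        return os.read(fd, size)    # fd is readable now, so this won't block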
Re: [Python-Dev] Twisted Isn't Specific (was Re: Trial balloon: microthreads library in stdlib)
[EMAIL PROTECTED] wrote:
> When you boil it down, Twisted's event loop is just a notification for "a connection was made", "some data was received on a connection", "a connection was closed", and a few APIs to listen or initiate different kinds of connections, start timed calls, and communicate with threads. All of the platform details of how data is delivered to the connections are abstracted away. How do you propose we would make a less "specific" event mechanism?

But that is exactly the problem I have with Twisted. For HTTP it creates its own set of notifications instead of structuring the code similarly to SocketServer (UDP and TCP), BaseHTTPServer, SimpleHTTPServer etc., which are well understood in the Python community and e.g. used by medusa and asyncore. Having to completely restructure one's own code is a high price. Giving control away to a big framework that calls my own code for reasons that are not so easy to understand (for a Twisted noob) does not give me a warm feeling. That's OK for complex applications like web servers, but for small networking applications I'd like to have a chance to understand what's going on. Asyncore is so simple that it's easy to follow when I let it do the select() for me.

That said, I conclude that the protocol implementations are superb but unfortunately too tightly coupled to the Twisted philosophy: sitting in the middle, trying to orchestrate, instead of being easy to integrate.

Joachim
Re: [Python-Dev] microthreading vs. async io
Armin Rigo wrote:
> I just realized that this is not really true in the present context. If the goal is to support programs that "look like" they are multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal, then most of the time the wait() function would be only called with a *single* event, rarely two or three, never more. Indeed, in this model a large server is implemented with many microthreads: at least one per client. Each of them blocks in a separate call to wait(). In each such call, only the events relevant to that client are mentioned.

Yes, exactly.

> In other words, the cost is O(n), but n is typically 1 or 2. It is not the total number of events that the whole application is currently waiting on. Indeed, the scheduler code doing the real OS call (e.g. to select()) can collect the events in internal dictionaries, or in Poll objects, or whatever, and update these dictionaries or Poll objects with the 1 or 2 new events that a call to wait() introduces. In this respect, the act of *calling* wait() already means "add these events to the set of all events that need waiting for", without the need for a separate API for doing that.

But as I'd like to make the event structure similar to the BSD kevent structure, we could use a flag in the event that tells the scheduler to consider it only once or to keep it in its dictionary; then the task would not need to supply the event each time.

> [I have experimented myself with a greenlet-based system giving wrapper functions for os.read()/write() and socket.recv()/send(), and in this style of code we tend to simply spawn new greenlets all the time. Each one looks like an infinite loop doing a single simple job: read some data, process it, write the result somewhere else, start again. (The loops are not really infinite; e.g. if sockets are closed, an exception is generated, and it causes the greenlet to exit.) So far I've managed to always wait on a *single* event in each greenlet, but sometimes it was a bit contrived and being able to wait on 2-3 events would be handy.]

I do not spawn new greenlets all the time. Instead, my tasks either wait() or call wrappers for read/write/send/recv... that implicitly call wait(...) until enough data is available, and the wait(...) does the yield to the scheduler, which can either continue other tasks or call kevent/poll/select if no task is runnable.

What I'd like to see in an API/library:

- a standard scheduler that is easily extensible
- an event structure/class that's easily extensible

E.g. I've extended the kevent structure for the scheduler to also include channels similar to those in Stackless. These are Python-only communication structures, so there is no OS support for blocking on them, but the scheduler can decide if there is something available for a task that waits on a channel. So the channels are checked first in the scheduler to see if a task can continue, and only if no channel event is available does the scheduler call kevent/select/poll. While the scheduler blocks in kevent/select/poll, nothing happens on the channels as no task is running, so the scheduler never blocks (inside the OS) unnecessarily.

Joachim
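A minimal sketch of that "check channels before calling the OS" rule (Channel and the helper name are illustrative; a real channel would also carry the receiving side):

    from collections import deque

    class Channel:
        # Pure-Python queue: readiness can be tested without any system call.
        def __init__(self):
            self.queue = deque()
        def send(self, value):
            self.queue.append(value)
        def ready(self):
            return bool(self.queue)

    def channel_runnable(channel_waiters):
        # channel_waiters maps a task to the channel it is waiting on.
        # Tasks whose channel already has data are runnable immediately;
        # only if this list is empty would the scheduler go on to
        # kevent/poll/select with a computed timeout.
        return [task for task, ch in channel_waiters.items() if ch.ready()]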
Re: [Python-Dev] microthreading vs. async io
Adam Olsen wrote:
> That would depend on whether Joachim's wait() refers to the individual tasks' calls or the scheduler's call. I assumed it referred to the scheduler. In the basic form it would literally be select.select(), which has O(n) cost and often fairly large n.

The wait(events, timeout) call of a task would only mention the events that the task is interested in. The wait() call yields that list to the scheduler. The scheduler then analyzes the list of events that tasks are waiting for, compares it to the result of its last call to select/poll/kevent, and continues tasks in a round-robin fashion until all events have been delivered to the waiting tasks. Only when the scheduler has no events left to deliver (i.e. all tasks are waiting) does it make a new select/poll/kevent OS call, with the timeout computed as the lowest timeout value of all the tasks, so that timeouts can be delivered at the right time.

Joachim
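A tiny sketch of that timeout computation (None meaning "no deadline for this task"):

    def next_timeout(task_timeouts):
        # Block in select/poll/kevent no longer than the nearest task deadline.
        pending = [t for t in task_timeouts if t is not None]
        return min(pending) if pending else None

    assert next_timeout([None, 2.5, 0.3]) == 0.3
    assert next_timeout([None, None]) is None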