SOCKS5 + Python = pain in you-know-where?

2007-08-30 Thread Valery
Hi all

I've just googled both the web and the groups. Hard to believe: there
seems to be nothing simple, helpful and working concerning SOCKS5
support in Python.

Has anyone had success here?

Regards,

Valery.

-- 
http://mail.python.org/mailman/listinfo/python-list


a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

2009-12-09 Thread Valery
Hi all,

Q: how does one organize parallel access to a huge, common, read-only
Python data structure?

Details:

I have a huge data structure that takes >50% of RAM.
My goal is to have many computational threads (or processes) that can
have efficient read access to this huge and complex data structure.

"Efficient" in particular means "without serialization" and "without
unneeded locking on read-only data".

As far as I can see, there are the following strategies:

1. multiprocessing
 => a. child processes get their own *copies* of the huge data structure
-- bad, and not possible at all in my case;
 => b. child processes often communicate with the parent process via
some IPC -- bad (serialization);
 => c. child processes access the huge structure via some shared-memory
approach -- feasible without serialization?! (copy-on-write does not
work well here in CPython/Linux!!) -- see the sketch after this list;

2. multi-threading
 => d. CPython is said to have problems here because of the GIL -- any
comments?
 => e. GIL-less implementations have their own issues -- any hot
recommendations?
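
Here is a minimal sketch of what I mean by 1c, under the assumption that
the data can be flattened into primitive C types (my real nested
dicts/lists cannot be, at least not directly): one read-only block of
doubles placed in shared memory via multiprocessing.sharedctypes, so the
workers read it with no per-child copies and no pickling per task:

from multiprocessing import Pool
from multiprocessing.sharedctypes import RawArray

shared_x = None

def init_worker(arr):
    # runs once in each child: keep a reference, the payload is not copied
    global shared_x
    shared_x = arr

def f(_):
    return sum(shared_x)   # read-only access, no locks, no pickling

if __name__ == '__main__':
    data = [1./i for i in xrange(1, 1000)]
    arr = RawArray('d', data)   # one shared block of C doubles
    p = Pool(7, initializer=init_worker, initargs=(arr,))
    print p.map(f, xrange(10))

The obvious catch is the flattening step for anything that is not
already an array of numbers.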

I am a big fan of the parallel map() approach -- either
multiprocessing.Pool.map or, even better, pprocess.pmap. However, this
no longer works straightforwardly when "huge data" means >50% of RAM
;-)

Comments and ideas are highly welcome!!

Here is the workbench example of my case:

##
import time
from multiprocessing import Pool

def f(_):
    time.sleep(5)  # just to emulate the time used by my computation
    res = sum(parent_x)  # my sophisticated formula goes here
    return res

if __name__ == '__main__':
    parent_x = [1./i for i in xrange(1, 1000)]  # my huge read-only data :o)
    p = Pool(7)
    res = list(p.map(f, xrange(10)))
    # switch to ps and see how fast your free memory is getting wasted...
    print res
##

Kind regards
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

2009-12-10 Thread Valery
Hi Klauss,

> How's the layout of your data, in terms # of objects vs. bytes used?

A dict (or list) of 10K-100K objects. The objects are lists or dicts.
The whole structure eats up 2+ GB of RAM.


> Just to have an idea of the overhead involved in refcount
> externalization (you know, what I mentioned 
> here:http://groups.google.com/group/unladen-swallow/browse_thread/thread/9...
> )

Yes, I've understood the idea you explained there.

regards,
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

2009-12-26 Thread Valery
Hi Antoine

On Dec 11, 3:00 pm, Antoine Pitrou  wrote:
> I was going to suggest memcached but it probably serializes non-atomic
> types. It doesn't mean it will be slow, though. Serialization implemented
> in C may well be faster than any "smart" non-serializing scheme
> implemented in Python.

No serialization can be faster than no serialization at all :)

If the child processes could directly read the parent's RAM -- what
could be better?

> What do you call "problems because of the GIL"? It is quite a vague
> statement, and an answer would depend on your OS, the number of threads
> you're willing to run, and whether you want to extract throughput from
> multiple threads or are just concerned about latency.

It seems to be a known fact that only one thread executes Python
bytecode in CPython at a time, because a thread acquires the GIL during
execution and the other threads within the same process just wait for
the GIL to be released.


> In any case, you have to do some homework and compare the various
> approaches on your own data, and decide whether the numbers are
> satisfying to you.

Well, I think the least evil is to pack and unpack things into
array.array and/or, similarly, NumPy arrays.

I do hope that Klauss' patch will be accepted, because it will let me
forget about a lot of that unneeded packing and unpacking.
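
For reference, the packing itself is a one-liner, assuming the values
are plain floats (typecode 'd'); the point is that the packed buffer
holds raw C doubles instead of per-element Python objects, so a forked
child reading it does not touch per-element refcounts and the
copy-on-write pages have a chance to stay shared:

import array

parent_x = array.array('d', (1./i for i in xrange(1, 1000)))
print len(parent_x), parent_x[0]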


> > I am a big fan of parallel map() approach
>
> I don't see what map() has to do with accessing data. map() is for
> *processing* of data. In other words, whether or not you use a map()-like
> primitive does not say anything about how the underlying data should be
> accessed.

Right. However, saying "a big fan" had another focus here: if you
write your code around map() calls, then it takes only a tiny effort to
convert it into a MULTIprocessing one :)

Just that.

Kind regards.
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


multiprocessing + console + windows = challenge?

2009-09-26 Thread Valery
Hi all.

So, the doc is pitiless:

   "Note Functionality within this package requires that the __main__
module be importable by the children. This is covered in Programming
guidelines however it is worth pointing out here. This means that some
examples, such as the multiprocessing.Pool examples will not work in
the interactive interpreter. For example:"

My question:

Q: has anyone managed to get the multiprocessing module working in an
interactive Python console? Windows would be especially interesting :)

The pprocess library works in the console on Linux, but it doesn't on
Windows :-/
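
For what it's worth, here is a minimal sketch of the workaround implied
by that guideline, with a hypothetical helper module workers.py placed
somewhere on the import path; because the function lives in an
importable module rather than in the console session, the child
processes can unpickle it by importing workers:

# workers.py -- hypothetical helper module, importable by the children
def square(x):
    return x * x

# in the interactive console (the part that otherwise fails on Windows)
>>> from multiprocessing import Pool
>>> from workers import square
>>> p = Pool(2)
>>> p.map(square, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]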

regards
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


"from logging import *" causes an error under Ubuntu Karmic

2009-10-04 Thread Valery
Hi all

Is this purely an Ubuntu Karmic (beta) issue?

$ python
Python 2.6.3 (r263:75183, Oct  3 2009, 11:20:50)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from logging import *
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'NullHandler'

$ uname -a
Linux vaktop 2.6.31-11-generic #38-Ubuntu SMP Fri Oct 2 11:55:55 UTC
2009 i686 GNU/Linux
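
In case someone hits the same thing: judging from the traceback, the
module itself imports fine and only the star import (which walks
logging.__all__) blows up, so a sketch of a workaround is simply to
avoid the star import:

>>> import logging
>>> logging.warning("works without the star import")
WARNING:root:works without the star import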

--
regards
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: "from logging import *" causes an error under Ubuntu Karmic

2009-10-04 Thread Valery
OK, I've filed a bug, since Python 2.5 works fine here.
--
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to initialize each multithreading Pool worker with an individual value?

2010-12-02 Thread Valery
On Dec 1, 3:24 am, James Mills  wrote:
> I assume you are talking about multiprocessing
> despite you mentioning "multithreading" in the mix.

yes, sorry.


> Have a look at the source code for multiprocessing.pool
> and how the Pool object works and what it does
> with the initializer argument. I'm not entirely sure it
> does what you expect and yes documentation on this
> is lacking...

I see.

I found a way "to seed" each member of the Pool with its own data. I do
it right after initialization:

from multiprocessing import Pool

port = None

def port_seeder(port_val):
    from time import sleep
    sleep(1)  # or less...
    global port
    port = port_val

if __name__ == '__main__':
    pool = Pool(3)
    pool.map(port_seeder, range(3), chunksize=1)
    # now child processes are initialized with individual values

Another (somewhat heavier) approach would be via a shared resource.

P.S. Sorry, I found your answer only now.

regards
--
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list


Anything better than asyncio.as_completed() and asyncio.wait() to manage execution of large amount of tasks?

2014-07-15 Thread Valery Khamenya
Hi,

both asyncio.as_completed() and asyncio.wait() work with lists only; no
generators are accepted. Is there anything similar to those functions
that pulls Tasks/Futures/coroutines one by one and processes them in a
limited task pool?

I have a gazillion Tasks and do not want to instantiate them all at
once, but rather to instantiate and process them one by one as the
running tasks complete.

best regards
--
Valery
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Anything better than asyncio.as_completed() and asyncio.wait() to manage execution of large amount of tasks?

2014-07-20 Thread Valery Khamenya
Hi Maxime,

Many thanks for your great solution. It would be so great to have it in
stock asyncio and use it out of the box...
I've made 4 fixes to it, all of a rather "cosmetic" nature. Here is the
final version:

import asyncio
from concurrent import futures


def as_completed_with_max_workers(tasks, *, loop=None, max_workers=5,
                                  timeout=None):
    loop = loop if loop is not None else asyncio.get_event_loop()
    workers = []
    pending = set()
    # Valery: respect the "loop" parameter
    done = asyncio.Queue(maxsize=max_workers, loop=loop)
    exhausted = False
    # Valery: added to see if we indeed have to call timeout_handle.cancel()
    timeout_handle = None

    @asyncio.coroutine
    def _worker():
        nonlocal exhausted
        while not exhausted:
            try:
                t = next(tasks)
                pending.add(t)
                yield from t
                yield from done.put(t)
                pending.remove(t)
            except StopIteration:
                exhausted = True

    def _on_timeout():
        for f in workers:
            f.cancel()
        workers.clear()
        # Wake up _wait_for_one()
        done.put_nowait(None)

    @asyncio.coroutine
    def _wait_for_one():
        f = yield from done.get()
        if f is None:
            raise futures.TimeoutError()
        return f.result()

    # Valery: respect the "loop" parameter
    workers = [asyncio.async(_worker(), loop=loop)
               for _ in range(max_workers)]

    if workers and timeout is not None:
        timeout_handle = loop.call_later(timeout, _on_timeout)

    while not exhausted or pending or not done.empty():
        yield _wait_for_one()

    # Valery: call timeout_handle.cancel() only if it is needed
    if timeout_handle:
        timeout_handle.cancel()
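
And a hypothetical usage sketch (square() and all the numbers are made
up), just to show the calling pattern: the generator is consumed lazily,
each item becomes a Task only when one of the max_workers workers is
free, and every item the for loop yields is awaited to get one completed
result. Note that the generator has to yield Tasks/Futures, since
_wait_for_one() calls result() on them:

import asyncio

@asyncio.coroutine
def square(x):
    yield from asyncio.sleep(0.01)   # stand-in for real work
    return x * x

@asyncio.coroutine
def main():
    # Tasks are created lazily, one next() per free worker
    tasks = (asyncio.async(square(x)) for x in range(100))
    for fut in as_completed_with_max_workers(tasks, max_workers=3):
        result = yield from fut
        print(result)

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())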


best regards
--
Valery A.Khamenya
-- 
https://mail.python.org/mailman/listinfo/python-list


asyncio with map&reduce flavor and without flooding the event loop

2014-08-03 Thread Valery Khamenya
Hi all

I am trying to use asyncio in real applications, and it doesn't go that
easily; the help of asyncio gurus is badly needed.

Consider a task like crawling the web starting from some web sites. Each
site leads to the generation of new download tasks in exponential(!)
progression. However, we want neither to flood the event loop nor to
overload our network. We'd like to control the task flow. This I achieve
well with a modification of Maxime's nice solution proposed here:
https://mail.python.org/pipermail/python-list/2014-July/675048.html

But I'd also need a very natural thing, a kind of map() & reduce(), or
functools.reduce() if we are on Python 3 already. That is, I'd need to
call a "summarizing" function over all the download tasks completed for
the links of a page. This is where I fail :(

I'd propose an oversimplified but still nice test to model the use case:
let's use the Fibonacci function in its inefficient recursive form.
That is, let coro_sum() be our reduce() function and coro_fib() be our
map(). Something like this:

@asyncio.coroutine
def coro_sum(x):
    return sum(x)

@asyncio.coroutine
def coro_fib(x):
    if x < 2:
        return 1
    res_coro = executor_pool.spawn_task_when_arg_list_of_coros_ready(
        coro=coro_sum,
        arg_coro_list=[coro_fib(x - 1), coro_fib(x - 2)])
    return res_coro

So that we could run the following tests.

Test #1 on one worker:

  executor_pool = ExecutorPool(workers=1)
  executor_pool.as_completed( coro_fib(x) for x in range(20) )

Test #2 on two workers:

  executor_pool = ExecutorPool(workers=2)
  executor_pool.as_completed( coro_fib(x) for x in range(20) )

It would be very important that each coro_fib() and coro_sum()
invocation is run as a Task on some worker, not just spawned implicitly
and unmanaged!
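
For contrast only, here is a minimal sketch (plain asyncio, Python 3.4
generator-based coroutines, no ExecutorPool -- all names besides asyncio
are made up) of the same fib map & reduce shape done the implicit,
unmanaged way with gather(), i.e. exactly what the setup above is meant
to avoid; it is shown just to make the desired computation concrete:

import asyncio

@asyncio.coroutine
def coro_fib(x):
    if x < 2:
        return 1
    # gather() plays the "reduce over the two sub-results" role here,
    # but the sub-coroutines are spawned implicitly and unmanaged
    a, b = yield from asyncio.gather(coro_fib(x - 1), coro_fib(x - 2))
    return a + b

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(coro_fib(10)))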

It would be cool to find asyncio gurus interested in this very natural goal.
Your help and ideas would be very much appreciated.

best regards
--
Valery
-- 
https://mail.python.org/mailman/listinfo/python-list


Problem: neither urllib2.quote nor urllib.quote encode the unicode strings arguments

2008-10-03 Thread Valery Khamenya
Hi all

Things like urllib.quote(u"пиво Müller ") fail with an error message:
KeyError: u'\u043f'

Similarly with urllib2.

Has anyone got a hint? I need it to form a URI containing non-ASCII chars.
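
For reference, a minimal sketch of the usual Python 2 approach, assuming
the server expects UTF-8 (the choice of encoding is an assumption):
encode the unicode string to a byte string first, then quote the bytes:

# -*- coding: utf-8 -*-
import urllib

text = u"пиво Müller"
quoted = urllib.quote(text.encode('utf-8'))   # quote bytes, not unicode
print quoted   # -> %D0%BF%D0%B8%D0%B2%D0%BE%20M%C3%BCller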
thanks in advance,
best regards
--
Valery
--
http://mail.python.org/mailman/listinfo/python-list


How to initialize each multithreading Pool worker with an individual value?

2010-11-30 Thread Valery Khamenya
Hi,

multithreading.pool Pool has a promising initializer argument in its
constructor. However, it doesn't look possible to use it to initialize
each of the Pool's workers with an individual value (I wish I were
wrong here).

So, how does one initialize each multithreading Pool worker with an
individual value?

The typical use case might be a connection pool, say, of 3 workers,
where each of the 3 workers has its own TCP/IP port.

from multiprocessing.pool import Pool

def port_initializer(_port):
    global port
    port = _port

def use_connection(some_packet):
    print "sending data over port # %s" % port

if __name__ == "__main__":
    ports = ((4001, 4002, 4003), )
    p = Pool(3, port_initializer, ports)  # oops... :-)
    some_data_to_send = range(20)
    p.map(use_connection, some_data_to_send)


best regards
--
Valery A.Khamenya
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to initialize each multithreading Pool worker with an individual value?

2010-12-01 Thread Valery Khamenya
Hi Dan,

> If you create in the parent a queue in shared memory (multiprocessing
> facilitates this nicely), and fill that queue with the values in your
> ports tuple, then you could have each child in the worker pool extract
> a single value from this queue so each worker can have its own, unique
> port value.

The port number is supposed to be used only once, namely during
initialization. The usual situation with connections is that it is a
bit expensive to establish one each time the connection is about to be
used. So, it is often initialized once and dropped only when all
communication is done.

In contrast, it looks to me as if you rather propose to establish the
connection each time a new job comes from the queue for execution.
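
For completeness, here is a minimal sketch of a queue-based seeding that
would keep the initialization once per worker (done inside the
initializer, not per job); the port numbers are made up:

from multiprocessing import Pool, Queue

port = None

def port_initializer(port_queue):
    # runs once in each worker process: take exactly one port
    global port
    port = port_queue.get()

def use_connection(some_packet):
    print "sending packet %s over port # %s" % (some_packet, port)

if __name__ == "__main__":
    port_queue = Queue()
    for port_number in (4001, 4002, 4003):
        port_queue.put(port_number)
    pool = Pool(3, port_initializer, (port_queue,))
    pool.map(use_connection, range(20))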

Regards,
Valery
-- 
http://mail.python.org/mailman/listinfo/python-list