Python ORM library for distributed mostly-read-only objects?

2014-06-22 Thread smurfix
My problem: I have a large database of interconnected objects which I need to 
process with a combination of short- and long-lived workers. These objects are 
mostly read-only (i.e. any of them can be changed/marked-as-deleted, but that 
happens infrequently). The workers may or may not be within one Python process, 
or even on one system.

I've been doing this with a "classic" session-based SQLAlchemy ORM approach, 
but that ends up way too slow and memory-intensive, as each thread gets its own 
copy of every object it needs. I don't want that.

My existing code does object loading and traversal by simple attribute access; 
I'd like to keep that if at all possible.

Ideally, what I'd like to have is an object server which mediates write access 
to the database and then sends change/invalidation notices to the workers. 
(Changes are infrequent enough that I don't care if a worker gets a notice it's 
not interested in.)
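
A minimal sketch of that idea, to make the moving parts concrete. Everything 
here is hypothetical: `ObjectServer`, `WorkerCache` and their methods are 
illustrative names, not an existing library, a plain dict stands in for the 
database, and a direct callback stands in for whatever transport (socket, 
message queue, ...) would carry the notices between processes.

```python
class ObjectServer:
    """Mediates writes to the backing store and broadcasts invalidations."""

    def __init__(self, store):
        self._store = store          # assumption: a dict stands in for the DB
        self._subscribers = []       # callbacks, one per connected worker

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def read(self, key):
        return self._store[key]

    def write(self, key, value):
        self._store[key] = value
        # Every worker is told, whether or not it holds the object.
        for notify in self._subscribers:
            notify(key)


class WorkerCache:
    """Per-process cache: one shared copy of each object, dropped on notice."""

    def __init__(self, server):
        self._server = server
        self._objects = {}
        self.loads = 0               # counts trips to the server
        server.subscribe(self.invalidate)

    def get(self, key):
        if key not in self._objects:
            self._objects[key] = self._server.read(key)
            self.loads += 1
        return self._objects[key]

    def invalidate(self, key):
        # Notices for objects this worker never loaded are simply ignored.
        self._objects.pop(key, None)
```

Broadcasting every change to every worker is only reasonable because writes 
are rare; with frequent writes, per-object subscriptions would be needed.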

I don't care if updates are applied immediately or are only visible to the 
local process until committed. I also don't need fancy indexing or query 
abilities; if necessary I can go to the storage backend for that. (That should 
be SQL, though a NoSQL back-end would be nice to have.)

Does something like this already exist, somewhere out there, or do I need to 
write this, or does somebody know of an alternate solution?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-22 Thread smurfix
On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:

> Can you give us some more quantitative idea of your requirements?  How 
> many objects?  How much total data is being stored?  How many queries 
> per second, and what is the acceptable latency for a query?

In order: not yet, a whole lot, more than fits in memory, and that depends.

To explain. The data is a network of diverse related objects. I can keep the 
most-used objects in memory but not all of them. Indeed, I _need_ to keep them, 
otherwise this will be too slow, even when using Mongo instead of SQLAlchemy. 
Which objects are "most-used" changes over time.
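
One common way to keep a shifting "most-used" set in memory is a bounded 
least-recently-used cache. The following is only a sketch under assumptions: 
`LRUObjectCache` and its `loader` callback are hypothetical names, and 
"most-used" is approximated by "most recently used", which may or may not 
match the real access pattern.

```python
from collections import OrderedDict


class LRUObjectCache:
    """Keeps at most max_size objects; evicts the least recently used one."""

    def __init__(self, loader, max_size):
        self._loader = loader        # hypothetical: fetches an object by key
        self._max_size = max_size
        self._objects = OrderedDict()

    def get(self, key):
        if key in self._objects:
            # Mark as recently used by moving it to the end.
            self._objects.move_to_end(key)
            return self._objects[key]
        obj = self._loader(key)
        self._objects[key] = obj
        if len(self._objects) > self._max_size:
            # The first entry is the one that was used longest ago.
            self._objects.popitem(last=False)
        return obj

    def __contains__(self, key):
        return key in self._objects
```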

I could work with MongoEngine by judicious hacking (augment DocumentField 
dereferencing with a local cache), but that leaves the update problem.


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread smurfix
memcache (or redis or ...) would be an option. However, I'm not going to go 
through the network plus deserialization for every object access; that'd be 
too slow. I'd therefore still need a local cache, which in turn needs to be 
invalidated.
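
To illustrate the two tiers, here is a small sketch. It is an assumption-laden 
stand-in, not a real client: `TwoTierCache` is a hypothetical name, and a dict 
of pickled bytes plays the role of the remote memcache/redis server. A local 
hit skips both the network fetch and the deserialization, which is exactly 
why the local copy goes stale unless somebody invalidates it.

```python
import pickle


class TwoTierCache:
    """Local dict of live objects in front of a remote store of pickles."""

    def __init__(self, remote):
        self._remote = remote        # assumption: dict of key -> pickled bytes
        self._local = {}
        self.remote_fetches = 0      # counts fetch-plus-unpickle round trips

    def get(self, key):
        try:
            return self._local[key]  # fast path: no network, no unpickling
        except KeyError:
            pass
        payload = self._remote[key]
        self.remote_fetches += 1
        obj = pickle.loads(payload)
        self._local[key] = obj
        return obj

    def invalidate(self, key):
        # Without this call the local tier keeps serving the old object.
        self._local.pop(key, None)
```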


Re: Python ORM library for distributed mostly-read-only objects?

2014-06-23 Thread smurfix
On Monday, June 23, 2014 5:54:38 PM UTC+2, Lie Ryan wrote:

> If you don't want each thread to have their own copy of the object, 
> don't use thread-scoped session. Use explicit scope instead.

How would that work when multiple threads traverse the in-memory object 
structure and cause relationships to be loaded?

IIRC SQLAlchemy's sessions are not thread-safe.