Python ORM library for distributed mostly-read-only objects?
My problem: I have a large database of interconnected objects which I need to process with a combination of short- and long-lived workers. These objects are mostly read-only (i.e. any of them can be changed or marked as deleted, but that happens infrequently). The workers may or may not be within one Python process, or even on one system.

I've been doing this with a "classic" session-based SQLAlchemy ORM approach, but that ends up far too slow and memory-intensive, as each thread gets its own copy of every object it needs. I don't want that. My existing code does object loading and traversal by simple attribute access; I'd like to keep that if at all possible.

Ideally, what I'd like to have is an object server which mediates write access to the database and then sends change/invalidation notices to the workers. (Changes are infrequent enough that I don't care if a worker gets a notice it's not interested in.) I don't care whether updates are applied immediately or are only visible to the local process until committed. I also don't need fancy indexing or query abilities; if necessary I can go to the storage backend for that. (That should be SQL, though a NoSQL backend would be nice to have.)

Does something like this already exist somewhere out there, do I need to write it myself, or does somebody know of an alternative solution?

-- 
https://mail.python.org/mailman/listinfo/python-list
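The architecture asked for here (one component mediates writes and broadcasts invalidations; workers keep local copies) can be sketched inside a single process. This is a minimal sketch under stated assumptions: a plain dict stands in for the SQL backend, and in-process callbacks stand in for the network channel that would carry change notices between processes or systems. `ObjectServer` and `WorkerCache` are hypothetical names, not an existing library.

```python
import threading
from typing import Any, Callable, Dict, Hashable, List


class ObjectServer:
    """Hypothetical write mediator: the only component that writes to storage.

    A dict stands in for the SQL (or NoSQL) backend; subscriber callbacks
    stand in for a network channel carrying invalidation notices.
    """

    def __init__(self) -> None:
        self._store: Dict[Hashable, Any] = {}
        self._subscribers: List[Callable[[Hashable], None]] = []
        self._lock = threading.Lock()

    def subscribe(self, callback: Callable[[Hashable], None]) -> None:
        self._subscribers.append(callback)

    def load(self, key: Hashable) -> Any:
        # Read path: workers call this only on a local cache miss.
        return self._store[key]

    def write(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._store[key] = value
        # Every worker receives every notice; the post says that is acceptable.
        for notify in self._subscribers:
            notify(key)


class WorkerCache:
    """Per-worker local cache, filled lazily and emptied by notices."""

    def __init__(self, server: ObjectServer) -> None:
        self._server = server
        self._cache: Dict[Hashable, Any] = {}
        self.misses = 0
        server.subscribe(self.invalidate)

    def get(self, key: Hashable) -> Any:
        try:
            return self._cache[key]
        except KeyError:
            self.misses += 1
            value = self._server.load(key)
            self._cache[key] = value
            return value

    def invalidate(self, key: Hashable) -> None:
        # Drop the stale copy; the next get() reloads it from the server.
        self._cache.pop(key, None)
```

Repeated `worker.get(key)` calls are served locally until `server.write(key, ...)` evicts the entry. The sketch ignores the race where a notice arrives between a worker's load and its cache fill; a real implementation would need something like per-object version numbers to close that gap.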
Re: Python ORM library for distributed mostly-read-only objects?
On Sunday, June 22, 2014 3:49:53 PM UTC+2, Roy Smith wrote:
> Can you give us some more quantitative idea of your requirements? How
> many objects? How much total data is being stored? How many queries
> per second, and what is the acceptable latency for a query?

Not yet. How many objects: a whole lot. Total data: more than fits in memory. Queries per second and latency: that depends.

To explain: the data is a network of diverse related objects. I can keep the most-used objects in memory, but not all of them. Indeed, I _need_ to keep them there, otherwise this will be too slow, even when using Mongo instead of SQLAlchemy. Which objects are "most-used" changes over time.

I could work with MongoEngine by judicious hacking (augmenting DocumentField dereferencing with a local cache), but that leaves the update problem.
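Keeping only the currently most-used objects in memory, while still traversing by plain attribute access, could look roughly like the following. This is a generic sketch, not MongoEngine's dereferencing machinery: it assumes a least-recently-used eviction policy and a hypothetical `loader` callable that fetches one object from the backend on a miss. `LRUCache`, `Ref` and `Node` are illustrative names.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUCache:
    """Bounded cache: keeps at most `capacity` objects, evicting the least recently used."""

    def __init__(self, loader: Callable[[Hashable], Any], capacity: int) -> None:
        self._loader = loader  # hypothetical: fetches one object from Mongo/SQL on a miss
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.loads = 0

    def get(self, key: Hashable) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.loads += 1
        value = self._loader(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key: Hashable) -> None:
        # Hook for change notices: the next get() reloads the object.
        self._items.pop(key, None)


class Ref:
    """Descriptor storing only the target's key; attribute access dereferences
    it through the owner's cache, so `node.parent.parent` still works."""

    def __set_name__(self, owner: type, name: str) -> None:
        self._key_attr = "_" + name + "_key"

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self
        key = getattr(obj, self._key_attr)
        return None if key is None else obj.cache.get(key)

    def __set__(self, obj: Any, key: Optional[Hashable]) -> None:
        setattr(obj, self._key_attr, key)


class Node:
    """Hypothetical domain object holding one reference to another Node."""

    parent = Ref()

    def __init__(self, cache: LRUCache, key: Hashable, parent_key: Optional[Hashable]) -> None:
        self.cache = cache
        self.key = key
        self.parent = parent_key  # goes through Ref.__set__: only the key is stored
```

With `rows = {1: None, 2: 1, 3: 2}` standing in for the database and `cache = LRUCache(lambda k: Node(cache, k, rows[k]), capacity=2)`, the expression `cache.get(3).parent.parent.key` evaluates to `1`, and the object loaded first is evicted once the third one arrives. The `invalidate` method is where update notices would plug in; the update problem itself is not solved here.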
Re: Python ORM library for distributed mostly-read-only objects?
memcache (or Redis, or ...) would be an option. However, I'm not going to go through the network plus deserialization for every object; that would be too slow. Thus I'd still need a local cache, which needs to be invalidated.
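The two-tier arrangement described here (a process-local cache of live objects in front of a shared remote cache, emptied by invalidation notices) might be sketched as follows. Assumptions, all hypothetical: a dict of pickled bytes stands in for memcache/Redis, and a `queue.Queue` read by a background thread stands in for a pub/sub channel carrying changed keys. The point of the sketch is that `pickle.loads` runs only on a local miss.

```python
import pickle
import queue
import threading
from typing import Any, Dict, Hashable


class TwoTierCache:
    """Process-local cache in front of a shared remote cache.

    `remote` stands in for memcache/Redis: it holds pickled bytes, so every
    remote hit costs a (simulated) round trip plus pickle.loads(). The local
    dict holds live objects and is cleared by keys read from `notices`,
    which stands in for a pub/sub channel.
    """

    def __init__(self, remote: Dict[Hashable, bytes], notices: "queue.Queue[Hashable]") -> None:
        self._remote = remote
        self._local: Dict[Hashable, Any] = {}
        self._lock = threading.Lock()
        self._notices = notices
        self.deserializations = 0
        # Daemon thread: drains invalidation notices for the life of the process.
        threading.Thread(target=self._listen, daemon=True).start()

    def get(self, key: Hashable) -> Any:
        with self._lock:
            if key in self._local:
                return self._local[key]  # fast path: no network, no unpickling
        value = pickle.loads(self._remote[key])  # the slow path the post wants to avoid
        with self._lock:
            self.deserializations += 1
            self._local[key] = value
        return value

    def _listen(self) -> None:
        while True:
            key = self._notices.get()
            with self._lock:
                self._local.pop(key, None)  # next get() refetches from the remote tier
            self._notices.task_done()
```

After `notices.put(key)` is processed, the next `get(key)` goes back to the remote tier. As with any cache filled outside the write path, a notice that lands between the remote read and the local fill can leave a stale entry behind; versioned entries would be one way to address that.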
Re: Python ORM library for distributed mostly-read-only objects?
On Monday, June 23, 2014 5:54:38 PM UTC+2, Lie Ryan wrote:
> If you don't want each thread to have their own copy of the object,
>
> Don't use thread-scoped session. Use explicit scope instead.

How would that work when multiple threads traverse the in-memory object structure and cause relationships to be loaded? IIRC, SQLAlchemy's sessions are not thread-safe.
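One generic way to let several threads share one in-memory object graph when the lazy loads go through something that is not thread-safe is to serialize the loads behind a lock and re-check after acquiring it (double-checked locking). This sketch assumes the load function is the only non-thread-safe part; `LockedLoader`, `lazy_relation` and `Item` are hypothetical names, and this is not how SQLAlchemy itself manages sessions or relationship loading.

```python
import threading
from typing import Any, Callable, Optional


class LockedLoader:
    """Funnels every load through one re-entrant lock. It stands in for a
    resource that must not be used by two threads at once (for example a
    single ORM session)."""

    def __init__(self, load: Callable[[Any], Any]) -> None:
        self._load = load
        self.lock = threading.RLock()
        self.calls = 0

    def __call__(self, key: Any) -> Any:
        with self.lock:
            self.calls += 1
            return self._load(key)


class lazy_relation:
    """Descriptor that loads a relationship on first attribute access, once per object."""

    def __init__(self, fetch: Callable[[Any], Any]) -> None:
        self._fetch = fetch  # fetch(obj) -> the related objects

    def __set_name__(self, owner: type, name: str) -> None:
        self._slot = "_loaded_" + name

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self
        try:
            return obj.__dict__[self._slot]  # fast path: already loaded, no lock taken
        except KeyError:
            pass
        with obj.loader.lock:
            # Re-check: another thread may have loaded it while we waited.
            if self._slot not in obj.__dict__:
                obj.__dict__[self._slot] = self._fetch(obj)
            return obj.__dict__[self._slot]


class Item:
    """Hypothetical domain object whose `children` are loaded lazily."""

    children = lazy_relation(lambda self: self.loader(self.key))

    def __init__(self, key: int, loader: LockedLoader) -> None:
        self.key = key
        self.loader = loader
```

If eight threads read `item.children` at the same moment, the loader still runs exactly once. The lock is re-entrant because `fetch` calls back into the loader while the descriptor already holds it. Once loaded, reads skip the lock, relying on single dict lookups being atomic in CPython, so with mostly-read-only data contention is limited to cold misses; the price is that all loads are serialized.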
