Re: Multiple database support
Hi guys,

I've been heavily swamped with work for college, so I missed this thread and the few others on multiple databases. Sorry.

I have implemented a proof-of-concept database scaling solution for Django. It tackles all kinds of scaling issues I have seen in Django. Its purpose is mainly to find out whether we could scale up Django at all. I didn't worry too much about syntax or the way it's supposed to integrate into Django - I just hacked away in the Django code to make it work in the fastest way I could think of.

The solution covers the largest part of Simon's #2 problem. I added a few attributes and config parameters to the ORM so you can decide which models are hosted on which server. One model can be hosted on 20 servers, with the actual location depending on a foreign key value.

We're using it to store data for different groups on different servers for more horizontal scaling. For example, if a photo has a ForeignKey to group A, it will be routed to server 15 because of some logic.

You can also route objects 1-1000 to server 1 and 1001-2000 to server 2.

I have also added database denormalization, caching foreign key querysets to the DB, bulk prefetching, in-model privacy checks and a few other things.

A large percentage of the stuff probably isn't suitable for Django trunk. Most of it tackles quite specific and hard scaling issues, but I guess there's a way to build it in a more modular fashion and make it work for more people. After all, I'm new to Django-developers and also to opening up my work.

If some of you are interested in the code and would benefit from it, I would be more than happy to share.

Just posting a big pile of code probably won't help you too much, so I thought I'd write a few lines of documentation about each part and post them here. Does that sound reasonable?

Jan

On May 22, 4:59 pm, Simon Willison <[EMAIL PROTECTED]> wrote:
> I have to admit I'm slightly worried about the multi-database
> proposal, because at the moment it doesn't seem to solve either of the
> multi-db problems I'm concerned about.
>
> The proposal at the moment deals with having different models live in
> different databases - for example, the Forum application lives on DB1
> while the Blog application lives on DB2.
>
> I can see how this could be useful, but the two database problems that
> keep me up at night are the following:
>
> 1. Replication - being able to send all of my writes to one master
> machine but spread all of my reads over several slave machines.
> Thankfully Ivan Sagalaev's confusingly named mysql_cluster covers this
> problem neatly without modification to Django core - it's just an
> alternative DB backend which demonstrates that doing this isn't
> particularly hard: http://softwaremaniacs.org/soft/mysql_cluster/en/
>
> 2. Sharding - being able to put User entries 1-1000 on DB1, whereas
> User entries 1001-2000 live on DB2 and so on.
>
> I'd love Django to have built-in abilities to solve #1 - it's a really
> important first step on scaling up to multiple databases, and it's
> also massively easier than any other part of the multi-db problem.
>
> I wouldn't expect a magic solution to #2 because it's so highly
> dependent on the application that is being built, but at the same time
> it would be nice to see a multi-db solution at least take this in to
> account (maybe just by providing an easy tool to direct an ORM request
> to a specific server based on some arbitrary logic).
>
> I may have misunderstood the proposal, but I think it's vital that the
> above two use cases are considered. Even if they can't be solved
> outright, providing tools that custom solutions to these cases can be
> built with should be a priority for multi-db support.
>
> Cheers,
>
> Simon

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---
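
A minimal sketch of the two routing rules mentioned in the message above (id ranges, and a foreign key to a group), assuming a plain lookup that lives outside the ORM. The names `ID_RANGE_UPPER_BOUNDS`, `GROUP_TO_SERVER`, `server_for_id` and `server_for_group` are hypothetical; they are not part of Django or of Jan's code, and the group table only stands in for the "some logic" the message leaves unspecified.

```python
from bisect import bisect_left

# Hypothetical config: inclusive upper bound of the id range of each server.
# ids 1-1000 -> server 1, ids 1001-2000 -> server 2.
ID_RANGE_UPPER_BOUNDS = [1000, 2000]

# Hypothetical config: which server holds the data of which group.
GROUP_TO_SERVER = {"A": 15}


def server_for_id(object_id):
    """Return the 1-based server number whose id range contains object_id."""
    if object_id < 1:
        raise ValueError("object ids start at 1")
    # bisect_left finds the first range whose upper bound is >= object_id.
    index = bisect_left(ID_RANGE_UPPER_BOUNDS, object_id)
    if index == len(ID_RANGE_UPPER_BOUNDS):
        raise LookupError("no server configured for id %d" % object_id)
    return index + 1


def server_for_group(group_key):
    """Return the server for an object, given the group its foreign key points to."""
    return GROUP_TO_SERVER[group_key]
```

Keeping the rule in one small function means application code never has to name a server itself, which is roughly the "easy tool to direct an ORM request to a specific server" Simon asks for.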
Re: Multiple database support
I've been doing a little reading on the multi-db code and wiki. You've basically been tackling problem #3 (different data types and engines) - which I didn't care about at all. That's good, I guess.

The way I handle database connections is simply to keep a pool of different connection objects alive at the same time and create new cursors on the connections I need. Since I have only implemented the simplest SELECT FROM WHERE for one and many rows, I haven't worried too much about commit and rollback and things like that. So I don't understand all of your code, or why you need to use threadlocals and so on.

The basic thing I do when a Django model is sharded across different servers is this:

1. I write a get_shards class method for every model that does some logic and returns a list of one or more shard objects, each of which has a link to that shard's connection.
2. I ask the model class which shards it lives on; this returns a list of shard objects.
3. For each shard object I get a new database cursor from the connection, which lives in a separate connection object for every shard (I'm not perfectly sure if this is thread safe).
4. For each of those cursors I repeat the query Django wanted to run. Then I try to stitch the responses together as best I can.

You guys obviously know your code better than I do. Should I start rewriting my code (necessary after queryset-refactoring) based on your patch?

Jan

On Jul 8, 4:04 pm, "Ben Ford" <[EMAIL PROTECTED]> wrote:
> Hi Jan,
>
> It sounds like you've made great progress. We have an informal trac and hg
> repo set up at trac and hg dot woe-beti.de respectively. You're more than
> welcome to add your documentation there! Let me know if you want an hg repo
> to play with too and I'll sort it out for you.
>
> Cheers,
> Ben
>
> 2008/7/8 Jan Oberst <[EMAIL PROTECTED]>:
>
> > Hi guys,
> >
> > I've been heavily swamped with work for college, so I missed this
> > thread and the few others on multiple databases. Sorry.
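
The four steps in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: in-memory `sqlite3` connections stand in for the shard servers, and `Shard`, `SHARDS`, `Photo`, `select_from_shards` and `load_demo_data` are hypothetical names, not Jan's code or any Django API. Only `get_shards` is a name taken from the message.

```python
import sqlite3


class Shard:
    """Hypothetical shard object that owns its own connection (step 1)."""

    def __init__(self, name):
        self.name = name
        # Assumption: an in-memory SQLite database stands in for a real server.
        self.connection = sqlite3.connect(":memory:")


SHARDS = [Shard("shard-1"), Shard("shard-2")]


class Photo:
    """Stand-in for a sharded model class."""

    @classmethod
    def get_shards(cls):
        # Steps 1 and 2: some logic decides which shards hold this model.
        # Here the logic is trivial: the model lives on every shard.
        return SHARDS


def select_from_shards(model, sql, params=()):
    """Run the same SELECT on every shard of the model and join the rows."""
    rows = []
    for shard in model.get_shards():
        cursor = shard.connection.cursor()  # step 3: one cursor per shard
        try:
            cursor.execute(sql, params)     # step 4: repeat the query
            rows.extend(cursor.fetchall())  # stitch: plain concatenation
        finally:
            cursor.close()
    return rows


def load_demo_data():
    """Create the same table on every shard and add a few demo rows."""
    demo_rows = {
        "shard-1": [(1, "beach")],
        "shard-2": [(1001, "forest"), (1002, "beach")],
    }
    for shard in SHARDS:
        shard.connection.execute(
            "CREATE TABLE photo (id INTEGER PRIMARY KEY, title TEXT)"
        )
        shard.connection.executemany(
            "INSERT INTO photo (id, title) VALUES (?, ?)", demo_rows[shard.name]
        )
```

Concatenation is the simplest possible "stitching" and preserves no global ordering. On the thread-safety doubt in step 3: `sqlite3` connections refuse by default to be used from a thread other than the one that created them, and other drivers have their own rules, so sharing one connection object per shard across threads is something to check per driver.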
Re: Proposal: user-friendly API for multi-database support
On Sep 10, 7:53 pm, Simon Willison <[EMAIL PROTECTED]> wrote:
> Dealing with single queries that span multiple databases
> --------------------------------------------------------
>
> Once you have different tables living in different databases there's
> always the chance that someone will try to write a query that attempts
> to join tables that live on two different database servers. I don't
> think we should address this problem at all (aside from maybe
> attempting to throw a descriptive error message should it happen) - if
> you're scaling across different servers you need to be aware of the
> limitations of that approach.
>
> That said, databases like MySQL actually do allow cross-database joins
> provided both databases live on the same physical server. Is this
> something we should support? I'd like to say "no" and assume that
> people who need to do that will be happy rolling their own SQL using a
> raw cursor, but maybe I'm wrong and it's actually a common use case.
>
> Connection pooling
> ------------------
>
> This is where I get completely out of my depth, but it seems like we
> might need to implement connection pooling at some point since we are
> now maintaining multiple connections to multiple databases. We could
> roll our own solution here, but to my knowledge SQLAlchemy has a solid
> connection pool implementation which is entirely separate from the
> rest of the SQLAlchemy ORM. We could just ensure that if someone needs
> connection pooling there's a documented way of integrating the
> SQLAlchemy connection pool with Django - that way we don't have an
> external dependency on SQLAlchemy for the common case but people who
> need connection pools can still have them.
>
> Backwards compatibility
> -----------------------
>
> I think we can do all of the above while maintaining almost 100%
> backwards compatibility with Django 1.0. In the absence of a DATABASES
> setting we can construct one using Django's current DATABASE_ENGINE /
> DATABASE_NAME / etc settings to figure out the 'default' connection.
> Everything else should Just Work as it does already - the only people
> who will need to worry are those who have hacked together their own
> multi-db support based on Django internals.

I think sharding really is something every developer would do differently, because it's just so dependent on the actual business logic and how your models work. That said, maybe there are a few things lots of people agree on that could form a base for more detailed implementations.

For our (large-scale) project we chose Django because of the ORM, and I implemented a complete sharding setup on top of the ORM. The important thing for me was to have quite a bit of magic when it comes to handling the DB. Because you specify relations between data and even specify how you use them (by building querysets), there's a lot of information to draw conclusions from. This can lead to a simple and plain interface. My goal was to have the same application code run on a single-DB machine and in a sharded DB environment without changing anything.

The problem with sharding is that lots of operations won't work: no JOINs, no DB-side foreign keys, no transactions, no auto_increment IDs... A few others won't work the way they are supposed to: ORDER BY, LIMIT, COUNT and so on. But like some of you said, that's just the way it is. If you want to use sharding you should take care of those exceptions yourself.

Our project is based on profiles, and our profile-based sharding should put all data that's related to a profile on one shard. My sharding config now tells the ORM that there are models that are always on the same DB machine (photo sets, photos and the comments for the photos, for example). Once you have a photo object and trigger that foreign key (photo.comments_set.all(), see example [1]), the query goes directly to the shard the photo itself is on. Since we already know that shard, there's not even a need to look it up.

Sharding exceptions occur when the ORM tries something it can't handle if the model is sharded. The great thing about this setup is that it reduces the number of cases where we actually have to spawn a query across multiple shards. This won't work in every environment.

I implemented this by hacking quite a bit of Django source (mainly the descriptors of the ForeignKey field). It sure would be neat to have a proper API for this.

The real problem with these techniques is not implementing them for yourself, but implementing them for everyone else. Since I'm the one responsible for our ORM and models, I can hack away just like this. Doing it properly seems like a whole other story.

-Jan

[1] - A little bit of my model code. This will put all Profile objects on shards 11-13 and establish dependencies so that dependent objects and querysets know a priori which shard to query.

    class ProfileShardManager(ShardManager):
        # Tell the model which shards it belongs to.
        use_shards = [11, 12, 13]

        def initial_id_to_shard_mapping
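
On the remark above that ORDER BY, LIMIT and COUNT do not behave as usual on a sharded model: one common way to handle the cases that do have to span shards is to let every shard sort and limit its own rows and then merge the results in the application. The sketch below is a swapped-in illustration of that idea, not Jan's implementation; `merge_ordered` and `count_across` are hypothetical helper names, and the input lists stand in for rows already fetched from each shard.

```python
import heapq
from itertools import islice


def merge_ordered(shard_results, key, limit=None):
    """Merge per-shard row lists that are each already sorted by `key`.

    For a correct global LIMIT n, every shard must itself have been
    queried with ORDER BY on the same key and LIMIT n (or no limit).
    """
    merged = heapq.merge(*shard_results, key=key)
    if limit is not None:
        merged = islice(merged, limit)
    return list(merged)


def count_across(shard_counts):
    """A global COUNT is the sum of the per-shard COUNT results."""
    return sum(shard_counts)
```

For example, merging `[(1, "a"), (4, "d")]` and `[(2, "b"), (3, "c")]` by the first column with `limit=3` keeps the three smallest ids across both shards. The cost is that each shard may return up to `limit` rows even though most of them are thrown away.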
Re: Proposal: django.forms.SafeForm - forms with built in CSRF protection
On Sep 22, 10:25 pm, Simon Willison <[EMAIL PROTECTED]> wrote:
> CSRF[1] is one of the most common web application vulnerabilities, but
> continues to have very poor awareness in the developer community.
> Django ships with CSRF protection in the form of middleware, but it's
> off by default. I'm willing to bet most people don't turn it on.

I agree that middleware is the wrong place for this. And I'll definitely have to implement some kind of CSRF protection at some point in the future. I like it!

> Why not build this in to django.forms.Form directly? Because CSRF is
> only an issue for forms that are supposed to only be used by
> authenticated users. Forms that don't require a cookie don't need to
> be protected.

I'd protect all my forms if there's a neat way to do it. Why would it only apply to logged-in users? I'm not using contrib.auth.

Jan
Re: Proposal: django.forms.SafeForm - forms with built in CSRF protection
On Sep 23, 1:53 am, Tai Lee <[EMAIL PROTECTED]> wrote:
> On Sep 23, 9:27 am, Simon Willison <[EMAIL PROTECTED]> wrote:
>
> > The significant downside is that having a render() method on a form
> > that performs the same function as render_to_response feels really,
> > really strange. It's convenient, but it just doesn't feel right and
> > I'm not sure I can justify it.
>
> How would this work when you have multiple forms/modelforms/formsets
> on one page?

You could save only a single verifier code in the cookie, shared by all forms rendered for one request object. Each form would then sign its CSRF fields with that one code. Storing that code on the request object seems strange, though.
Re: Proposal: django.forms.SafeForm - forms with built in CSRF protection
On Sep 23, 6:13 pm, oggy <[EMAIL PROTECTED]> wrote:
> Could we just include something like a signed salt+timestamp
> +REMOTE_ADDR in a hidden field? It's not exactly bulletproof because
> of the possibility of a same-IP-CSRF (affecting people behind
> proxies), but it's dead simple and doesn't require a lot of code
> change: Form -> SafeForm + request as the first parameter to __init__.
> Heck, I'd even trust sed to do it for me ;).

Adding a signed field with a timestamp would be a much easier way to secure forms. But it's not nearly as secure as having the token signed with an additional cookie. By setting a signed cookie you can verify that this very form was displayed to this very client.

Also, you don't want to expire a form too early for people who just type slowly. And if a token stays valid for too long, someone can generate a proper token and then keep using it for an attack for just as long.
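
A minimal sketch of the idea in the reply above: a hidden-field token that is signed over both a timestamp and a per-client cookie value, so it only validates for the client the form was shown to and only for a limited time. `SECRET_KEY`, `make_form_token` and `check_form_token` are hypothetical names for illustration; this is not the SafeForm proposal's code or Django's CSRF implementation.

```python
import hashlib
import hmac
import time

# Assumption: a server-side secret that is never sent to the client.
SECRET_KEY = b"replace-with-a-real-secret"


def _signature(cookie_value, timestamp):
    """HMAC over the client's cookie value and the token's issue time."""
    message = ("%s:%d" % (cookie_value, timestamp)).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_form_token(cookie_value, now=None):
    """Build the hidden-field value for a form shown to this client."""
    timestamp = int(time.time() if now is None else now)
    return "%d:%s" % (timestamp, _signature(cookie_value, timestamp))


def check_form_token(token, cookie_value, max_age=3600, now=None):
    """Return True if the token was issued for this cookie and is not expired."""
    try:
        timestamp_text, signature = token.split(":", 1)
        timestamp = int(timestamp_text)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if not 0 <= current - timestamp <= max_age:
        return False  # expired, or issued in the future
    expected = _signature(cookie_value, timestamp)
    # Constant-time comparison to avoid leaking how many characters match.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

The `max_age` parameter is where the trade-off from the message shows up: too short and slow typists get rejected, too long and a leaked token stays usable. The one-hour default here is an arbitrary placeholder, not a recommendation.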
Re: Visual recognition of Django website
On Sep 21, 10:29 pm, Matt Boersma <[EMAIL PROTECTED]> wrote:
> On Sep 21, 2007, at 2:22 PM, SmileyChris <[EMAIL PROTECTED]> wrote:
>
> > On Sep 19, 11:44 pm, Ned Batchelder <[EMAIL PROTECTED]> wrote:
> >> Now we just need to get someone to put it on the site...
> >
> > +1. Easy to do, looks good.
>
> +1. It's excellent.

+1. Very nice work.
Re: _QuerySet.first()
> Does this differ from just using a slice?
>
>     foo = MyModel.objects.all()[0]

latest() adds an automatic ORDER BY on the field you specified in the model Meta's get_latest_by. So it does the same as a [0] slice, provided you sort the queryset accordingly.