Dear Russell, PaulM and Django-devs,

I'm Paolo Coffetti, a software engineer living in Amsterdam, the Netherlands. I'm very close to a master's degree at the University of Bergamo, Italy: I've finished all the courses and I'm currently working on my thesis, which I'll be defending on June 9. The thesis is about a project which I have temporarily named Moogle (My Own Google). Moogle is a website with a full-text search engine for private data: it connects to a user's accounts on Facebook, Twitter, Google (Drive, Gmail, Google Plus) and Dropbox, indexes all her data (only textual information) and provides a private full-text search (available only to her). The project is not complete yet, but I'm working hard on it.

I have been working in Amsterdam as a web developer at a small startup named United Academics since 2010. Officially I'm not working for them anymore because I'm 100% focused on my thesis, but from time to time I still help their IT team. United Academics currently aims to provide an Open Access repository to scientists, but unfortunately it has lately been facing some financial difficulties. In past years United Academics was something slightly different, and I worked on projects like a print-on-demand website (www.print2book.com), a job portal (now offline), a bookstore (now offline), a couple of WordPress websites (http://www.united-academics.org/magazine/) and, more recently, an Open Access repository (not online yet). All those projects (apart from the WordPress ones) were built with Python and Django, so I consider myself an expert with those technologies. I'm definitely not a senior because I still have a lot to learn, but I'm also definitely more than a junior.

I spent some hours over the last few days on Django's proposals for Google Summer of Code. I'm particularly interested in Security Enhancements and Reducing coupling in Django components. I got a good impression reading those ideas and I would like to ask you for more details. This is not my official proposal: I haven't studied the codebase deeply yet, nor made a detailed plan. It is only a first approach, in order to get a clearer idea of what the aims are and to see whether I am on the right track. Also, I've never contributed to Django's code and, even though I know Python quite well, all those ideas sound quite tough to me and I'm wondering if I'm good enough.

*Reducing coupling in Django components*

This is the idea I'd appreciate the most as a Django user. Being able to use only the template engine, the ORM or the form layer could be useful for many projects (I will give an example later).
Anyway, it is not clear to me whether this idea is more about refactoring/reorganizing dependencies among modules in order to decouple parts of the system, or more about packaging.

*Packaging part*

Honestly, I don't know much about packaging best practices; I did some research today, but it's not even clear to me whether Django uses Setuptools or Distutils. Anyway, looking at the tickets for the packaging component I found something:

https://code.djangoproject.com/ticket/18937
It's about adding metadata to Python distributions as described in PEP 345, but actually, as jezdez says, "it's easy enough to put the info in a setup.cfg as it's implemented by distutils2/packaging... On the other hand as long as distutils2/packaging isn't official released/included in Python 3.X it may be senseless to do so."

https://code.djangoproject.com/ticket/21108
Make pre-release installable via pip

*Refactoring dependencies part*

I guess this part requires a good overview of the entire system, which I don't have. But I have a strong understanding of object-oriented principles in Python, and this could be useful. So, in the end, in order to complete this task I would need a lot of guidance, because it's hard for me even to define exactly which activities have to be done, and to estimate and prioritize them. I could, for instance, start with a single component like the ORM and clean it up. I investigated a little and django/db currently has the following dependencies: dispatch, apps, core, forms, utils. I actually feel very challenged by this activity: it looks like I can really learn a lot by looking into the core of Django, and I don't mind working hard! Plus I have an idea, though I'm not sure whether it is pertinent.

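(As an aside on the django/db dependency list above: one rough way to collect such a list is to scan import statements with the standard library's ast module. The sketch below is only an illustration under stated assumptions, not necessarily how the list above was obtained. The names `django_subpackages_imported`, `own_package` and `SAMPLE` are made up for this example, relative imports are ignored, and every name imported directly from `django` is treated as a subpackage, which is an approximation.)

```python
import ast


def django_subpackages_imported(source, own_package="db"):
    """Return the sorted django.<name> subpackages imported by `source`.

    `own_package` is skipped, so django.db importing django.db.models
    is not reported as an external dependency.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import django.dispatch" -> "django.dispatch"
            dotted_names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == "django":
                # "from django import forms" -> "django.forms" (approximation)
                dotted_names = ["django." + alias.name for alias in node.names]
            else:
                dotted_names = [node.module]
        else:
            continue  # not an import, or a relative import
        for dotted in dotted_names:
            parts = dotted.split(".")
            if len(parts) > 1 and parts[0] == "django" and parts[1] != own_package:
                found.add(parts[1])
    return sorted(found)


# Hypothetical module source, used only to exercise the function.
SAMPLE = '''
from django.core import signals
from django.db.models import Model
from django.utils import six
from django import forms
import django.dispatch
import os
from . import utils
'''

print(django_subpackages_imported(SAMPLE))  # ['core', 'dispatch', 'forms', 'utils']
```

To apply this to a real checkout, one would read every .py file under django/db and merge the results; a list built that way only reflects static imports, so it is a starting point for the refactoring discussion rather than a complete picture.
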
I'm working on a project which has, among others, 2 parts:
- a Django website which has a database with OAuth tokens (and many other things)
- some Python code which uses the same OAuth tokens to do some batch operations

The 2 parts are independent sub-systems and they are deployed to different machines, but both parts must be able to read/update the same set of OAuth tokens.

One solution would be to use the ORM provided by Django and the same models in the Python batch code in order to read/update the same database (or its replica). I know it is possible to use the ORM outside of a web context (as James wrote a long while ago: http://stackoverflow.com/a/584208/1969672), but we can do better, and this GSoC idea would maybe improve it.

A second solution, way more flexible, would be as follows. Imagine if Django could provide an HTTP RESTful web service to expose the data in its database, together with an admin page where it is possible to select which models to expose and assign the right permissions. Any external system that needs to read/write those models could simply use this web service. In my case the Python code could use SQLAlchemy or any other ORM and manage the synchronization with Django's database using the web service. I know that there are many Django REST frameworks out there, so anyone could build such a web service, but what about providing it by default, in the same way as we provide Django's amazing admin backend?

*Security Enhancements*

I know something about security: some years ago I coauthored a chapter of a book published by O'Reilly about security: http://goo.gl/E7TkgH
Anyway, I don't consider myself an expert in web security. Some ideas that I found reading some tickets and some books:

** Enhancing CSRF protection*
Some ideas have already been discussed in the ticket: https://code.djangoproject.com/ticket/16859
It's not clear to me, though, whether or not we want to tie CSRF to sessions (I guess not).

** Integrating carljm's django-secure project (https://github.com/carljm/django-secure)*
This would be a great achievement and a good match with the next point.

** Building an interactive admin dashboard to display and check installation security parameters*
This idea challenges me a lot: a sort of monitoring and auditing tool which could inspect the code and give an overview of some security aspects. For instance it could:
- Check the status of DEBUG mode
- Check ALLOWED_HOSTS
- Encourage the use of HTTPS for dynamic pages, static files, cookies; plus HSTS
- Monitor HTML escaping exceptions like the safe template filter and the autoescape template tag
- Monitor the use of eval in Python code
  http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
- Monitor the use of pickle for de-serializing user input and YAML load in Python code:
  http://nadiaspot.com/why-python-pickle-is-insecure/
  http://nedbatchelder.com/blog/201302/war_is_peace.html
- Watch for raw SQL queries to avoid SQL injection
- When using FileField and ImageField, advise the use of python-magic (https://github.com/ahupp/python-magic) to check the file type
  And check that the directory where those files are saved doesn't allow them to be executed
- Check for forms without CSRF protection
- Discourage the use of Meta.exclude and suggest the use of Meta.fields instead
- Suggest changing the default admin URL (I think it was even a suggestion from Jacob Kaplan-Moss)
  Use HTTPS for the admin site, limit its access based on IPs and maybe even encourage the use of django-admin-honeypot https://github.com/dmpayton/django-admin-honeypot
- Check for the potentially dangerous allow_tags attribute in the admin
- Similar strategies (change URL, use HTTPS, limit access based on IP) for Django admindocs
- Check for new releases of Django (and maybe even of the libraries used and listed in requirements.txt)

** SECRET_KEY in settings*
Many people use environment variables to set this variable. I prefer to generate
a random SECRET_KEY on deploy:

    import random
    # exists/append are assumed to be Fabric's fabric.contrib.files helpers
    from fabric.contrib.files import append, exists

    # source_folder and settings_path are defined earlier in the deploy script
    secret_key_file = source_folder + '/<main_app>/secret_key.py'
    if not exists(secret_key_file):
        chars = 'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)'
        key = ''.join(random.SystemRandom().choice(chars) for _ in range(50))
        append(secret_key_file, "SECRET_KEY = '%s'" % (key,))
    append(settings_path, '\nfrom .secret_key import SECRET_KEY')

Maybe we could implement a similar strategy by default.

Some resources where we can find other info:
http://www.amazon.com/The-Tangled-Web-Securing-Applications/dp/1593273886/?ie=UTF8&tag=cn-001-20
http://www.amazon.com/The-Web-Application-Hackers-Handbook/dp/1118026470/?ie=UTF8&tag=cn-001-20
https://code.google.com/p/browsersec/wiki/Main
https://wiki.mozilla.org/WebAppSec/Secure_Coding_Guidelines

I've never worked on an Open Source project, but I've wanted to for many years, so I'm excited to finally have the chance to do so. Also, please consider that I'd really love to take part in a Google Summer of Code project, so I will apply for more than one project (mainly Python projects), but honestly working on Django is my dream! I also see that Django is very popular and I'll be competing with many smart students, so could you please suggest which idea I should focus on more? I have a slight preference for the second one, because I find it more challenging. Shall I go for that, or do you already have a designated student?

Sorry for the lengthy email...

Kind regards,
Paolo

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/25e64f09-9f2f-4012-b832-4b3165a3d5d6%40googlegroups.com.
