Hello everyone.

I started using Django about a week ago. I have a particular app which
more or less accepts CSV files, Excel files, DBase files, and whatever
else, takes the uploaded file, converts it to a tab-delimited format,
and then does statistical work over it.

Originally this application was written in PHP by someone else. I've
since taken it and rewritten it in Rails. Rails didn't have good
libraries (I had to call out to Python programs anyway), and it was
slow, so I rewrote it in Python, in the Pylons web framework. As I
said, about a week ago I started using Django, and I more or less
directly converted this app from Pylons to Django.

Pretty much all the web app itself does is display a form to upload the
file, accept the upload, and then pass it off to an outside Python
library which does the conversion to tab-delimited format. Once that is
done, the framework (in this case a Django view function) takes control
again and does the statistical work over the tab-delimited file. The code
for the Pylons and Django versions is more or less identical. There are
obviously small changes here and there to fit each framework, but the
controller or view code has changed very little.

Monday I got the entire app converted over to Django. I uploaded my
first file, a DBase file, and immediately noticed that it was taking
*forever*. After 24 seconds, I finally got my stats page. This same
process takes 3 seconds in Pylons. Something definitely wasn't right
here - my Django app was actually slower than both my Rails and the PHP
app.

So I started doing some tests to isolate the problem. I brought the
issue up in #django on freenode IRC, and someone immediately suggested
that the problem might be the dev server. Knowing that the dev server very
well could be at fault, I took a little script and got my Django app
running on the exact same Paste WSGI server that my Pylons app was
running on. Again, it took 24 seconds for it to run this 6 MB file.

Continuing this testing this morning, I realized that a very good
way to find out exactly where the bottleneck was would be to cut out the
uploading process altogether - if the process finished very quickly on
a file that was already uploaded to the local filesystem, then the
problem lay within Django's actual upload process. Sure enough,
when I had the process run on an already-uploaded file, the process
took 3 seconds. So uploading the file was taking 21 out of 24 seconds.
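
The isolation test above amounts to timing the upload phase and the
processing phase separately. A minimal sketch of that measurement,
assuming two hypothetical callables, `receive_upload` and `process_file`,
that stand in for the real view code (neither name comes from the app or
from Django):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def handle_request(receive_upload, process_file):
    """Run both phases and return per-phase timings in seconds.

    `receive_upload` and `process_file` are hypothetical placeholders:
    the first returns a path to the saved file, the second consumes it.
    """
    timings = {}
    with timed("upload", timings):
        path = receive_upload()
    with timed("process", timings):
        process_file(path)
    return timings
```

Comparing the "upload" and "process" entries shows which phase dominates
the total request time.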

Again I brought this up in the IRC chat. Someone told me that nobody was
going to take me seriously because I wasn't running Django the
'preferred' way, i.e., on mod_python/Apache. I really didn't think this
was the limiting factor, but I installed Apache and mod_python and got
it all set up anyway. Again, it took 24 seconds for this file to upload
and process. That was 8x as long as my Pylons app running on its dinky
little WSGI Python server.

At this point I was able to narrow down the issue:

* it had to do with Django's upload process
* it was an equal problem on any server, whether Django's dev server,
the Paste server, or Apache

I ran some profiling in order to narrow the problem even further. This
first link is a profile of the view that displays the form. This view
actually doesn't do much; as I said, it pretty much just displays the
form. When the form is submitted via a POST request, it is sent to this
second view (the second link). This is where the upload takes place,
the processing happens, and the stats are finally displayed.

http://paste.e-scribe.com/1564/
http://paste.e-scribe.com/1565/
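
A profile like the ones linked above can be gathered with the standard
library. The following is a minimal sketch using `cProfile` and `pstats`,
not necessarily how those two profiles were produced; `profile_call` is a
hypothetical helper name used only for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under the profiler; return (result, text report).

    The report lists the 10 entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```

Wrapping the body of the upload view in a call like this and reading the
top of the report is usually enough to see whether time is going into
request parsing or into the conversion library.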

Someone suggested that an already pending patch would fix the problem.
Ticket 1484, which has been superseded by Ticket 2070
(http://code.djangoproject.com/ticket/2070), has to do with streaming
uploads. This afternoon I applied the most recent patch in Ticket 2070.
Surprisingly, it worked, but it didn't have any effect
on the upload issue. Still the same 24 seconds.
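
For context, "streaming" an upload generally means copying the request
body to disk in fixed-size chunks rather than reading it into memory in
one piece. The sketch below illustrates that general idea with plain file
objects; it is not the code from the Ticket 2070 patch, and
`copy_in_chunks` and the 64 KB chunk size are assumptions chosen for
illustration.

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time; return the bytes copied.

    Only `chunk_size` bytes are held in memory at once, regardless of
    how large the source is.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```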

I also discovered some other strange stuff. The 6 MB file which I had
been uploading was a DBase file. I uploaded a 7 MB Excel file, and it
took 17 seconds. I uploaded a 1 MB Excel file and it took 2 seconds. I
tried to upload a 13 MB CSV file and it was at 70+ seconds and still
not finished.
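
Dividing each reported time by the file size makes the spread concrete.
These are rough figures: the times are the totals as reported, which
presumably include conversion and statistics as well as the upload, and
the CSV run was still going at 70 seconds, so its value is only a lower
bound.

```python
# (label, size in MB, total seconds) as reported above.
# The CSV upload had not finished at 70 s, so 70 is a lower bound.
runs = [
    ("6 MB DBase", 6, 24),
    ("7 MB Excel", 7, 17),
    ("1 MB Excel", 1, 2),
    ("13 MB CSV", 13, 70),
]


def seconds_per_mb(size_mb, seconds):
    """Return the average cost in seconds per megabyte."""
    return seconds / size_mb


rates = {label: seconds_per_mb(size, secs) for label, size, secs in runs}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} s/MB")
```

The per-megabyte cost ranges from 2 s/MB to more than 5 s/MB, so the
time does not scale simply with file size.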

There doesn't seem to be any common pattern across these results. The
file type really shouldn't make any difference, because as I said
earlier, both my Pylons app and Django app were using the same outside
library in the same way in order to convert the file.

So I'm a bit stuck here. I'd love to use Django, but I cannot have it
running 8x slower than another Python framework. We do a lot of file
processing here. Hopefully with all this data someone will be able to
come up with some kind of idea as to what the problem might be and what
solution can be applied.

Thanks,
jp

