Thanks to Rory and Graham for the suggestions. For various reasons I haven't had time to reconfigure the app to output the stack traces, but I have been breaking the directories down into smaller chunks and the app has been running at an acceptable speed. The large directories seem to have been the main thing killing performance. It is still running at about half the speed I would expect, and which I have consistently achieved at times.
I looked into implementing Celery for the image processing, but it seems this would require considerable refactoring of my code, which is rather complex and which I'd rather not change.
But there is another issue that seems more fundamental. I have configured the app to run as a single daemon process, by not setting the processes option in the WSGIDaemonProcess directive, but the app somehow seems to be running in more than one process. The evidence is in the responses from two routes that act on global variables in my main app.py file, which intermittently return different results.
For example, I have an "abort_upload" route which empties a Python queue of active upload jobs. This queue is a global variable in my main app.py Flask file. Most of the time calling this route has no effect: the app continues to process the queue, and the response I get in the browser says there is nothing in the queue. But the app keeps running for hours and hours, which can only happen if there is work in the queue. At one point I tried calling the abort_upload route repeatedly, and it eventually aborted the queue and responded with the number of queued items I expected.
I also have an ajax call which fetches log data from the app every 2 seconds. Typically this works fine, but intermittently it returns log data that is half a day old, then goes back to returning the current log information.
Graham suggested that there might be a situation where multiple threads are being executed for the same task. At one point we had configured the WSGIDaemonProcess directive for multiple processes and one thread, based on a post on Stack Overflow, which was a disaster as the app then began processing the same image in many concurrent threads!
I have this exact same app running on a dev site with none of these issues. Both sites run CentOS 7 with the same versions of Apache and mod_wsgi, and the Apache and WSGI configs look the same.
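
One way to check whether more than one process is answering requests is to have the app report which process handled each call. The following is only a minimal sketch under some assumptions: `process_identity` and `distinct_pids` are hypothetical helper names, not part of the app described here, and the two `mod_wsgi.*` keys are the ones mod_wsgi places in the WSGI environ (an empty `mod_wsgi.process_group` would indicate the request ran in embedded mode rather than in a daemon process group).

```python
import os
import threading


def process_identity(environ):
    """Describe which process and thread handled a WSGI request.

    `environ` is the WSGI environ dict for the current request.
    """
    return {
        # Differs between requests if more than one process is serving the app.
        "pid": os.getpid(),
        "thread": threading.current_thread().name,
        # Set by mod_wsgi; an empty string means embedded mode,
        # i.e. the request was not handled by a daemon process group.
        "process_group": environ.get("mod_wsgi.process_group", "<not under mod_wsgi>"),
        "application_group": environ.get("mod_wsgi.application_group", "<not under mod_wsgi>"),
    }


def distinct_pids(samples):
    """Return the sorted set of process IDs seen across several samples."""
    return sorted({sample["pid"] for sample in samples})
```

In a Flask app the environ is available as `request.environ`, so a temporary diagnostic route could return `process_identity(request.environ)` as JSON. If repeated calls show more than one pid, the module-level queue and log buffer exist once per process, which would be consistent with the intermittent responses described above.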
But it does appear to be running more than one process on our production site. I have no idea how to determine whether there are in fact multiple processes running, or how to ensure that only one is running in the first place. My understanding is that the WSGIDaemonProcess directive should be all I need to ensure a single process for my app, and thus only one instance of the global variables.
Thanks in advance for taking the time to go through this and help me out.
Gary

On Tuesday, December 8, 2020 at 11:45:19 PM UTC-8 rorycl wrote:
> On 09/12/20, Rory Campbell-Lange ([email protected]) wrote:
> > On 08/12/20, Gary Conley ([email protected]) wrote:
> > > I suspect I have some sort of issue with large directories. As a workaround
> > > I've been breaking the directories down into 4000 images at a time and the
> > > performance is acceptable. So, while image processing may not be a great
> > > idea, it is working well for me provided I don't have huge directories. I
> > > had one as large as 10,000 that also ran fine, but 30,000+ was a total bust
> > > with performance rapidly going from 2 images per second to 7 seconds per
> > > image. With 4000 images in a directory I get consistent performance of 1-2
> > > images per second.
> >
> > Off topic, but I suggest not having more than 1,000 files per directory
> > if you can manage it, as running "ls" against a directory with more
> > images than that on cloud storage or indifferent storage backends will
> > cause a noticeable lag.
>
> Torek's answer on Stack Overflow suggests that git restricts the number
> of files in a directory to 6700 by default
>
> https://stackoverflow.com/a/18732276
