Thanks Tim for the info. This is the discussion mentioned in the ticket (from 2012) https://groups.google.com/d/topic/django-developers/vtMVq8jwnf8/discussion
The solutions that ptone suggests in the ticket don't really work for Heroku. Syncing static files from a local machine is also not a good option when, for example, deploys run from CI. And there is still the case of uploading older files, for example during a rollback.

In the end, the main problem is that collectstatic works with two different storage backends: the source storage provided by each of the static file finders (settings.STATICFILES_FINDERS) and the destination storage defined in settings.STATICFILES_STORAGE. As there is no standard hash method, the Storage superclass can't require all of its subclasses to implement one.

Maybe a solution would be to shift the responsibility for detecting a file change from collectstatic to the STATICFILES_STORAGE? That way each Storage subclass decides how it wants to check whether a file has changed; it can use any technique it likes and stay consistent.

A rough and simplified example:

    # django/core/files/storage.py
    class Storage(object):
        def has_changed(self, source_storage, source_path, path):
            raise NotImplementedError()

    class FileSystemStorage(Storage):
        def has_changed(self, source_storage, source_path, path):
            # Same rule collectstatic applies today: the source is newer.
            return source_storage.modified_time(source_path) > self.modified_time(path)

    # django/contrib/staticfiles/management/commands/collectstatic.py
    class Command(BaseCommand):
        def delete_file(self, path, prefixed_path, source_storage):
            # Let the destination storage decide whether the file changed.
            if self.storage.has_changed(source_storage, path, prefixed_path):
                self.storage.delete(prefixed_path)

And then anyone could do this in their own project (or even in django-storages):

    # my_app/storages/custom_s3_storage.py
    import hashlib

    from storages.backends.s3boto import S3BotoStorage

    class MyStorage(S3BotoStorage):
        def has_changed(self, source_storage, source_path, path):
            try:
                local_md5 = source_storage.get_md5(source_path)
            except (NotImplementedError, AttributeError):
                # The source storage has no get_md5(); hash the content here.
                with source_storage.open(source_path) as source_file:
                    local_md5 = hashlib.md5(source_file.read()).hexdigest()
            return self.get_md5(path) != local_md5

        def get_md5(self, path):
            return self.bucket.get_key(path).md5

It keeps backward compatibility and lets any Storage subclass use whatever comparison method it prefers.

On Friday, April 15, 2016 at 1:34:19 AM UTC+1, Tim Graham wrote:
> A proposal to use checksums was closed as wontfix in
> https://code.djangoproject.com/ticket/19021.
>
> On Thursday, April 14, 2016 at 1:16:39 PM UTC-4, bliy...@rentlytics.com
> wrote:
>> This makes a lot of sense to me.
>>
>> On Tuesday, April 12, 2016 at 9:07:51 AM UTC-7, Daniel Blasco wrote:
>>> Hi,
>>>
>>> I posted this in django-users but I think that it fits better here.
>>>
>>> I'm using django-storages to upload my static files to Amazon S3 and
>>> I'm serving my application from Heroku.
>>>
>>> In my local development, when I run collectstatic for a second time
>>> just after the first one, no files are uploaded to S3 because
>>> collectstatic checks the modified_time to determine whether the local
>>> files are newer than the ones in S3. That's fine so far.
>>>
>>> The problem is when I deploy to Heroku. Collectstatic is executed from
>>> the Heroku server and absolutely all the files are always uploaded to
>>> S3, even the ones that have not changed. This is because during the
>>> deployment Heroku creates a full copy of the source code, and
>>> therefore all the files have a new modified_time. In my case, it takes
>>> almost 10 minutes to upload ~1000 files for each deployment.
>>>
>>> Also, imagine the situation where the modified_times are not being
>>> changed and I wanted to upload older versions of the static files. I
>>> won't be able to, because the storage wouldn't allow uploading files
>>> with an older modified_time.
>>>
>>> I think that a more accurate way to check if a file needs to be
>>> replaced could be to compare their checksums/hashes and to offer this
>>> feature for all the Storage subclasses.
>>> To preserve backwards compatibility, the collectstatic command could
>>> first check whether the storage subclass implements checksum
>>> generation and otherwise fall back to the modified_time comparison.
>>>
>>> What do you think, is this something that makes sense?
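
For reference, the fallback described in the original message (use a checksum when both storages can produce one, otherwise compare modified_time) could look roughly like the sketch below. It is only an illustration under stated assumptions: InMemoryStorage, ChecksumStorage and needs_copy are hypothetical names invented here, not Django or django-storages APIs, and timestamps are plain numbers to keep it self-contained.

```python
import hashlib


class InMemoryStorage:
    """Hypothetical stand-in for a storage backend (not a Django class)."""

    def __init__(self, files=None, mtimes=None):
        self.files = dict(files or {})    # path -> bytes content
        self.mtimes = dict(mtimes or {})  # path -> numeric timestamp

    def exists(self, path):
        return path in self.files

    def modified_time(self, path):
        return self.mtimes[path]


class ChecksumStorage(InMemoryStorage):
    """Hypothetical storage that can also report an MD5 digest."""

    def get_md5(self, path):
        return hashlib.md5(self.files[path]).hexdigest()


def needs_copy(source, source_path, target, target_path):
    """Return True if the source file should be copied to the target."""
    if not target.exists(target_path):
        return True
    source_md5 = getattr(source, "get_md5", None)
    target_md5 = getattr(target, "get_md5", None)
    if source_md5 is not None and target_md5 is not None:
        # Both sides can produce a checksum: compare content, ignore dates.
        return source_md5(source_path) != target_md5(target_path)
    # Otherwise fall back to the timestamp comparison.
    return source.modified_time(source_path) > target.modified_time(target_path)
```

With checksums available on both sides, a fresh checkout with new timestamps but identical content is skipped, while a rollback to older content is still copied; storages without get_md5 keep the timestamp behaviour.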