Thanks Tim for the info.

This is the discussion mentioned in the ticket (from 2012):
https://groups.google.com/d/topic/django-developers/vtMVq8jwnf8/discussion

The solutions that ptone suggests in the ticket don't really work for 
Heroku. Also, syncing static files from a local machine is not a good 
solution when, for example, using CI. And there is still the problem of 
uploading older files, for example during a rollback.

In the end, the main problem is that collectstatic deals with two different 
storage backends. One is provided by each of the static file finders 
(settings.STATICFILES_FINDERS) and the other is the one defined in 
settings.STATICFILES_STORAGE. As there is no standard hash method, the 
Storage superclass can't force all its subclasses to implement one.

Maybe a solution would be to shift the responsibility for detecting a file 
change from collectstatic to the STATICFILES_STORAGE? That way the Storage 
subclasses get the flexibility to decide how they check whether a file has 
changed: they can use any technique they like and stay consistent.

A rough and simplified example:

# django/core/files/storage.py
class Storage(object):
    def has_changed(self, source_storage, source_path, path):
        raise NotImplementedError()


class FileSystemStorage(Storage):
    def has_changed(self, source_storage, source_path, path):
        return source_storage.modified_time(source_path) > self.modified_time(path)


# django/contrib/staticfiles/management/commands/collectstatic.py
class Command(BaseCommand):
    def delete_file(self, path, prefixed_path, source_storage):
        if self.storage.has_changed(source_storage, path, prefixed_path):
            self.storage.delete(prefixed_path)


And then, anyone could do this in their own project (or even in 
django-storages):

# my_app/storages/custom_s3_storage.py
import hashlib

from storages.backends.s3boto import S3BotoStorage


class MyStorage(S3BotoStorage):
    def has_changed(self, source_storage, source_path, path):
        try:
            local_md5 = source_storage.get_md5(source_path)
        except (NotImplementedError, AttributeError):
            with source_storage.open(source_path) as source_file:
                local_md5 = hashlib.md5(source_file.read()).hexdigest()

        return self.get_md5(path) != local_md5

    def get_md5(self, path):
        return self.bucket.get_key(path).md5



It keeps backward compatibility and lets any Storage subclass use 
whatever comparison method it prefers.
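
To make the two failure cases from this thread concrete (a fresh checkout 
that only changes timestamps, and a rollback to older content), here is a 
minimal, self-contained sketch. It does not use Django: the in-memory 
Storage class, HashingStorage and the collect() helper are hypothetical 
stand-ins for illustration only. It also assumes the base class ships a 
default has_changed() based on modified_time, rather than raising 
NotImplementedError, so that existing subclasses keep their current 
behaviour.

```python
import hashlib


class Storage:
    """Hypothetical in-memory stand-in for a storage backend (not Django's API)."""

    def __init__(self):
        self.files = {}  # path -> (content bytes, modified time)

    def save(self, path, content, mtime):
        self.files[path] = (content, mtime)

    def exists(self, path):
        return path in self.files

    def read(self, path):
        return self.files[path][0]

    def modified_time(self, path):
        return self.files[path][1]

    def has_changed(self, source_storage, source_path, path):
        # Assumed default: the timestamp comparison, so subclasses that
        # do not override this method behave as they do today.
        if not self.exists(path):
            return True
        return source_storage.modified_time(source_path) > self.modified_time(path)


class HashingStorage(Storage):
    """Overrides the check to compare content hashes instead of timestamps."""

    def has_changed(self, source_storage, source_path, path):
        if not self.exists(path):
            return True
        local_md5 = hashlib.md5(source_storage.read(source_path)).hexdigest()
        remote_md5 = hashlib.md5(self.read(path)).hexdigest()
        return local_md5 != remote_md5


def collect(source, target, paths, now):
    """Copy only the files the target reports as changed; return their paths."""
    copied = []
    for path in paths:
        if target.has_changed(source, path, path):
            target.save(path, source.read(path), now)
            copied.append(path)
    return copied


source = Storage()
by_time, by_hash = Storage(), HashingStorage()

# Initial deploy: both targets are empty, so both receive the file.
source.save("app.css", b"body{}", 100)
first = (collect(source, by_time, ["app.css"], 100),
         collect(source, by_hash, ["app.css"], 100))

# Fresh checkout: same content, newer timestamp.
source.save("app.css", b"body{}", 200)
redeploy = (collect(source, by_time, ["app.css"], 200),
            collect(source, by_hash, ["app.css"], 200))

# Rollback: different content, older timestamp.
source.save("app.css", b"old{}", 50)
rollback = (collect(source, by_time, ["app.css"], 300),
            collect(source, by_hash, ["app.css"], 300))
```

In this sketch the timestamp-based target re-uploads the unchanged file on 
redeploy and skips the rolled-back file, while the hash-based target does 
the opposite in both cases.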





On Friday, April 15, 2016 at 1:34:19 AM UTC+1, Tim Graham wrote:
>
> A proposal to use checksums was closed as wontfix in 
> https://code.djangoproject.com/ticket/19021.
>
> On Thursday, April 14, 2016 at 1:16:39 PM UTC-4, bliy...@rentlytics.com 
> wrote:
>>
>> This makes a lot of sense to me.
>>
>> On Tuesday, April 12, 2016 at 9:07:51 AM UTC-7, Daniel Blasco wrote:
>>>
>>> Hi,
>>>
>>> I posted this in django-users but I think it fits better here.
>>>
>>>
>>> I'm using django-storages to upload my static files to Amazon S3 and I'm 
>>> serving my application from Heroku.
>>>
>>> In my local development, when I run collectstatic for a second time just 
>>> after the first one, no files are being uploaded to S3 because 
>>> collectstatic checks for the modified_time to determine if the local files 
>>> are newer than the ones in S3. That's fine so far.
>>>
>>> The problem is when I deploy to Heroku. Collectstatic is being executed 
>>> from the Heroku server and absolutely all the files are always being 
>>> uploaded to S3, even the ones that have not changed. This is because during 
>>> the deployment Heroku creates a full copy of the source code, and therefore 
>>> all the files have a new modified_time. In my case, it takes almost 10 
>>> minutes to upload ~1000 files for each deployment.
>>>
>>> Also, imagine the situation where the modified_times are not being 
>>> changed and I want to upload older versions of the static files. I won't 
>>> be able to, because the storage won't accept files with an older 
>>> modified_time.
>>>
>>> I think a more accurate way to check whether a file needs to be replaced 
>>> would be to compare checksums/hashes, and to offer this feature for all 
>>> the Storage subclasses. To preserve backwards compatibility, the 
>>> collectstatic command would first determine whether the storage subclass 
>>> implements checksum generation, and otherwise fall back to the 
>>> modified_time comparison.
>>>
>>>
>>> What do you think, is this something that makes sense?
>>>
>>
