Facebook's share price fell sharply after the IPO - they need commercialization "konche" (urgently) ;)

"konche" is a good word

we need that "konche" too

On Tue 09 Oct 2012 07:28:01 EEST, django-developers@googlegroups.com wrote:
Today's Topic Summary

Group: http://groups.google.com/group/django-developers/topics

  * Feature request: collectstatic shouldn't recopy files that already
    exist in destination <#group_thread_0> [9 Updates]

Feature request: collectstatic shouldn't recopy files that already
exist in destination
<http://groups.google.com/group/django-developers/t/bed315abc8f09dff>

    Dan Loewenherz <d...@dlo.me> Oct 07 08:58PM -0700

    This issue just got me again tonight, so I'll try to push once more on
    this issue. It seems right now most people don't care that this is
    broken, which is a bummer, but in which case I'll just continue using my
    working solution.

    Dan

    ptone <pres...@ptone.com> Oct 07 10:38PM -0700

    so after scanning this thread and the ticket again - it is still unclear
    that there could be a completely universal solution.

    While it would be nice if the storage API had a checksum(name) or
    md5(name) method - not all custom storage backends are going to support
    a single checksum standard. S3 doesn't explicitly support MD5
    (apparently it unofficially does through ETags). Without a universal
    checksum - you can't use it to compare files across arbitrary backends.

    I do agree that hacking modified_time return value is a little ugly -
    the API is clearly documented as "returns a datetime..." - so returning
    an MD5 checksum there is, well, hacky.

    If you are passionate about moving this forward, here is what I'd
    suggest.

    Implement, document, and test .md5(name) as a standard method on storage
    backends - like modified_time this would raise NotImplementedError if
    not available - this could easily be its own ticket. md5 is probably the
    closest you'll get to a checksum standard.

    Once you have an md5 method defined for backends - you could support a
    --md5 option to collectstatic that would use that as the target/source
    comparison.
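
    The md5(name) proposal above could look roughly like the sketch below.
    It does not use Django: BaseStorage, LocalStorage and needs_copy are
    hypothetical names introduced only for illustration, and the
    NotImplementedError default follows the behaviour the message describes
    for modified_time.

```python
import hashlib
import os


class BaseStorage:
    """Hypothetical stand-in for a storage backend base class (not Django's)."""

    def md5(self, name):
        # Backends that cannot provide a checksum keep this default.
        raise NotImplementedError("This backend doesn't support md5 checksums.")


class LocalStorage(BaseStorage):
    """Hypothetical filesystem backend rooted at a directory."""

    def __init__(self, root):
        self.root = root

    def path(self, name):
        return os.path.join(self.root, name)

    def exists(self, name):
        return os.path.exists(self.path(name))

    def md5(self, name):
        # Hash the file in chunks so large files are not read in one go.
        digest = hashlib.md5()
        with open(self.path(name), "rb") as fh:
            for chunk in iter(lambda: fh.read(64 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()


def needs_copy(source, target, name):
    """Return True when `name` should be copied from source to target."""
    if not target.exists(name):
        return True
    try:
        return source.md5(name) != target.md5(name)
    except NotImplementedError:
        # No checksum available on one side: fall back to always copying.
        return True
```

    A --md5 style option would then amount to calling something like
    needs_copy() instead of comparing modification times.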

    Another workaround is to just use collectstatic locally - and rsync
    --checksum to your remote if it supports rsync.

    -Preston


    On Sunday, October 7, 2012 8:59:16 PM UTC-7, Dan Loewenherz wrote:

    Jannis Leidel <lei...@gmail.com> Oct 08 12:33PM +0200


    > It's accurate *only* in certain situations. And on a distributed
    > development team, I've run into a lot of issues with developers
    > re-uploading files that have already been uploaded because they just
    > recently updated their repo.

    > A checksum is the only truly accurate method to determine if a
    > file has changed.

    > Additionally, you didn't address my point that I quoted from.
    > Storage backends don't just reflect filesystems--they could
    > reflect files stored in a database, S3, etc. And some of these
    > filesystems don't support last modified times.

    Then, frankly, this is a problem of the storage backends, not
    Django's. The S3BotoStorage backend *does* have a modified_time
    method:

    https://bitbucket.org/david/django-storages/src/1574890d87be/storages/backends/s3boto.py#cl-298

    What storage backend do you use that doesn't have a modified_time
    method?

    > This is a bit confusing...why call it last_modified when that
    > doesn't necessarily reflect what it's doing? It would be more
    > flexible to create two methods:

    It's called modified_time, not last_modified.

    > def modification_identifier(self):

    > def has_changed(self):

    > Then, any backend could implement these however they might like,
    > and collectstatic would have no excuse for uploading the same file
    > more than once. Overloading last_modified to also do things like
    > calculate md5's seems a bit hacky to me, and confusing for any
    > developer maintaining a custom storage backend that doesn't
    > support last modified.

    I disagree, modified_time is perfectly capable of handling your
    use case.

    Jannis

    Jannis Leidel <lei...@gmail.com> Oct 08 12:50PM +0200


    > so after scanning this thread and the ticket again - it is still
    > unclear that there could be a completely universal solution.

    > While it would be nice if the storage API had a checksum(name)
    > or md5(name) method - not all custom storage backends are going to
    > support a single checksum standard. S3 doesn't explicitly support
    > MD5 (apparently it unofficially does through ETags). Without a
    > universal checksum - you can't use it to compare files across
    > arbitrary backends.

    You're able to ask S3 for the date of last modification, so I don't
    see why a comparison by hashing the file content is needed
    additionally. It'd have to download the full file to do that on
    Django's side and I'm not aware of an API for getting a hash from
    cloudfiles, S3 etc.

    > I do agree that hacking modified_time return value is a little
    > ugly - the API is clearly documented as "returns a datetime..." -
    > so returning an MD5 checksum there is, well, hacky.

    I beg to differ, returning a datetime object makes absolute sense
    for comparing it to another datetime object. What I meant before
    is that the modified_time method can be written however the user
    wants as long as it returns a datetime object, even a date that is
    known to be older than the file on disk.

    > If you are passionate about moving this forward, here is what
    > I'd suggest.

    > Implement, document, and test .md5(name) as a standard method on
    > storage backends - like modified_time this would raise
    > NotImplementedError if not available - this could easily be its
    > own ticket. md5 is probably the closest you'll get to a checksum
    > standard.

    -1

    Jannis


    Dan Loewenherz <d...@dlo.me> Oct 08 08:48AM -0700

    > The S3BotoStorage backend *does* have a modified_time method:

    > https://bitbucket.org/david/django-storages/src/1574890d87be/storages/backends/s3boto.py#cl-298

    > What storage backend do you use that doesn't have a
    > modified_time method?

    I don't think you're seeing the problem I'm having. I'm working with a
    distributed team using git. This means when we check out files, the
    local modified time is the time at which I checked the files out, not
    the time at which the files were actually last modified.

    As a result, it's a questionable metric for figuring out if a file is
    the same or not, since every team member's local machine thinks they
    were all just created! We end up re-uploading the file every time.
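
    A minimal sketch of the effect described above, assuming a fresh
    checkout behaves like a file with identical bytes and a newer
    timestamp. The helper file_md5 is a hypothetical name, and os.utime is
    used only to simulate the later checkout.

```python
import hashlib
import os
import tempfile


def file_md5(path):
    """Return the hex MD5 digest of the file's bytes."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "site.css")
    with open(path, "wb") as fh:
        fh.write(b"body { margin: 0 }")

    mtime_before = os.path.getmtime(path)
    md5_before = file_md5(path)

    # Simulate a later checkout: same bytes, timestamp one hour newer.
    os.utime(path, (mtime_before + 3600, mtime_before + 3600))

    mtime_changed = os.path.getmtime(path) > mtime_before
    md5_changed = file_md5(path) != md5_before

print(mtime_changed, md5_changed)  # prints: True False
```

    A comparison based on modification time sees a newer file here, while a
    comparison based on content sees no change.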

    > necessarily reflect what it's doing? It would be more flexible to
    > create two methods:

    > It's called modified_time, not last_modified.

    Sorry, typo.


    > seems a bit hacky to me, and confusing for any developer maintaining
    > a custom storage backend that doesn't support last modified.

    > I disagree, modified_time is perfectly capable of handling your
    > use case.

    No it does not address my needs, as I described above.

    Dan

    Dan Loewenherz <d...@dlo.me> Oct 08 08:56AM -0700

    > comparison by hashing the file content is needed additionally. It'd
    > have to download the full file to do that on Django's side and I'm
    > not aware of a API for getting a hash from cloudfiles, S3 etc.

    S3 stores the md5 info in an ETag header.

    Regarding Cloudfiles, this is what Rackspace has to say:

    > You can ensure end-to-end data integrity by including an MD5 checksum
    > of your object's data in the ETag header. You are not required to
    > include the ETag header, but it is recommended to ensure that the
    > storage system successfully stored your object's content.


    Dan

    ptone <pres...@ptone.com> Oct 08 10:06AM -0700

    On Monday, October 8, 2012 8:49:58 AM UTC-7, Dan Loewenherz wrote:

    > As a result, it's a questionable metric for figuring out if a file is
    > the same or not, since every team member's local machine thinks they
    > were all just created! We end up re-uploading the file every time.

    While git may be common, and your problem not unique - this is still a
    condition of your dev environment rendering modification dates invalid.
    There might be other situations where this is the case (I've run into
    scripts that muck with modification dates based on camera/jpeg
    metadata).

    So after some further discussion on IRC - it was determined that md5,
    while somewhat common, was far from a standard, and was likely not to be
    available as a remote call for network-based storage backends. And so
    the final resolution is to wontfix the ticket.

    In the end - this lack of a universal fingerprint is just a limitation
    of our storage tools.

    -Preston


    Alex Ogier <alex.og...@gmail.com> Oct 08 01:23PM -0400


    > In the end - this lack of a universal fingerprint is just a
    > limitation of our storage tools.

    > -Preston

    Is there a reason this fingerprint must be universal? If you're dealing
    with a backend like S3, where network latency and expensive writes are a
    problem, but md5 is a built-in remote call (available on any GET), why
    not just do an md5 sum in the _save() method? Basically, just buffer the
    File object you receive, take an md5 in Python, and then make a decision
    whether to upload or not. In the common case of reading from local disk
    and writing to S3, this is a big win, and doesn't require cooperation
    from any other backends, or standardizing on md5 as a fingerprint
    method.
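
    The buffer-then-compare idea above might be sketched as follows. This
    is not the django-storages implementation: FakeRemote is a hypothetical
    in-memory stand-in for a remote store, save_if_changed is a hypothetical
    helper, and the sketch assumes the store reports the hex MD5 of an
    object's content as its ETag.

```python
import hashlib
import io


class FakeRemote:
    """Hypothetical in-memory stand-in for a remote object store."""

    def __init__(self):
        self.objects = {}   # name -> bytes
        self.uploads = 0    # counts actual writes

    def etag(self, name):
        # Assumption: the store reports the hex MD5 of the content as its ETag.
        data = self.objects.get(name)
        return None if data is None else hashlib.md5(data).hexdigest()

    def put(self, name, data):
        self.objects[name] = data
        self.uploads += 1


def save_if_changed(remote, name, fileobj):
    """Upload `fileobj` under `name` unless the remote copy has the same MD5."""
    data = fileobj.read()                      # buffer the whole file in memory
    local_md5 = hashlib.md5(data).hexdigest()
    if remote.etag(name) == local_md5:
        return False                           # identical content, skip the write
    remote.put(name, data)
    return True


remote = FakeRemote()
first = save_if_changed(remote, "app.js", io.BytesIO(b"console.log(1)"))
second = save_if_changed(remote, "app.js", io.BytesIO(b"console.log(1)"))
third = save_if_changed(remote, "app.js", io.BytesIO(b"console.log(2)"))
print(first, second, third, remote.uploads)  # prints: True False True 2
```

    The decision stays inside one backend, so no other backend has to agree
    on md5 as a fingerprint.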

    Best,
    Alex Ogier

    Jeremy Dunck <jdu...@gmail.com> Oct 08 08:14PM -0700

    Would it be reasonable to have a backend-specific hook to
    determine a fingerprint, where that could be mtime or md5 or
    whathaveyou as long as equality (or maybe ordering) works?
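
    One possible reading of such a hook, as a sketch only: the target
    backend decides how to fingerprint both the local file and its stored
    copy, and the caller relies on nothing but equality. MD5Target,
    MTimeTarget, local_fingerprint, stored_fingerprint and is_unchanged are
    all hypothetical names, not part of any Django API.

```python
import hashlib
import os


class MD5Target:
    """Hypothetical target backend that fingerprints by content hash."""

    def __init__(self):
        self.stored = {}  # name -> bytes

    def local_fingerprint(self, path):
        with open(path, "rb") as fh:
            return hashlib.md5(fh.read()).hexdigest()

    def stored_fingerprint(self, name):
        data = self.stored.get(name)
        return None if data is None else hashlib.md5(data).hexdigest()


class MTimeTarget:
    """Hypothetical target backend that fingerprints by whole-second mtime."""

    def __init__(self):
        self.stored = {}  # name -> recorded mtime in whole seconds

    def local_fingerprint(self, path):
        return int(os.path.getmtime(path))

    def stored_fingerprint(self, name):
        return self.stored.get(name)


def is_unchanged(target, name, path):
    """The caller only relies on equality of the two fingerprints."""
    stored = target.stored_fingerprint(name)
    return stored is not None and stored == target.local_fingerprint(path)
```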



You received this message because you are subscribed to the Google
Group django-developers.
You can post via email <mailto:django-developers@googlegroups.com>.
To unsubscribe from this group, send
<mailto:django-developers+unsubscr...@googlegroups.com> an empty message.
For more options, visit
<http://groups.google.com/group/django-developers/topics> this group.

--
You received this message because you are subscribed to the Google
Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/django-developers?hl=en.
