Re: [Tutor] How to identify clusters of similar files

2012-06-03 Thread Albert-Jan Roskam
From: Steven D'Aprano
>To: Python Mailing List
>Sent: Sunday, June 3, 2012 4:00 AM
>Subject: Re: [Tutor] How to identify clusters of similar files
>
>Albert-Jan Roskam wrote:
>> Hi,
>>
>> I want to use difflib to compare a lot (tens of thousands) of

Re: [Tutor] How to identify clusters of similar files

2012-06-02 Thread Steven D'Aprano
Albert-Jan Roskam wrote: Hi, I want to use difflib to compare a lot (tens of thousands) of text files. I know that many files are quite similar as they are subsequent versions of the same document (a primitive kind of version control). What would be a good approach to cluster the files based on

[Tutor] How to identify clusters of similar files

2012-06-02 Thread Albert-Jan Roskam
Hi, I want to use difflib to compare a lot (tens of thousands) of text files. I know that many files are quite similar as they are subsequent versions of the same document (a primitive kind of version control). What would be a good approach to cluster the files based on their likeness? I want t
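
For readers skimming this thread, here is a minimal sketch of one way the question could be approached with difflib. It is not taken from any reply in the thread. The helper names (similarity, cluster_texts), the 0.8 threshold, and the greedy "compare against the first member of each cluster" strategy are all assumptions made for illustration; only difflib.SequenceMatcher and its ratio() method come from the standard library.

```python
import difflib


def similarity(a, b):
    """Return a similarity score in [0, 1] for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def cluster_texts(texts, threshold=0.8):
    """Greedily group texts whose similarity to a cluster's first member
    is at least `threshold` (an arbitrary, assumed cut-off).

    `texts` maps a name (for example a file name) to its content.
    Returns a list of clusters, each a list of names.
    """
    clusters = []
    for name, content in texts.items():
        for cluster in clusters:
            representative = texts[cluster[0]]
            if similarity(representative, content) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    # Hypothetical sample data standing in for file contents.
    sample = {
        "a_v1": "The quick brown fox jumps over the lazy dog.\n",
        "a_v2": "The quick brown fox jumped over the lazy dog.\n",
        "b_v1": "Completely different content about clustering files.\n",
    }
    print(cluster_texts(sample))
```

With tens of thousands of files, comparing every file against every cluster representative can still be slow, since ratio() is expensive on long inputs. SequenceMatcher also offers quick_ratio(), a cheaper upper bound on ratio(), which could be used to skip obviously dissimilar pairs before calling ratio(). The result of this greedy approach depends on the order in which files are visited.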