On the principle that it's better to do something than just complain...

I timed how long I spend looking for the emails associated with a given
patch and found that it typically takes high single-digit minutes.
Sometimes the emails can't be found at all, which takes a lot longer. I
do this a lot.

I wrote a small proof-of-concept script that takes the mailing list
archives and the ChangeLog files and annotates each ChangeLog entry with
the URL of the email most likely to contain the patch.
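The matching heuristic isn't described here, so the following is only a
minimal sketch of one plausible approach, not what gcc_mailscan.rb
actually does. It assumes the archive has already been split into
messages with a URL, date, sender and body; the Entry/Message structs,
the 14-day window and the scoring weights are all hypothetical. Each
ChangeLog entry is scored against candidate messages: +2 if the sender
contains the entry's author address, +1 for each touched file whose
basename appears in the message body.

```ruby
require "date"

# Hypothetical structures: one ChangeLog entry and one archived message.
Entry   = Struct.new(:date, :author_email, :files)
Message = Struct.new(:url, :date, :from, :body)

# "2007-12-04  Name  <addr>" header line, and "\t* path/file.c ..." lines.
HEADER_RE = /\A(\d{4}-\d{2}-\d{2})\s+.*<([^>]+)>/
FILE_RE   = /^\s+\*\s+([^\s:(,]+)/

# Split ChangeLog text into entries, collecting the files each one touches.
def parse_changelog(text)
  entries = []
  text.each_line do |line|
    if (m = HEADER_RE.match(line))
      entries << Entry.new(Date.parse(m[1]), m[2].downcase, [])
    elsif entries.any? && (m = FILE_RE.match(line))
      entries.last.files << m[1]
    end
  end
  entries
end

# Hypothetical score: 0 outside the date window, otherwise author match
# (+2) plus one point per file basename mentioned in the body.
def score(entry, msg, window_days: 14)
  return 0 if (entry.date - msg.date).to_i.abs > window_days
  s = 0
  s += 2 if msg.from.downcase.include?(entry.author_email)
  s += entry.files.count { |f| msg.body.include?(File.basename(f)) }
  s
end

# URL of the highest-scoring message, or nil if nothing scores above 0.
def best_match(entry, messages)
  best = messages.max_by { |m| score(entry, m) }
  best && score(entry, best) > 0 ? best.url : nil
end

# Re-emit the ChangeLog, adding a "[url]" line after each matched header.
def annotate(text, messages)
  entries = parse_changelog(text)
  idx = -1
  text.each_line.map do |line|
    next line unless HEADER_RE.match?(line)
    idx += 1
    url = best_match(entries[idx], messages)
    url ? "#{line.chomp}\n\t[#{url}]\n" : line
  end.join
end

if __FILE__ == $PROGRAM_NAME
  # Made-up sample data for illustration only.
  log = [
    "2007-12-03  Jane Doe  <jane@example.com>",
    "",
    "\t* config/i386/i386.c (ix86_foo): Fix typo.",
    ""
  ].join("\n")
  msgs = [
    Message.new("https://example.org/ml/msg1.html", Date.new(2007, 12, 1),
                "Jane Doe <jane@example.com>", "Patch to i386.c fixing a typo"),
    Message.new("https://example.org/ml/msg2.html", Date.new(2007, 12, 2),
                "someone@example.org", "Unrelated discussion")
  ]
  print annotate(log, msgs)
end
```

A real archive would need far more care (multiple authors per entry,
"foo.c, bar.c:" file lists, patches sent by someone other than the
committer, reposted patches), which is where the "probable" comes in.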

Sample output (an annotation of the current ChangeLog file) is here:

http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/gcc_ChangeLog.txt?revision=1.1&view=markup
Or http://tinyurl.com/2v824o
Or http://preview.tinyurl.com/2v824o

The program is here; it has very little internal documentation. Testing
has been limited, and in any case perfection is not possible when
processing text written by people.

http://cobolforgcc.cvs.sourceforge.net/cobolforgcc/gcc/gcc/gcb/gcc_mailscan.rb?revision=1.1&view=markup
Or http://tinyurl.com/2yem2u 
Or http://preview.tinyurl.com/2yem2u

It runs in about 25 minutes on my system and uses a few hundred MB of
storage.

Things I learned:

1. There is a lot of data. It's a good thing Ruby 1.9 is a lot faster
than Ruby 1.8.

There are over 100 ChangeLog files in the GCC source, with over 600,000
lines in total. The gcc-patches mailing list archives are over 2 GB in
size and take considerable time to download.

2. Most ChangeLog entries have an identifiable email in the archive.
Coverage gets spotty for some branches and as you go further back in
time, and there is also a large gap in the email archives from some
time ago.

3. I think this could be useful. If a place can be found to host the
30 MB of annotated files, I would be happy to update them weekly or so.
Alternatively, I could update the ChangeLog files themselves, but I have
reason to suspect that would not be popular.

If nothing else happens I will keep it up-to-date for my own use.

Tim Josling

On Tue, 2007-12-04 at 08:05 -0500, Richard Kenner wrote:
> > I didn't say you cannot or should not use these tools.  But a good comment
> > on a piece of code sure beats a good commit message, which must be looked
> > at separately, and can be fragmented over multiple commits, etc.
> 
> I don't see one as "beating" the other because they have very different
> purposes.  Sometimes you need one and sometimes you need the other.
> 
> The purpose of COMMENTS is to help somebody understand the code as it
> stands at some point in time.  In most cases, that means saying WHAT the
> code does and WHY (at some level) it does what it does.  Once in a while,
> it also means saying why it DOESN'T do something, for example, if it might
> appear that there's a simpler way of doing what the code is doing now but
> it doesn't work for some subtle reason.  But it's NOT appropriate to put
> into comments the historical remark that this code used to have a typo
> which caused a miscompilation at some specific place.  However, the commit
> log IS the place for that sort of note.
> 
> My view is that, in general, the comments are usually the most appropriate
> place to put information about how the code currently works and the commit
> log is generally the best place for information that contrasts how the code
> currently works with how it used to work and provides the motivation for
> making the change.  But there are exceptions to both of those generalizations.
