Re: [Groff] groff repo conversion in progress

Eric S. Raymond Sun, 24 Nov 2013 05:45:00 -0800

Werner LEMBERG <w...@gnu.org>:
> > For your interest, and so you and the other listmembers can see how
> > this was done, I'm enclosing a tarball containing copies of the
> > current authormap file (which is what you modified, with five
> > entries added and some address removals reverted), the reposurgeon
> > lift script, and the Makefile. [...]
> 
> This is indeed amazingly simple.  Thanks for your work!


Most of the hard work got done back in 2010 when I saw the possibility
implied by the git import stream format, and wrote reposurgeon to
exploit it. The result is, as you noticed, an almost ridiculously
powerful tool that makes a lot of very hairy operations look simple.

The appearance is deceptive; there is a lot going on behind the
scenes.  I shall explain a bit more, because the conversion is not
finished yet and you will need to make some policy choices before
we're done.

The actual CVS-to-git conversion work was done by cvs-fast-export, 
which I also maintain.  The lift script expresses the edits to do on
the history once gitified.  Under slightly different conditions it 
might have looked like this:

verbose 1
set canonicalize
read .
delete :18138 obliterate

With this invocation, reposurgeon would have looked at the current
directory, seen that it was a CVS repo, looked in its table of import
front ends, and called cvs-fast-export itself, parsing the git
fast-import stream that it emits.  The generated command would have
looked like this:

find . -name '*,v' -print | cvs-fast-export -k --reposurgeon

But there was a default option I wanted to suppress (--reposurgeon),
so I ran the front end "by hand" (actually, through a Makefile
production) instead.  That option generates voluminous data on CVS
revision numbers that we don't need in this case because your change
comments have no CVS commit references in them to be translated.

The script is a little longer now:

verbose 1
set canonicalize
read groff-raw.fi
delete :18138 obliterate
# Salvage some multiline comments into git-like form by removing whitespace.
# On most this couldn't be done because they mixed topics, 
# so a summary line would have been misleading.
mailbox_in <<EOF
------------------------------------------------------------------------------
Event-Number: 1833

* html.cc (create_tmp_file, create_temp_name): Removed.

It has been replaced with calls to xtmpfile() and xtmptemplate().

------------------------------------------------------------------------------
Event-Number: 1916

[[Several hundred lines of text omitted]]

------------------------------------------------------------------------------
Event-Number: 18144

Fixes to TOC, BIBLIOGRAPHY, and ENDNOTES leading management and traps.
EOF
prefer git
write groff.fi

That mailbox_in section batch-modifies commit comments. To make it, 
I did this:

$ reposurgeon "read groff-raw.fi" "mailbox_out =L" >MULTILINE

which dumped all the non-git-conformant multiline comments into
MULTILINE. I then edited that and, when I was done, pasted it
into the lift script.
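
For anyone curious about the shape of that dump, here is a rough Python
sketch that splits text in the layout shown above into one comment per
event number. It is not reposurgeon's own parser. The function name
split_events, the 70-dash threshold for a delimiter line, and the
assumption that no comment body contains such a line are all made up for
illustration.

```python
import re

# Assumption: a delimiter is a line made up of 70 or more dashes,
# and no comment body contains such a line.
DELIM = re.compile(r"^-{70,}$", re.MULTILINE)
EVENT = re.compile(r"^Event-Number: (\d+)$", re.MULTILINE)


def split_events(dump):
    """Map each Event-Number in a mailbox-style dump to its comment text."""
    events = {}
    for chunk in DELIM.split(dump):
        if not chunk.strip():
            continue  # nothing before the first delimiter
        # Headers and body are separated by the first blank line.
        header, _, body = chunk.lstrip("\n").partition("\n\n")
        match = EVENT.search(header)
        if match is None:
            raise ValueError("chunk has no Event-Number header")
        # Normalize to exactly one trailing newline.
        events[int(match.group(1))] = body.strip("\n") + "\n"
    return events


if __name__ == "__main__":
    sample = (
        "-" * 78 + "\n"
        "Event-Number: 18144\n"
        "\n"
        "Fixes to TOC, BIBLIOGRAPHY, and ENDNOTES leading management and traps.\n"
    )
    for number, comment in split_events(sample).items():
        print(number, repr(comment))
```

Keying on the event number is what lets an edited dump be matched back
to the right commits, which is why only the comment text needs touching
when editing MULTILINE by hand.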

Here is your first policy choice.  I am thinking of writing a
filter operation that would take all the comments that look like this:

[start of comment]
[blank line]
* One sentence of random stuff
[blank line]
[end of comment]

and delete the leading "* ", leaving 

[start of comment]
[blank line]
One sentence of random stuff
[blank line]
[end of comment]

The operation would look, in reposurgeon, like this:

transform 1..$ /^\n\* (.*)\n\n$/\n\1\n\n/

The reason for this is that I think it would be good if a leading '*' in
a gitk list of first lines were a visual warning that the following
comment is "old style", that is, one that fails to obey git conventions.
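
To make the effect of that pattern concrete, here is a small Python
sketch of the same substitution applied to one comment string. It only
illustrates the regular expression; it is not how reposurgeon would
implement the operation. The function name strip_leading_star is
invented for the example, and \A and \Z are used as strict
whole-string anchors in place of ^ and $.

```python
import re

# Matches a comment consisting of a blank line, one "* "-prefixed
# line, and a closing blank line, and nothing else.
ONE_STARRED_LINE = re.compile(r"\A\n\* (.*)\n\n\Z")


def strip_leading_star(comment):
    """Drop the leading "* " from a single-sentence comment.

    Comments that do not have exactly that shape come back unchanged.
    """
    return ONE_STARRED_LINE.sub(r"\n\1\n\n", comment)


if __name__ == "__main__":
    print(repr(strip_leading_star("\n* One sentence of random stuff\n\n")))
    print(repr(strip_leading_star("\n* First item\n* Second item\n\n")))
```

Because "." does not match a newline, a comment with two or more
starred lines fails to match and keeps its stars, which is the point:
only the single-sentence comments get rewritten.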

The policy question is: are you OK with me editing the history that
much? Some people would not be.  It's another level of intrusiveness
up from just tweaking whitespace.

Also, you should know what I plan to do with the tarballs.  I have a tool
called git-weave, not yet published, which takes a sequence of tarballs and
weaves it into a revision history - one commit per tarball.  I can write
a little metadata file to specify commit comments and tags.

I'll apply git-weave, then use reposurgeon to graft the tip of the woven
prehistory repo to the root.  Then I'll check in a commit describing the
conversion and including the recipe I used to do it.   

And that will be it.
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
