Hi,

I'm quite tired of promises and all that perfectionist nonsense which locks us up with CVS for the next 10 years of bikeshedding. Therefore, I have prepared a plan for how to do the git migration, and I believe it's doable in less than 2 weeks (plus the testing). Of course, that assumes infra is going to cooperate quickly, or that someone else is willing to provide the infra for it.
I can provide some testing repos once someone is willing to provide the hardware.

What needs to be done
---------------------

I can do most of the scripting. What I need others to do is provide the hosting for the git repos. We can't use public services like github since they don't allow us to set our own update hook, so we can't enforce signing policies etc.

Once basic infra is ready, I think the following is the best way to switch:

1. send an announcement to devs to explain how to use git,
2. switch CVS to read-only,
3. create all the git repos, get hooks rolling,
4. enable R/W access to the repos.

With some luck, no more than 2 hours of downtime.

The infra
---------

The general idea is based on a 3-level structure that's an extension of how Funtoo works. The following ultimately pretty picture explains that:

+----------------+
| developer repo | - - - - - - - - - - -,
+----------------+                      v
        |               +------------------------------+
        |               | cache, DTDs and other extras |
        v               +------------------------------+
+----------------+                      |
| user sync repo | <--------------------'
+----------------+ - - - - - - - - - - ,
        |                              v
        |               +-----------------------------+
        |               | ChangeLogs, thick Manifests |
        v               +-----------------------------+
+----------------+                     |
|     rsync      | <-------------------'
+----------------+

Text version:

We have the main developer repo where developers work & commit and are relatively happy.

For every push into the developer repo, an automated magic thingie merges stuff into the user sync repo and updates the metadata cache there.

The user sync repo is for power users that want to fetch via git. It's quite fast and efficient for frequent updates, and also saves space by being free of ChangeLogs.

The rsync tree is propagated on top of the user sync repo. It is populated with all the old ChangeLogs copied from CVS (stored in a 30M git repo), new ChangeLogs are generated from git logs and Manifests are expanded.
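To make the "new ChangeLogs are generated from git logs" step a bit more concrete, here is a minimal sketch in Python. It is not the actual tooling: the function names, the separator-based `git log` format and the simplified entry layout (no +/- markers for added or removed files, no commit body) are all assumptions made for illustration.

```python
import subprocess
import textwrap

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
RECORD_SEP = "\x1e"  # emitted by git for %x1e, starts every commit record
FIELD_SEP = "\x1f"   # emitted by git for %x1f, separates header fields
GIT_FORMAT = "%x1e%ad%x1f%an%x1f%ae%x1f%s"


def read_git_log(repo, package_dir):
    """Return raw `git log` output for one package directory (hypothetical helper)."""
    cmd = ["git", "-C", repo, "log", "--date=short", "--name-only",
           "--format=" + GIT_FORMAT, "--", package_dir]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def changelog_entries(raw, package_dir):
    """Turn the raw log text into ChangeLog-style entries, newest first."""
    prefix = package_dir.rstrip("/") + "/"
    wrap = dict(width=80, initial_indent="  ", subsequent_indent="  ",
                break_on_hyphens=False, break_long_words=False)
    out = []
    for record in raw.split(RECORD_SEP):
        lines = [line for line in record.split("\n") if line.strip()]
        if not lines:
            continue  # text before the first separator is empty
        date, name, email, subject = lines[0].split(FIELD_SEP, 3)
        year, month, day = date.split("-")
        # Paths come relative to the repo root; keep them relative to the package.
        files = sorted(p[len(prefix):] for p in lines[1:] if p.startswith(prefix))
        head = "{} {} {}; {} <{}> {}:".format(
            day, MONTHS[int(month) - 1], year, name, email, ", ".join(files))
        out.append(textwrap.fill(head, **wrap))
        out.append(textwrap.fill(subject, **wrap))
        out.append("")  # blank line between entries
    return "\n".join(out)
```

In a checkout, something like `changelog_entries(read_git_log(".", "app-misc/foo"), "app-misc/foo")` would give the text to put on top of the old ChangeLog copied from CVS (the package name is just an example).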

Main developer repo
-------------------

I was able to create an initial git repository that takes around 66M as a git pack (this is how much you will have to fetch to start working with it). The repository is stripped clean of history and ChangeLogs, and has thin Manifests only.

This means we don't have to wait till someone figures out the perfect way of converting the old CVS repository. You don't need that history most of the time, and you can play with CVS to get it if you really do. In any case, we would likely strip the history anyway to get a small repo to work with.

I have prepared a basic git update hook that keeps master clean and attached it to the bug [1]. It enforces basic policies, prevents forced updates and checks GPG signatures on the left-most history line. It can also be extended to do more extensive tree checks.

For GPG signing, I relied upon gpg to do the right thing. That is, git checks the signatures and we accept only trusted signatures. So an external tool (gentoo-keys) needs to play with gpg to import, trust and revoke developer keys.

I think we should also merge gentoo-news & glsa & herds.xml into the repository. They all reference Gentoo packages at a particular state in time, and it would be much nicer to have them synced properly.

[1]:https://bugs.gentoo.org/show_bug.cgi?id=502060

User syncing repo
-----------------

IMO this will be the most useful syncing method. The user syncing repo is updated automatically for developer repo commits, and afterwards md5-cache is regenerated and committed. Other repositories (like DTDs, glsas and others if you dislike the previous idea) are also merged into it.

This repo is still free of ChangeLogs (since git logs are more efficient) and has thin Manifests. It's the space-efficient Gentoo variant. And commits are signed so users can verify the trust.

The rsync tree
--------------

We'd also propagate things to rsync.
We'd have to populate it with old ChangeLogs, new ChangeLog entries (autogenerated from git) and thick Manifests. So users won't notice much of a change.

The remaining issue is signing of stuff. We could supposedly sign Manifests but IMO it's a waste of resources considering how poor the signing system is for non-git repos.

--
Best regards,
Michał Górny