On Mon, Oct 1, 2012 at 1:54 PM, Rich Freeman <ri...@gentoo.org> wrote:
> On Mon, Oct 1, 2012 at 1:42 PM, Michael Mol <mike...@gmail.com> wrote:
>> I don't know to what depth this has been discussed in the past, but if
>> you use git, you also get an HTTP transport, which has a useful
>> feature: You could simplify updating the tree on end-users's machines
>> by using caching proxy servers (operating in accelerator mode) on the
>> various mirrors.
>
> The issue I see here is a tradeoff of bandwidth vs CPU. I just ran an
> emerge --sync and the total amount of transmitted data was 5M. The
> whole tree is 250M, though no doubt with compression that could be
> reduced.
>
> Now, one advantage of HTTP is that caching http servers are likely
> more ubiquitous in general than rsync servers. But, we have a whole
> bunch of rsync servers already, and we don't have a bunch of caching
> http servers.
>
> I suspect bandwidth is going to cost more than CPU here.
>
> In any case, not a reason to hold up git, just one more possibility if
> we ever move.
It really depends on how efficient 'git pull' is over HTTP, IMO. When I do pulls and pushes in my workflows, git doesn't send or receive the full repository; that's reserved for 'git clone'.

It may also depend on how often the pull is done. rsync doesn't pull a tree history, just a copy of the tree as it stands at sync time; git pulls all new objects, which may include intermediate versions that are no longer needed. (git may be capable of syncing only to 'HEAD' without worrying about intermediate states, but I don't know; I'm not advanced enough to say definitively one way or the other.)

As for setting up caching proxies at existing mirror sites: it should be a matter of publishing a single squid (or whichever proxy you prefer to standardize on) configuration file with appropriate ACLs and copying it everywhere. I'm not on the infra team, so I don't know what things look like under the hood, but I do know that setting up squid in forward and reverse roles isn't very painful.

--
:wq
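On the proxy side, a hypothetical squid.conf fragment for accelerator (reverse-proxy) mode might look like the following; every hostname is a placeholder I invented, not real Gentoo infrastructure:

```
# Hypothetical accelerator-mode fragment; git.example.org and
# origin.example.org are placeholders for a mirror and its upstream.
http_port 80 accel defaultsite=git.example.org
cache_peer origin.example.org parent 80 0 no-query originserver name=gitorigin

acl gitsite dstdomain git.example.org
http_access allow gitsite
cache_peer_access gitorigin allow gitsite
http_access deny all
```

Publishing one such file (plus the ACLs a given mirror needs) and copying it to each mirror is the whole idea.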
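A minimal sketch of the "sync only to HEAD" idea using a shallow clone; the toy upstream repo, paths, and identity here are made up purely for illustration, not anything resembling the actual portage tree:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Build a toy "upstream" repository with two commits.
git init -q upstream
cd upstream
git config user.email you@example.com
git config user.name you
echo one > file; git add file; git commit -qm "first"
echo two > file; git commit -aqm "second"
cd ..

# A shallow clone fetches only the objects reachable from the tip,
# skipping older history -- roughly the "sync to HEAD" behavior.
git clone -q --depth=1 "file://$tmp/upstream" mirror
cd mirror
git rev-list --count HEAD   # only the tip commit came over
```

Subsequent `git fetch --depth=1` calls on such a clone keep pulling just the new tip rather than every intermediate object.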