On Thu, Apr 11, 2002 at 10:40:31PM -0700, Robert Tiberius Johnson wrote:
> On Wed, 2002-04-10 at 02:28, Anthony Towns wrote:
> > I'd suggest your formula would be better off being:
> >   bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
> I think it depends on what you're measuring. I can think of two ways to
> measure the "goodness" of these schemes (there are certainly others):
>
> 1. What is the average bandwidth required at the server?
> 2. What is the average bandwidth required at the client?

I don't think the bandwidth at the server is a major issue to anyone,
although obviously improvements there are a Good Thing. Personally, I
think "amount of time spent waiting for apt-get update to finish" is the
important measure (well, "apt-get update; apt-get dist-upgrade" is
important too, but I don't think we've seen any feasible ideas for
improving the latter).

> prob2(i) = (prob1(i)/i) * norm,
>
> where norm is a normalization factor so the probabilities sum to 1.
>
> I've been looking at question 2, and you're suggesting that I look at
> question 1, except you forgot the normalization factor. I think this is
> what you mean. Please correct me if I've misunderstood.

No, I'm not. I'm saying that "the amount of time spent waiting for
apt-get update" needs to count every apt-get update you run, not just the
first. So, if over a period of a week I run it seven times and you run it
once, I wait seven times as long as you do, and it's seven times more
important to speed things up for me than for you.
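For concreteness, here's a minimal sketch of the two weightings
(illustrative Python only; prob(), cost() and DAYS below are made-up
placeholders, not measured numbers):

DAYS = 30

def prob(x):
    # Placeholder: fraction of users who update every x days (sums to 1).
    return 1.0 / DAYS

def cost(x):
    # Placeholder: bytes fetched by a user who is x days out of date.
    return 1024.0 * x

def per_user_cost():
    # Weight every user equally: average bytes fetched per user per update.
    return sum(prob(x) * cost(x) for x in range(1, DAYS + 1))

def per_run_cost():
    # Weight each user by how often they run apt-get update: someone who
    # updates every x days runs it 1/x times per day, giving
    #   sum( prob(x) * cost(x) / x ).
    # Dividing by sum(prob(x)/x) -- the "norm" above -- turns that into an
    # average per run; it's a constant factor, so rankings don't change.
    total = sum(prob(x) * cost(x) / x for x in range(1, DAYS + 1))
    norm = sum(prob(x) / x for x in range(1, DAYS + 1))
    return total / norm

print("per user: %.0f bytes, per run: %.0f bytes"
      % (per_user_cost(), per_run_cost()))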
> Anyway, here are the results you asked for. I'm NOT including the
> normalization factor for easier comparison with your numbers. My diff
> numbers are a little different from yours mainly because I charge 1K of
> overhead for each file request.

Merging, and reordering by decreasing estimated bandwidth. The ones marked
with *'s aren't worth considering, because there's some other method that
both requires less bandwidth and takes up less diskspace (a tiny sketch of
that test is below, after the table). The ones without stars are thus
ordered by increasing diskspace and decreasing bandwidth.

> days/
> bsize   dspace     ebwidth
> -------------------------------

Having the "ebwidth" of the current situation (everyone downloads the
entire Packages file) for comparison would be helpful.

>   1    12.000K     342.00K  [diff]
>  20    312.50K  *  173.70K  [cksum/rsync]
>   2    24.000K  *  171.20K  [diff]
>   3    36.000K  *  95.900K  [diff]
>  40    156.30K  *  89.300K  [cksum/rsync]
>  60    104.20K  *  62.200K  [cksum/rsync]
>   4    48.000K  *  58.500K  [diff]
>  80    78.100K  *  49.300K  [cksum/rsync]
> 100    62.500K  *  42.200K  [cksum/rsync]
>   5    60.000K  *  38.800K  [diff]
> 120    52.100K  *  37.900K  [cksum/rsync]
> 400    15.600K     37.700K  [cksum/rsync]
> 380    16.400K     36.800K  [cksum/rsync]
> 360    17.400K     35.900K  [cksum/rsync]
> 140    44.600K  *  35.300K  [cksum/rsync]
> 340    18.400K     35.100K  [cksum/rsync]
> 320    19.500K     34.300K  [cksum/rsync]
> 300    20.800K  *  33.600K  [cksum/rsync]
> 160    39.100K  *  33.600K  [cksum/rsync]
> 280    22.300K     33.000K  [cksum/rsync]
> 180    34.700K  *  32.700K  [cksum/rsync]
> 260    24.000K     32.500K  [cksum/rsync]
> 240    26.000K     32.200K  [cksum/rsync]
> 200    31.300K  *  32.200K  [cksum/rsync]
> 220    28.400K     32.100K  [cksum/rsync]
>   6    72.000K     27.900K  [diff]
>   7    84.000K     21.800K  [diff]
>   8    96.000K     18.200K  [diff]
>   9    108.00K     16.100K  [diff]
>  10    120.00K     14.900K  [diff]
>  11    132.00K     14.100K  [diff]
>  12    144.00K     13.700K  [diff]
>  13    156.00K     13.400K  [diff]
>  14    168.00K     13.300K  [diff]
>  15    180.00K     13.100K  [diff]

180k is roughly 10% of the size of the corresponding Packages.gz, so is
relatively trivial. Since we'll probably do it at the same time as
dropping the uncompressed Packages file (sid/main/i386 alone is 6MB),
this is pretty negligible.
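To spell out the starring criterion, here's a rough dominance-test sketch
(illustrative Python; the options list just hard-codes a few rows from the
table above, and the names are mine, not from any real script):

# Star options for which some other scheme needs no more disk space and
# no more bandwidth, and strictly less of at least one.

options = [
    # (scheme,     dspace KB, ebwidth KB)
    ("diff 1",       12.0,    342.0),
    ("diff 2",       24.0,    171.2),
    ("cksum 400",    15.6,     37.7),
    ("diff 15",     180.0,     13.1),
]

def dominated(opt, all_options):
    _, dspace, ebwidth = opt
    return any(o is not opt
               and o[1] <= dspace and o[2] <= ebwidth
               and (o[1] < dspace or o[2] < ebwidth)
               for o in all_options)

for opt in options:
    mark = "*" if dominated(opt, options) else " "
    print("%s %-10s %8.1fK %8.1fK" % (mark, opt[0], opt[1], opt[2]))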
Cheers,
aj

-- 
Anthony Towns <[EMAIL PROTECTED]> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. GPG signed mail preferred.

 ``BAM! Science triumphs again!''
                    -- http://www.angryflower.com/vegeta.gif