On 12/17/17 14:21, Michał Górny wrote:
> ...
> Rationale
> =========
>
> At this moment, syncing the repository implies fetching 'files'
> directories of all packages, even though the relevant files are used
> only when a ebuild referencing them is being built. This means that our
> users fetch many files that they will never use -- either because they
> don't need the package in question, or because the file belongs
> to an old version.
>
> For example, 'du -h app-shells/bash/files' states 232K while only three
> of those files are used by the newest version, and everything else are
> patches for old versions. And in case of bash, we're keeping those
> versions pretty much 'forever'.
>
> The new policy mostly targets large patchsets and files relevant to old
> package versions. By removing them from the repository, we're hoping to
> reduce the growth of its size a bit and reduce the amount of data
> transferred via rsync.

I am evaluating transfer size here, since the on-disk size is different
and will vary.

The numbers are interesting:

- Total size of the tree:          224509 KiB  #1
- Total size of files in files/:    27809 KiB  #2
- Cumulative files/ >= 32 KiB:       3289 KiB  #2

Some simple math later, we discover that removing _all_ files from the
offending packages would give only a 1.5% reduction in transfer size.
Removing _all_ files/ directories would save 12.4%, or about 1/8.

I don't have numbers for the past, but if I recall correctly, the
situation is currently greener than it was 10 years ago.

This is to point out that _some_ policy is _beneficial_ to avoid an
explosion of the repo size. However, restricting it further would IMO
give very little benefit and (looking at the packages involved) make
life harder for no good reason.

It would be interesting instead to evaluate ways to remove _all_ files/
dirs from the tree, keeping ebuilds separated from data.
A different tree for files/ can be seen as a cleaner approach: it would
give all ebuilds the same mechanism to personalize patches & co. and
remove limits on size (well, not all limits).

Obviously the cost of such an operation is orders of magnitude higher
than putting some policies in place.

#1 obtained with:

    find * -type f -exec cat {} + | wc -c

#2 list obtained with:

    cd "$PORTDIR"
    for files in $(find * -type d -name files) ; do
        echo -n $(find "${files}" -type f -exec cat {} + | wc -c)
        echo ",${files%/files}"
    done

Best Regards,
Francesco
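
For anyone who wants to re-check the percentages quoted above, here is a
minimal sketch. It assumes a POSIX shell with awk available; the helper
name pct is only illustrative, and the three values are copied by hand
from the list of numbers in this message rather than measured again.

```shell
#!/bin/sh
# Sizes in KiB, copied from the numbers quoted in the message
# (assumption: they are not re-measured here).
tree=224509        # total size of the tree
files_all=27809    # total size of all files/ directories
files_big=3289     # cumulative size of files/ directories >= 32 KiB

# pct PART WHOLE: print PART as a percentage of WHOLE, one decimal place.
# (hypothetical helper, not part of any Gentoo tooling)
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

echo "files/ dirs >= 32 KiB: $(pct "$files_big" "$tree")% of the tree"
echo "all files/ dirs:       $(pct "$files_all" "$tree")% of the tree"
```

With the values above this gives about 1.5% and 12.4%, matching the
figures in the text.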