On Wed, 13 Feb 2019 22:07:27 +0100 Joerg Jaspert <jo...@ganneff.de> wrote:
> On 15312 March 1977, Julian Andres Klode wrote:
> > It might make sense to consider switching to merged pdiffs, which generate
> > one Pdiff from each generation to the latest one. This can be done either
> > by preserving old index files and creating pdiffs from them, or simply by
> > concatenating the new pdiff to the old ones.
> 
> The code is in dak, generate_index_diffs.py - as soon as one gets me a
> MR on salsa for it, we can have this.
> 
> Make it "just work", that is, when its merged, next run should just do
> the right thing, and we are all happy. :)
> 
> -- 
> bye, Joerg
> 
> 

Hi,

I am considering working on this feature, and I would like a review of
the design before I invest a lot of time in an implementation, in case
the design is going to be rejected.


# Proposal
I have spoken with Julian about the APT side and we would end up doing
completely merged pdiffs for this to work (i.e. every patch must move
you from the current state to the newest state).  This means that every
dinstall will lead to a new generation of all existing patches.
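
To illustrate what "fully merged" means here, a minimal sketch follows.
It is not dak code and does not use the real ed-style pdiff format: the
names `make_patch`, `apply_patch` and `regenerate_merged_patches`, and
the patch representation (a list of line-range replacements), are
assumptions made up for this example.  It only shows the invariant that
after each run, every retained old state has exactly one patch that
leads directly to the newest state.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Return edits (start, end, replacement_lines) that turn old into new."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_patch(old, patch):
    """Apply the edits back to front so earlier line indices stay valid."""
    result = list(old)
    for start, end, replacement in reversed(patch):
        result[start:end] = replacement
    return result


def regenerate_merged_patches(history, newest):
    """Rebuild one patch per retained old state, each leading to `newest`."""
    return {state_id: make_patch(old, newest) for state_id, old in history.items()}


# Hypothetical history of two older states of a Packages-like file.
history = {
    "state-1": ["pkg-a 1.0", "pkg-b 1.0", "pkg-c 1.0"],
    "state-2": ["pkg-a 1.0", "pkg-b 1.0", "pkg-c 1.0", "pkg-d 1.0"],
}
newest = ["pkg-a 1.0", "pkg-c 1.1", "pkg-d 1.0", "pkg-e 1.0"]
patches = regenerate_merged_patches(history, newest)
```

A client at either old state needs only its own single patch to reach
the newest state, which is why all patches have to be regenerated on
every run.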

To avoid a combinatorial blow-up, Julian and I propose that we limit the
number of patch generations to a low constant.  With 3 generations, this
would limit the number of patches to a factor of 3 of the current number.
We can further reduce the total by reducing the number of pdiffs.
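
As a quick sanity check of that bound, assuming each retained generation
carries one patch per pdiff listed in the Index (the function name and
the example numbers below are hypothetical, not dak's actual settings):

```python
def max_patch_files(pdiffs_per_index, generations):
    """Upper bound on patch files kept per Index with N retained generations."""
    if pdiffs_per_index < 0 or generations < 1:
        raise ValueError("need pdiffs_per_index >= 0 and generations >= 1")
    return pdiffs_per_index * generations


# Hypothetical example: 10 pdiffs per Index today.
today = max_patch_files(10, 1)
proposed = max_patch_files(10, 3)
```

Lowering either factor lowers the bound, which is why reducing the
number of pdiffs is the other available knob.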

## The rationale behind multiple generations of pdiffs:

This is to ensure that any "apt-get update" that fetches an Index file
during a mirror sync will still be able to see the patch files listed
in that Index file.

My understanding is that 3 generations will be sufficient, as this gives
"apt(-get) update" at least a 6 hour window to complete before a patch
listed in its Index file disappears.
  As pdiffs are only used during an "apt(-get) update", there is no
reason to be concerned about stale metadata in the Index after apt(-get)
has fetched all the files.
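
The retention rule can be sketched as follows.  This is only an
illustration of the bookkeeping and not dak code: `add_generation`, the
file names, and the in-memory list of sets are all hypothetical.  A patch
file becomes removable only once no retained generation lists it.

```python
def add_generation(generations, new_files, max_generations=3):
    """Append a generation; return (kept generations, files safe to delete)."""
    if max_generations < 1:
        raise ValueError("max_generations must be at least 1")
    all_generations = list(generations) + [set(new_files)]
    kept = all_generations[-max_generations:]
    dropped = all_generations[:-max_generations]
    still_referenced = set().union(*kept)
    removable = set().union(*dropped) - still_referenced
    return kept, removable


# Hypothetical run of four dinstalls with one patch file each.
generations = []
removed_per_run = []
for run in range(1, 5):
    generations, removable = add_generation(generations, {f"patch.gen{run}"})
    removed_per_run.append(removable)
```

In this run nothing is removable until the fourth generation is added,
at which point only the files of the first generation are dropped.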

The number of generations will obviously be configurable, so we can
trivially change it if 3 is too many or too few.  My interest is that
we agree on the generation approach (also - my guesstimate of 3 comes
from "better safe than sorry" rather than from a carefully calculated
figure).

Addendum: Ideally, the Index file would be removed from by-hash at
the same time as the patch files listed in it.  AFAICT, this is not
trivially possible in generate_releases.py and I have assumed it to be a
non-issue given the above safeguard.  Let me know if you disagree with
this assumption.

# Alternatives

Theoretically, it is possible to make a trade-off where only some of
the pdiffs are merged.  However, neither apt(-get) nor the metadata is
currently designed to do this efficiently.
  Furthermore, it would not have the full performance benefit for the
client as they would in many cases still end up having to download at
least 2-3 patches (and worst case 5-7) if we want to avoid a
considerable increase in pdiff files.

For these reasons, this approach has not been considered in depth.

# Optional improvements

When merging pdiffs, it is possible to do something smarter than simply
concatenating two pdiffs together.  If there is interest and I can
understand the runes that make up `diffindex-rred` in apt-file/2.5.4
then I will try to implement some of this in dak.  This will reduce the
file size of the merged patches and possibly fix #947839 as a side effect.



@FTP masters: Do you agree with the "fully-merged" approach with N
generations (with N=3 by default) as a solution to this request?


Thanks,
~Niels
