I should probably be more specific: for a long time, I haven't wanted
caching in pkg_add, because sometimes mirrors get out of sync, and I was
worried a bad cache would be worse than no cache.

Also: when to generate it.
Also: how to store it. At some point we had sqlite in base, but it didn't
last because there was no client. AND pkg_add was an issue, because it's perl,
so to use sqlite in perl you also need DBI which is *huge*.

At some point during the last 10 months, I gave a script to generate
a cache package to sthen and naddy, so you guys could see index-0.tgz for
a while.... but I didn't really have time to hack on it, so naddy@ and
sthen@ gave it up.

Having it as a separate package meant a lot more glue. Then I figured: let's
make it a flavor to quirks. But coping with two flavors was complicated.
So it became an extra file in quirks... which means that quirks had to build
LAST (because, unlike sqlports/pkglocatedb, we want quirks to
reflect *built packages*, not possible ports).

This was surprisingly easy to do.

I did a few tryouts which were hilariously bogus, but the speed-up was
tremendous! (*) Not opening a new connection for each new update-info *is* lots
faster (the alternative would be to use http 1.1 or later to keep the
connection open... this may happen at some point but it's way more complicated).

I was expecting the call-out to locate(1) to be really slow... that's not
the case. It can make the fan hot (dixit landry@) but it's not that bad.

Next part was ironing out some details: the cache is directly related to
ONE single repository location (which is okay, because most scenarios go to
one single repository), and it was possible to run locate beforehand on the
"assumed" list of packages to update.
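
To make the one-repository constraint concrete, here is a minimal sketch in
Python (NOT the actual pkg_add code, which is perl). UpdateCache, lookup and
the stem-to-text mapping are made-up names for illustration only; the point
is just that a cache built for one location answers nothing about another.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UpdateCache:
    """Hypothetical cache of update-info, valid for exactly one repository."""
    repository: str
    # Assumed layout: package stem -> update-info text.
    entries: dict = field(default_factory=dict)

    def lookup(self, repository: str, stem: str) -> Optional[str]:
        # A cache built for one repository says nothing about another one:
        # report a miss so the caller falls back to asking the repository.
        if repository.rstrip("/") != self.repository.rstrip("/"):
            return None
        return self.entries.get(stem)
```

A lookup against any other repository location is simply a miss, so the
caller would end up on the usual uncached path.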


So this is about the state of the current code. It still needs better checks
to deal with catastrophic failures (if you get a cache that does NOT match
the repository at all, some earlier bogus code resulted in pkg_add -u not
updating anything, which is not okay).

(note that in reality, most scenarios where the cache is wrong/incomplete 
will just end up in it not being used, so it is not that bad).
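
The behaviour wanted here can be sketched roughly as follows, again in Python
and again with invented names (find_update, query_repository): a cache miss
must fall through to the slow path, never be read as "nothing to update".
This is an illustration of the idea, not how pkg_add does it.

```python
def find_update(stem, cache, query_repository):
    """Return update-info for `stem`.

    cache: dict mapping stem -> update-info, or None if no cache is usable.
    query_repository: hypothetical callable doing the slow per-package lookup.
    """
    if cache is not None:
        hit = cache.get(stem)
        if hit is not None:
            return hit
    # Missing or mismatched cache: ask the repository instead of
    # concluding that there is no update.
    return query_repository(stem)
```

With a cache that has nothing to do with the repository, every lookup misses
and the whole run degrades to the old uncached speed, which is slow but still
correct.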


As far as actual testing goes: unless I instrument it a lot, apart from
reading the code and checking it does what it should, that it is actually
faster, and that the update is still correct, there's not much one can do.

(*) tremendous, as in, really WOW. I wasn't expecting such a difference and
I wrote the patch. Typical update scenarios where about everything is up to
date go down from tens of minutes to tens of seconds, it's THAT drastic.

The current code isn't active yet... it's easy to find the line that prevents
it from running.

I'm still fussing over error-handling, but it's getting there.
-- 
        Marc
