On 10/11/23 19:24, Antoine Beaupré wrote:
Yeah, really, the script you wrote should Just Work. I find the
`cache.upgrade()` call to be a little strange, personally: I would try
ripping that out completely to see if it fixes the issue, but maybe you
have a better idea of why it's there in the first place?
It's admittedly been a little while so there's some rust in my noggin,
but the backbone of this script is `cache.get_changes()`. I believe it's
`cache.upgrade()` that actually marks the changes. Note of course that
none of the changes are applied without a `cache.commit()`, so no
upgrade operation is actually performed; this is just a way of saying
"if I were to upgrade, what changes would be made?" Ultimately I
believe the removal of `cache.upgrade()` would result in the collector
never seeing available upgrades.
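For reference, a minimal sketch of that dry-run pattern. The helper name
`pending_changes` is hypothetical and this is not necessarily how
`apt_info.py` is laid out; it only assumes `cache` behaves like
python-apt's `apt.Cache`, and it never calls `cache.commit()`.

```python
def pending_changes(cache):
    """Return sorted names of packages an upgrade would change.

    `cache` is assumed to behave like python-apt's `apt.Cache`.
    Nothing is installed or removed: `commit()` is never called.
    """
    # Mark upgradable packages in the in-memory cache only.
    cache.upgrade()
    # Read back whatever is now marked for install, upgrade, or removal.
    return sorted(pkg.name for pkg in cache.get_changes())
```

With python-apt installed, `pending_changes(apt.Cache())` should list
what an upgrade would touch. If my recollection is right, dropping the
`cache.upgrade()` line would leave `get_changes()` empty on a freshly
opened cache.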
But because `apt_info.py` can silently fail to update the cache, we
may want to add an extra metric to track the update timestamp on the
mirror info. I filed this bug about that:
https://github.com/prometheus-community/node-exporter-textfile-collector-scripts/issues/180
Agreed, that seems sound.
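One way such a timestamp metric could look, as a rough sketch only: the
metric name `apt_cache_update_timestamp_seconds` and the helper
`update_timestamp_metric` are made up for illustration, and which file
to stat is an assumption left to the caller (for example
`/var/lib/apt/periodic/update-success-stamp`, which exists on some
Debian and Ubuntu systems but not all).

```python
import os

def update_timestamp_metric(stamp_path):
    """Format a textfile-collector gauge from the mtime of `stamp_path`.

    Returns an empty string when the file cannot be read, so the metric
    is simply absent rather than reporting a misleading value.
    """
    try:
        mtime = os.stat(stamp_path).st_mtime
    except OSError:
        return ""
    name = "apt_cache_update_timestamp_seconds"  # hypothetical metric name
    return (
        f"# HELP {name} Unix time of the last successful APT index update.\n"
        f"# TYPE {name} gauge\n"
        f"{name} {mtime:.0f}\n"
    )
```

An alert on `time() - apt_cache_update_timestamp_seconds` would then
catch a cache that has quietly stopped refreshing.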
Anyway, I'll do some experimentation and see if I can develop some
properly-formed thoughts.
Thank you so much for your response!
I think adding instrumentation around how long the script takes to run
itself would be valuable; that could be a simple time counter added to
the script's output... This would allow tracking this problem in fleets
where there *isn't* such lock contention.
After all, the only reason we found out about this is that we got
repeated emails from cron about apticron or other software failing to
run apt-update. If that's removed from the equation, the script here
just fails silently, and I think that's also possibly a Bad Thing.
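A rough sketch of what that instrumentation might look like, assuming
the script's metric lines come from a callable: the wrapper
`run_instrumented`, the `collect` argument, and the
`apt_info_collector_*` metric names are all hypothetical, not anything
the current script defines.

```python
import time

def run_instrumented(collect, prefix="apt_info_collector"):
    """Run `collect()` and append duration and success gauges.

    `collect` is any callable returning an iterable of metric lines.
    A failure is reported as `<prefix>_success 0` instead of vanishing.
    """
    start = time.monotonic()
    try:
        lines = list(collect())
        success = 1
    except Exception:
        # Drop partial output; the success gauge records the failure.
        lines = []
        success = 0
    duration = time.monotonic() - start
    lines += [
        f"# HELP {prefix}_duration_seconds Wall-clock time spent collecting.",
        f"# TYPE {prefix}_duration_seconds gauge",
        f"{prefix}_duration_seconds {duration:.3f}",
        f"# HELP {prefix}_success Whether the last collection run succeeded.",
        f"# TYPE {prefix}_success gauge",
        f"{prefix}_success {success}",
    ]
    return "\n".join(lines) + "\n"
```

The duration gauge would show slow runs even where nothing else is
competing for the lock, and the success gauge makes a failed run
visible without relying on mail from cron.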
Yeah I won't lie, my immediate thought was "huh, I've never seen that
happen." Then my follow-up was "actually, how would I even know?" But at
the same time, I could make that argument for a lot of collectors! Is
there an established pattern for gathering this kind of data?
Kyle