On Mon, Mar 7, 2022 at 1:16 PM Aleix Pol <aleix...@kde.org> wrote:

> On Sat, Mar 5, 2022 at 8:36 AM Ben Cooksley <bcooks...@kde.org> wrote:
>
>> On Fri, Mar 4, 2022 at 12:49 AM Aleix Pol <aleix...@kde.org> wrote:
>>
>>> I'd say wireshark is too low level for what the problem is here. We are
>>> talking about having too many HTTP requests for specific URLs.
>>>
>>
>> Correct, I guess the difference in our approaches comes from a "before
>> release" to a "monitor after release" angle to things.
>> I'd like to see increased scrutiny during the development process as well
>> to make sure that we release code that operates properly from Day 1.
>>
>
> A way to do this could be using commit hooks that do not allow to reach
> certain services. (which we discussed in private chat).
> We could also analyse at cmake time the knsrc files we install, but this
> has a very limited and specific scope.
>

I've now applied two checks as part of the hooks which will hopefully catch
anything new being introduced.
We still need to ensure that anything pre-existing is sorted out of course.

>
>>> I can think of two main measures:
>>> - Trigger an alarm (an e-mail notification?) if there's a specific
>>> UserAgent that has a specific portion of the queries we have in a specific
>>> day in the services we care about.
>>> - Offer plots to see how queries by UserAgent evolve over the last
>>> couple of months (or couple of years).
>>>
>>
>> At the moment our ability to analyse our logs is somewhat limited by our
>> Privacy Policy - https://kde.org/privacypolicy/
>> Currently we don't have any provision for long term storage of
>> this information even on an aggregated basis - so we would need to update
>> this first.
>>
>
> Hopefully the NDA should help here and it doesn't seem all that far away.
> I know Neofytos and Ade have been working on it lately.
>

The privacy policy will still need to be updated, but that can form part of
the puzzle yes.

>
>> The second issue there is that we are transitioning users to contact a CDN
>> based endpoint (which is substantially more scalable).
>> This does mean we lose visibility on data such as User Agents and the
>> URLs being impacted though as we only get aggregated data unless we ask for
>> raw logs - which makes implementing something like what you've described
>> much harder.
>>
>
> That does seem like a stopper. Still, it seems like it's not that big of a
> problem when there is a CDN, so we better worry about the other cases.
>

We should still be reasonable to the CDN of course, but it makes it much
more manageable yes.

>
> Aleix
>

Cheers,
Ben