On Sat, 11 Jan 2025 at 20:03, Niall Pemberton <niall.pember...@gmail.com> wrote:
>
> Hi Attic Team,
>
> A number of retired projects are flagged on a monthly report (to the
> Privacy Committee) as contravening the ASF privacy policy due to their use
> of Google Analytics (mainly)[1].

It's not just the analytics; fonts, images and scripts (etc.) must not
be loaded from non-permitted sites.

> I'm starting with the assumption that it would be difficult/painful to fix
> this now that these projects are in the Attic, but I thought I would ask
> here if there was any way to do this?

Yes, it was difficult to prepare sites for the Attic.
In general, it's not possible (or desirable) to regenerate the website
from source, so originally the Attic needed to change every single HTML
source file. This was a lot of work, even if some of it could be
automated.

The Attic banner is now added with a server filter (in Lua) that adds
the banner text to the HTML files.

I think the first stage is to establish what effect the CSP will have
on the behaviour of each site.

In the case of tracking, that is not needed for Attic sites, so unless
the site no longer works properly, any failed fetches can just be
ignored.

For other assets such as fonts, images and scripts, failure to load is
likely to have an adverse effect, so that needs to be solved.

In theory the server filter could be extended to change URLs to point
to a local copy, though it might be tricky to change only the relevant
URLs. (Only automatically loaded resources need to be changed.)
I suspect it might be necessary to edit the HTML files to do this
properly.

Some scripts must be accessed from the 3rd party and cannot be replaced
with local copies. That might mean rewriting entire pages if the script
is essential to the site working. In the case of analytics, such
references can just be removed/disabled.

For some references it should be possible to set up a central proxy
server to fetch the resource, and change the HTML to use the proxy.
If the proxy is set up properly, the user PII is not passed on to the
3rd party. Or it might be possible to use server rewrites, if those can
be guaranteed not to pass on any PII from the original request.

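As a rough illustration of that first stage, a small script could list
the resources each page loads automatically from other hosts. This is
only a sketch: the permitted-host list (ALLOWED_SUFFIXES), the function
names and the sample HTML are assumptions made for illustration, not the
actual ASF CSP or any existing Attic tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: hosts a page may load from. The real CSP defines the actual list.
ALLOWED_SUFFIXES = ("apache.org",)

# Tag -> attribute pairs that a browser fetches automatically on page load.
# Plain links (<a href>) are deliberately absent: they are only followed on click.
AUTO_LOADED = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

# <link> values of rel that cause a fetch; rel="canonical" etc. do not.
FETCHING_RELS = {"stylesheet", "icon", "preload"}


def is_permitted(url):
    """Return True if the URL is relative (or has no host) or is on a permitted host."""
    host = urlparse(url).hostname
    if host is None:
        return True  # relative URL, or a host-less scheme such as data:
    host = host.lower()
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)


class ExternalResourceFinder(HTMLParser):
    """Collect automatically loaded URLs that point at non-permitted hosts."""

    def __init__(self):
        super().__init__()
        self.external = []  # list of (tag, url) tuples, in document order

    def handle_starttag(self, tag, attrs):
        attr_name = AUTO_LOADED.get(tag)
        if attr_name is None:
            return
        attr_map = dict(attrs)
        if tag == "link":
            rels = set((attr_map.get("rel") or "").lower().split())
            if not rels & FETCHING_RELS:
                return
        url = attr_map.get(attr_name)
        if url and not is_permitted(url):
            self.external.append((tag, url))


def find_external_resources(html_text):
    """Parse one HTML document and return its non-permitted auto-loaded URLs."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external


# Hypothetical page used only to exercise the functions above.
SAMPLE = """
<html><head>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">
<link rel="canonical" href="https://example.org/page">
<script src="https://www.google-analytics.com/analytics.js"></script>
<script src="js/site.js"></script>
</head><body>
<img src="https://www.apache.org/images/logo.png">
<a href="https://github.com/apache">source</a>
</body></html>
"""

if __name__ == "__main__":
    for tag, url in find_external_resources(SAMPLE):
        print(tag, url)
```

Run over a checked-out site, this would give a per-page list of URLs
that need to be removed, replaced with a local copy, or proxied. It only
sees static references in the markup; resources injected by inline
scripts (as the usual analytics snippet does) would still need a browser
test against the real CSP.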
For testing, it should be possible to set up a Docker container with a
webserver having the appropriate CSP settings. It can also have the Lua
filter for experiments with that. The website source can then be
checked out locally, and mapped into the container on startup.

I worked on the Lua script, and did some CSP testing, and may still
have suitable Docker scripts.

> Below is a list of websites that are being flagged. I am willing to do some
> work to fix this issue, so if you have any ideas on how this could be
> resolved, then I would appreciate it,
> Thanks
>
> Niall
>
>
> apex.apache.org
> archiva.apache.org
> bahir.apache.org
> directmemory.apache.org
> eagle.apache.org
> hama.apache.org
> hawq.apache.org
> ibatis.apache.org
> marmotta.apache.org
> metron.apache.org
> mxnet.apache.org
> ode.apache.org
> polygene.apache.org
> reef.apache.org
> stdcxx.apache.org
> stratos.apache.org
> streams.apache.org
> tajo.apache.org
> trafodion.apache.org
> tuscany.apache.org
> twill.apache.org
> usergrid.apache.org
> wink.apache.org
>
> [1]
> https://privacy.apache.org/faq/committers.html#can-i-use-google-analytics