On 12/03/2026 13:10, Stuart Henderson wrote:
> On 2026-03-11, Peter G. <[email protected]> wrote:
>>
>> that's not the issue. the issue is not to process the same requests
>> twice. process once, keep in cache, serve instantly without processing.
>>
>> this is how very IO-intensive systems usually work. you preload all
>> caches and serve only from caches. sometimes preloading runs for a
>> longer while, simply loading caches. then all that content is served
>> effortlessly without much load.
> 
> that's fine if the requests are all the same and the dataset doesn't change
> often. that is not the case here, there are ~700k files in /cvs, each with
> many commits that could be diffed, and abusive requests cover a lot of them.

no, it applies here just as much. the number of files doesn't matter.
consider a simple 2-level caching setup using cgit and nginx as a
practical example:

cgit, a web interface to git, generates static HTML for every file
shown, every diff produced and so on, and when the next identical
request arrives, it simply serves that HTML instead of doing any
processing. it's easier to serve 700k static HTML files than to run
diff 700k times.
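for reference, cgit's caching is controlled by a handful of cgitrc(5)
settings; the numbers below are illustrative, not a tested config:

```
# /etc/cgitrc -- caching knobs (values here are illustrative)
cache-root=/var/cache/cgit
cache-size=1000000        # max number of cached entries
cache-static-ttl=-1       # pages keyed by a fixed SHA-1 never expire
cache-dynamic-ttl=5       # branch-head pages, minutes
cache-repo-ttl=5          # repo summary pages, minutes
```

with cache-static-ttl=-1, content addressed by commit hash is rendered
exactly once and then served from disk forever.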

taking your figure for granted, how much space would this need, assuming
cgit caches are kept for a very long time? 700k x (let's say) 50KB
of static content per file = 35 GB. that's 35 GB of static content ready
to be served vs any processing.

cgit can be used as a rolling cache, i.e. as requests arrive it simply
fills up its caches, or it can be preloaded by an external locally run
script, to cover most or all of its files and their permutations.

so our level 1 cache: 35 GB of immediately available static content

level 2 can be a simple generic installation of nginx using proxy module
and proxy cache directives.
https://nginx.org/en/docs/http/ngx_http_proxy_module.html
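a minimal fragment of such a setup might look like this; the paths,
sizes and upstream address are assumptions, not a tested configuration:

```nginx
# illustrative nginx proxy-cache fragment
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cgit:512m
                 max_size=35g inactive=30d use_temp_path=off;

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8080;   # cgit behind its web server
        proxy_cache cgit;
        proxy_cache_valid 200 30d;          # keep successful responses
        proxy_cache_use_stale error timeout updating;
    }
}
```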

it's a dumb cache, caching everything the upstream responds with, but it
acts automatically as both a memory and a disk cache. say we give it
16gb of memory to help our cgit here.

this gives us 16gb of memory cache handling most recent requests and
35gb of static content handling all requests.

vs

the old cvsweb

>> Ken can maybe help in Go with the move.
> 
> go seems not bad for web applications, but it's not going to fly for a
> vcs to run on all the archs that OpenBSD targets.

not much experience with advanced Go either.

a real-life case study:

there is a major site, allegro.pl, basically a Polish version of
eBay, which grew highly popular over the years and has several *million*
auctions/offers active at any given time. they made some of their
software public, which includes BigCache.

BigCache is a Go cache designed to serve tens of gigabytes of cached
content at very high performance, https://github.com/allegro/bigcache

i've used it before. works as advertised.

a well designed Go application, serving as a gui to git or cvs here,
using the BigCache package, which is trivial to integrate, could simply
cache everything, and i mean *everything*, it does

options are plenty
