Control: -1 + patch

On 21/07/14 14:58, Paul Wise wrote:
> On Mon, 2014-07-21 at 15:39 +0200, Abou Al Montacir wrote:
>
>> Are we really consuming so much bandwidth for that feature? I assume
>> this will happen each time a user or a daemon wants to check a
>> particular package. I'm not convinced this is worth especially they ask
>> for a cache of 1 hour, do we expect that per package we do a check more
>> than twice per day (daily daemon + random user)
>
> I told them the average usage based on the stats from qa.d.o Apache logs
> (up to 30K requests per day) and said that was a bit high and asked us
> to implement a cache.
>
That doesn't surprise me in the least! GetDeb actually switched to using
my test redirector, and in 5 days I logged nearly 32000 hits at my
server, each of which would have been passed to sf.net (this was quickly
resolved though).

>> What about compressing the files? This can reduce the size dramatically.
>> Can you please check for the file you used as example?
>
> Seems pointless to store the raw RSS, best extract the filenames and
> store them in a database instead.
>

Okay. It took a bit of thinking about how to make it work, but I've come
up with a working solution that caches the file list for each project
requested. I am storing each project's file list in a separate Berkeley
DB file, so we can check the file's modification time and only update
when the file is older than the cache limit ($cache_time, in seconds,
currently 3600).

Currently it is configured to store these files in a subdirectory named
cache ($cache_dir), which will need to be writeable by the web server.

Otherwise I don't think there is anything else particularly special to
report. I have updated my test server and it is now running the latest
version of the script.

Regards,
Daniel
--- ../sf-redirect-old/sf.wml	2014-07-21 19:24:00.835216162 +0100
+++ sf.wml	2014-07-21 19:45:21.683113723 +0100
@@ -1,21 +1,12 @@
 <?php
-
-$data_dir = '/srv/qa.debian.org/data/watch';
-
 // need to strip leading slash, sf.net doesn't like double slashes
 $project=ltrim($_SERVER['PATH_INFO'], '/');
+$cache_dir = './cache';
+$cache_time = 3600;
 
 if (!$project) {
-	header('Location: http://manpages.debian.net/cgi-bin/man.cgi?query=uscan');
-	exit;
-}
-
-$fdb = $data_dir . '/sf-list.db';
-
-if (!file_exists($fdb)) {
-	header('HTTP/1.0 500 Internal Server Error');
-	die('The files database is not available. Please report this message to'.
-	    ' debian...@lists.debian.org');
+    header('Location: http://manpages.debian.net/cgi-bin/man.cgi?query=uscan');
+    exit;
 }
 
 // $project is not a file and doesn't have trailing slash
@@ -29,40 +20,60 @@
     exit;
 }
 
-$db = dba_open($fdb, 'r', 'db4');
+$db_file = "$cache_dir/$project.db";
 
-if (!dba_exists($project, $db)) {
-	header('HTTP/1.0 404 File Not Found');
-	die('There is no information about the '.$project.' project.');
+if (file_exists($db_file) and time() - filemtime($db_file) < $cache_time ) {
+    # Open the db_file for reading
+    $db = dba_open($db_file, 'r', 'db4');
+} else {
+    $xml_url = "https://sourceforge.net/projects/$project/rss";
+    # Update/create the db_file, then read it's contents
+    # Load the rss feed using simplexml
+    $xml = @simplexml_load_file($xml_url, 'SimpleXMLElement', LIBXML_NOCDATA);
+    if ($xml === false) {
+        echo "No project named $project could be found, check the project name and try again";
+        exit;
+    } else {
+        # Get an array of files from the XML
+        $files = $xml->channel[0]->item;
+        # Create a new db file
+        $db = dba_open($db_file . '-new', 'c', 'db4');
+        # Add the file list to the db
+        $i = 0;
+        foreach ($files as $item) {
+            dba_insert($i, basename($item->title),$db);
+            $i++;
+        }
+        dba_close($db);
+        rename($db_file . '-new', $db_file);
+        $db = dba_open($db_file, 'r', 'db4');
+    }
 }
-
-?><html>
+?>
+<html>
 <head>
 <title>File listing for project <?php echo htmlspecialchars($project); ?></title>
 </head>
 <body>
 <p>
 <h1>File listing for project <?php echo htmlspecialchars($project); ?></h1>
-Visit <a href="http://sf.net/projects/<?php echo htmlspecialchars($project); ?>"><?php echo
-htmlspecialchars($project); ?>'s project page</a>.<br/><br/>
+Visit <a href="http://sf.net/projects/<?php echo htmlspecialchars($project); ?>"><?php
+echo htmlspecialchars($project); ?>'s project page</a>.<br><br>
 <?php
-echo dba_fetch($project, $db);
+$key = dba_firstkey($db);
+while ($key !== False) {
+    $file = dba_fetch($key, $db);
+    $link = $_SERVER['SCRIPT_NAME'] . "/$project/$file";
+    echo "<a href='$link'>$file</a><br>\n";
+    $key = dba_nextkey($db);
+}
 ?>
 </p>
-<p>
-Thanks to <a href="http://ftp.heanet.ie/">HEAnet's mirror service</a>
-for being the source of data for this service.
-</p>
+<p>Last database update: <?php echo date(DATE_RFC822, filemtime($db_file)); ?></p>
 <p>
 Get the source code: <a href="svn://anonscm.debian.org/svn/qa/trunk/wml/watch">checkout SVN repository</a> |
 <a href="http://anonscm.debian.org/viewvc/qa/trunk/wml/watch/">browse SVN repository</a>
 </p>
-<p> Last database update:
-<?php echo date(DATE_RFC822, filemtime($fdb)); ?>
-</p>
 </body>
-</html><?php
-
-dba_close($db);
-
-?>
+</html>
+<?php dba_close($db); ?>
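
For reference, the two ideas the patch relies on (a freshness check based on the file modification time, and replacing the cache by writing to a "-new" file and renaming it) can be sketched apart from the dba calls. This is only a minimal illustration under assumptions: cache_is_fresh() and replace_cache() are hypothetical names that do not appear in sf.wml, and a plain text file stands in for the Berkeley DB.

```php
<?php
// Hypothetical helper, not part of sf.wml: returns true when $path exists
// and was modified less than $max_age seconds before $now.
function cache_is_fresh(string $path, int $max_age, ?int $now = null): bool
{
    $now = $now ?? time();
    // Drop PHP's cached stat data so filemtime() sees recent writes.
    clearstatcache(true, $path);
    return is_file($path) && ($now - filemtime($path)) < $max_age;
}

// Hypothetical helper, not part of sf.wml: write the new contents to
// "<path>-new" and rename it over the old file, so that a reader never
// opens a half-written cache file.
function replace_cache(string $path, string $contents): bool
{
    $tmp = $path . '-new';
    if (file_put_contents($tmp, $contents) === false) {
        return false;
    }
    return rename($tmp, $path);
}
```

With these, the flow in the patch would read roughly as: if cache_is_fresh($db_file, $cache_time) is false, fetch the feed and call replace_cache(), then open the file for reading in either case. The rename only replaces the file in one step when the "-new" file is in the same directory (and so on the same filesystem) as the target, which is the case in the patch.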