On Tue, Aug 22, 2000 at 08:35:20PM -0600, montefin wrote:
[snip]
> When every other site worth a damn has a basic, simple,
> clear-up-the-obvious, Search Engine, http://www.debian.org/search
> complains it has not found a Search Engine worthy of itself.
[snip]
> Put an even so-so Search Engine at http://www.debian.org/search and you
> will see the traffic and inanity (including my own) on this list
> plummet!

yes! hear hear.

(i still think we need a newbie-centric FAQ which contains mostly pointers
to the existing documentation. help them find what they're looking for!)

here's a post from the debian-www list from a month ago; i'd like to see
someone address this:

Erik Rossen wrote:
> On Tue, 25 Jul 2000, Craig Small wrote:
> > On Fri, Jul 21, 2000 at 09:31:56PM +0200, Erik Rossen wrote:
> > > entire website into a .deb package, searchable with htdig. How many
> > > megabytes would that make?
> > Try Gig, like 4 Gig.
>
> If I search on Altavista,
>
> "url:www.debian.org" gives about 65,826 pages (say, 66,000 pages)
>
> "url:www.debian.org AND NOT url:www.debian.org/Lists-Archives" gives about
> 9,334 (say, 9,300 pages)
>
> Assuming that the 4GB number is due to the 66,000 pages, that makes an
> average of about 64kB per page. This number seems to be a bit high for me
> - I suspect that Altavista has been obeying robots.txt and that in reality
> there are many more pages.
>
> Anyhow, assuming that one were to use htDig and budget 12kB per page for
> word indices (so that the database could be built incrementally), one
> gets:
>
> For everything that AV has seen so far: 66,000 x 12kB = 792,000kB = 773MB
>
> Ditto, minus the mail archives: 9,300 x 12kB = 111,600kB = 109MB
>
> Would someone with more experience than me tell us if these numbers pose
> any difficulties? Unless there is a real need to keep all of the indices
> in RAM, shouldn't it be fairly cheap and easy to get this thing
> operational right now? Even if the space required was one order of
> magnitude greater than what I've calculated?
> Erik Rossen                          ^
> [EMAIL PROTECTED]                   /e\
> http://www.multimania.com/rossen    ---    GPG key ID: 2935D0B9
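
for anyone who wants to redo Erik's arithmetic with other page counts, here is
a rough sketch in Python. the 12kB-per-page budget and the 4GB site size are
the guesses quoted above, not measured htDig numbers; 1MB = 1024kB is assumed
because that is what his 773MB and 109MB figures imply; the function names are
made up for illustration.

```python
# Back-of-the-envelope index sizing, using only the figures quoted above.
# Assumptions: 1 MB = 1024 kB, 1 GB = 1024 MB, and the 12 kB/page budget
# is Erik's estimate rather than a measured htDig value.

KB_PER_MB = 1024
MB_PER_GB = 1024


def index_size_mb(pages: int, kb_per_page: float = 12) -> float:
    """Estimated word-index size in MB for a given number of pages."""
    return pages * kb_per_page / KB_PER_MB


def average_page_kb(site_gb: float, pages: int) -> float:
    """Average page size in kB if the whole site occupies site_gb."""
    return site_gb * MB_PER_GB * KB_PER_MB / pages


if __name__ == "__main__":
    scenarios = {
        "everything AltaVista has seen": 66_000,
        "minus the mail archives": 9_300,
    }
    for label, pages in scenarios.items():
        size = index_size_mb(pages)
        # Erik also asks about being off by one order of magnitude.
        worst_case_gb = size * 10 / MB_PER_GB
        print(f"{label}: {size:.0f} MB (x10 margin: {worst_case_gb:.1f} GB)")

    print(f"average page size: {average_page_kb(4, 66_000):.0f} kB")
```

this reproduces the 773MB and 109MB estimates and the roughly 64kB average
page size. change the page counts or the per-page budget to see how quickly
the total grows.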