How many documents do you have in your index? How many unique queries per day, bot and human? What are your cache hit ratios?
Maybe you can increase the size of the caches and not worry about it. Search engine position is important. Have marketing pay for the extra memory (I'm not kidding).

Sending all the bot queries to a separate machine is also a reasonable approach. Heck, bill that machine to marketing!

wunder

On 9/1/08 7:34 AM, "Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote:

> Apart from hacking the internals, there's nothing inside Solr which will let
> you do that. EHCache is for application layer caches; Solr is an external
> server so it can't know about your application. I think that over a period
> of time, the caches will be back to normal (through user-generated requests)
> and it shouldn't be a big problem.
>
> How slow are your user queries becoming? Will it help if you limit all bot
> queries to a certain fixed number of Solr instances?
>
> On Mon, Sep 1, 2008 at 7:44 PM, Tobias Hill <[EMAIL PROTECTED]> wrote:
>
>> Maybe I was a bit unclear, let me try with other words.
>>
>> I didn't have the statistics page in mind. All I care about is that I don't
>> want a massive amount of bot-generated queries to affect the internal
>> statistics of the caches in Solr. If it were possible to switch caching
>> off for bot queries, the cache would reflect the human search pattern
>> much better. This in turn would increase the cache hit rate enormously
>> for the clients that we care most about (i.e. humans).
>>
>> Think about it: Say that you have 10-20 queries per second coming from
>> bots exploring the corners of your data (because that is what they do best)
>> ... wouldn't you consider it a problem that this result (which is highly
>> unlikely to get another hit during its lifetime) gets cached, pushing out
>> other (possibly human-generated) items from the cache in an LRU fashion?
>>
>> Most other cache solutions I've worked with offer ways to handle things
>> like this by providing silent ways (statistics-wise) to get the data from
>> the cache.
>>
>> For instance, we are using EHCache for another part of our application like
>> this:
>>
>> Result result =
>>     search.isCacheUpdateAllowed() ? cache.get(search) : cache.getQuietly(search);
>>
>> Equally, we never put any results emanating from a bot into that EHCache.
>> And when we did, the hit rate on the cache was much worse than it is today.
>>
>> So my query remains: Is there an easy way to instruct Solr to handle my
>> request *quietly*, cache-statistics-wise (*)?
>>
>> Best regards,
>> Tobias
>>
>> (*) i.e. instruct Solr to:
>> a1) serve the result from the cache if possible
>> a2) ... and if so never update statistics of the cache for this "get".
>>
>> - or -
>>
>> b1) serve the results from the index
>> b2) ... and if so never put that result in the cache.
>>
>> 2008/9/1 Shalin Shekhar Mangar <[EMAIL PROTECTED]>
>>
>>> If you are serving cached queries to the bot, what would be the benefit of
>>> suppressing those queries from figuring into the cache statistics page?
>>>
>>> On Mon, Sep 1, 2008 at 2:46 PM, Tobias Hill <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Is there any way to prevent a certain query from being added to the
>>>> caches (or from being allowed to affect cache statistics) in Solr?
>>>>
>>>> *Reason:* We have a very search-oriented website. The SEO aspects
>>>> of the site are also important, which is why almost the entire search
>>>> space is traversable for indexing bots (googlebot for instance). These
>>>> bots are a substantial part of the traffic on the site*. Needless to
>>>> say, the usage pattern of a bot is very different from a human being
>>>> ... and in short the bots are filling the caches with "corner-data"
>>>> from the search space. As a consequence, human-initiated searches
>>>> suffer a lot and are far from *as cached as they could be*.
>>>>
>>>> I have no problem with serving a bot a cached page; the only problem
>>>> is that the bots are allowed to be part of the cache statistics.
>>>>
>>>> Is there any way to easily suppress this?
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> *) Actually this is not rare, see the book "Release It!: Design and
>>>> Deploy Production-Ready Software" for more details on this reality.
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
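
For what it's worth, the "quiet" behaviour Tobias asks for (a1/a2 and b1/b2 in his message) can be sketched at the application layer, in front of Solr. The following is only a minimal illustration under stated assumptions, not Solr or EHCache code: the class QuietLruCache, its methods getQuietly and lookup, and the fromBot flag are all hypothetical names made up for this example, and it assumes cached values are never null.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Function;

/** Hypothetical LRU cache with a "quiet" path for bot traffic; not a Solr or EHCache class. */
final class QuietLruCache<K, V> {
    private final int capacity;
    // Insertion-ordered map; a normal hit moves the entry to the tail by hand,
    // so the head is always the least recently used entry.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>();
    private long hits;
    private long misses;

    QuietLruCache(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Normal lookup: counted in the statistics, and a hit refreshes recency. */
    public synchronized V get(K key) {
        V value = map.remove(key);
        if (value == null) {          // assumes values are never null
            misses++;
            return null;
        }
        map.put(key, value);          // re-insert at the tail = most recently used
        hits++;
        return value;
    }

    /** Quiet lookup (a1/a2): no statistics update, no recency change. */
    public synchronized V getQuietly(K key) {
        return map.get(key);
    }

    /** Stores a value and evicts the least recently used entry when over capacity. */
    public synchronized void put(K key, V value) {
        map.remove(key);
        map.put(key, value);
        if (map.size() > capacity) {
            Iterator<K> eldest = map.keySet().iterator();
            eldest.next();
            eldest.remove();
        }
    }

    /**
     * Bot requests read quietly and never populate the cache (b1/b2);
     * human requests use the normal, counted path. The loader stands in
     * for the real search call.
     */
    public V lookup(K key, boolean fromBot, Function<K, V> loader) {
        V cached = fromBot ? getQuietly(key) : get(key);
        if (cached != null) {
            return cached;
        }
        V loaded = loader.apply(key);
        if (!fromBot && loaded != null) {
            put(key, loaded);
        }
        return loaded;
    }

    public synchronized long hits() { return hits; }
    public synchronized long misses() { return misses; }
}
```

A front end would call lookup(query, true, ...) for requests it has classified as bot traffic (how you classify them, for example by User-Agent, is up to you) and lookup(query, false, ...) for everything else. This only protects a cache you control in your own application; it does not change what Solr's internal caches do, which is why a bigger cache or a separate machine for bots may still be the simpler fix.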