All,
thanks for the good feedback.

Letting the load balancer route bots to specific slaves
and humans to the others seems like the way forward this
time.
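
A rough sketch of what that routing decision could look like, in case it
helps anyone else. Everything here is an assumption for illustration: the
class name BotRouter, the User-Agent substrings, and the slave names are
hypothetical, and a real load balancer would normally do this with its own
rule syntax rather than with application code.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: picks a Solr slave based on the User-Agent header. */
final class BotRouter {
    // Assumed marker substrings; a real deployment would maintain its own list.
    private static final List<String> BOT_MARKERS =
            List.of("googlebot", "msnbot", "slurp", "crawler", "spider");

    private final List<String> botSlaves;
    private final List<String> humanSlaves;
    private int botNext = 0;
    private int humanNext = 0;

    BotRouter(List<String> botSlaves, List<String> humanSlaves) {
        if (botSlaves.isEmpty() || humanSlaves.isEmpty()) {
            throw new IllegalArgumentException("both pools need at least one slave");
        }
        this.botSlaves = List.copyOf(botSlaves);
        this.humanSlaves = List.copyOf(humanSlaves);
    }

    /** Returns true if the User-Agent contains one of the assumed bot markers. */
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false; // no header: treat the request as human traffic
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : BOT_MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Round-robins within the bot pool or the human pool. */
    synchronized String pickSlave(String userAgent) {
        if (isBot(userAgent)) {
            String slave = botSlaves.get(botNext);
            botNext = (botNext + 1) % botSlaves.size();
            return slave;
        }
        String slave = humanSlaves.get(humanNext);
        humanNext = (humanNext + 1) % humanSlaves.size();
        return slave;
    }
}
```

With one bot slave and two human slaves, bot requests always land on the
bot slave, so only that slave's caches fill up with "corner-data", while
the human slaves keep caches that reflect human search patterns.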

Thanks,
Tobias




2008/9/1 Walter Underwood <[EMAIL PROTECTED]>

> How many documents do you have in your index? How many unique
> queries per day, bot and human? What are your cache hit ratios?
>
> Maybe you can increase the size of the caches and not worry about
> it. Search engine position is important. Have marketing pay for
> the extra memory (I'm not kidding).
>
> Sending all the bot queries to a separate machine is also
> a reasonable approach. Heck, bill that machine to marketing!
>
> wunder
>
> On 9/1/08 7:34 AM, "Shalin Shekhar Mangar" <[EMAIL PROTECTED]> wrote:
>
> > Apart from hacking the internals, there's nothing inside Solr which will
> > let you do that. EHCache is for application-layer caches; Solr is an
> > external server, so it can't know about your application. I think that
> > over a period of time, the caches will be back to normal (through
> > user-generated requests) and it shouldn't be a big problem.
> >
> > How slow are your user queries becoming? Will it help if you limit all
> > bot queries to a certain fixed number of Solr instances?
> >
> > On Mon, Sep 1, 2008 at 7:44 PM, Tobias Hill <[EMAIL PROTECTED]> wrote:
> >
> >> Maybe I was a bit unclear; let me try in other words.
> >>
> >> I didn't have the statistics page in mind. All I care about is that I
> >> don't want a massive amount of bot-generated queries to affect the
> >> internal statistics of the caches in Solr. If it were possible to switch
> >> caching off for bot queries, the cache would reflect the human search
> >> pattern much better. This in turn would increase the cache hit rate
> >> enormously for the clients that we care most about (i.e. humans).
> >>
> >> Think about it: Say that you have 10-20 queries per second coming from
> >> bots exploring the corners of your data (because that is what they do
> >> best) ... wouldn't you consider it a problem that each such result (which
> >> is highly unlikely to get another hit during its lifetime) gets cached,
> >> pushing out other (possibly human-generated) items from the cache in an
> >> LRU fashion?
> >>
> >> Most other cache solutions I've worked with offer ways to handle things
> >> like this by providing silent ways (statistics-wise) to get the data
> >> from the cache.
> >>
> >> For instance, we are using EHCache for another part of our application
> >> like this:
> >>
> >>  Result result = search.isCacheUpdateAllowed()
> >>      ? cache.get(search)
> >>      : cache.getQuietly(search);
> >>
> >> Equally, we never put any results emanating from a bot into that
> >> EHCache. And when we did, the hit rate on the cache was much worse than
> >> it is today.
> >>
> >>
> >> So my query remains: Is there an easy way to instruct Solr to handle my
> >> request *quietly*, cache-statistics-wise (*)?
> >>
> >> Best regards,
> >> Tobias
> >>
> >>
> >> (*) i.e. instruct Solr to:
> >>      a1) serve the result from the cache if possible
> >>          a2) ... and if so never update the statistics of the cache for
> >>              this "get".
> >>
> >>       - or -
> >>
> >>      b1) serve the result from the index
> >>          b2) ... and if so never put that result in the cache.
> >>
> >>
> >>
> >>
> >>
> >>
> >> 2008/9/1 Shalin Shekhar Mangar <[EMAIL PROTECTED]>
> >>
> >>> If you are serving cached queries to the bot, what would be the benefit
> >>> of suppressing those queries from figuring into the cache statistics page?
> >>>
> >>> On Mon, Sep 1, 2008 at 2:46 PM, Tobias Hill <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> Is there any way to suppress that a certain query gets added to the
> >>>> caches (or is allowed to affect cache statistics) in Solr?
> >>>>
> >>>> *Reason:* We have a very search-oriented website. The SEO aspects
> >>>> of the site are also important, which is why almost the entire search
> >>>> space is traversable by indexing bots (Googlebot, for instance). These
> >>>> bots are a substantial part of the traffic on the site*. Needless to
> >>>> say, the usage pattern of a bot is very different from that of a human
> >>>> being ... and in short the bots are filling the caches with
> >>>> "corner-data" from the search space. As a consequence, human-initiated
> >>>> searches suffer a lot and are far from *as cached as they could be*.
> >>>>
> >>>> I have no problem with serving a bot a cached page, the only problem
> >>>> is that the bots are allowed to be part of the cache-statistics.
> >>>>
> >>>> Is there any way to easily suppress this?
> >>>>
> >>>> Best regards,
> >>>> Tobias
> >>>>
> >>>>
> >>>> *) Actually this is not rare; see the book "Release It!: Design and
> >>>>   Deploy Production-Ready Software" for more details on this reality.
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Shalin Shekhar Mangar.
> >>>
> >>
> >
> >
>
>
