Hi,

How do people with public search services deal with bots/crawlers?
And I don't mean to ask how one bans them (robots.txt), slows them down (Crawl-delay in 
robots.txt), or prevents them from digging too deep into search results...

What I mean is that when you have publicly exposed search that bots crawl, they 
issue all kinds of crazy "queries" that result in errors, add noise to Solr caches, 
increase Solr cache evictions, etc.

Are there some known recipes for dealing with them, minimizing their negative 
side-effects, while still letting them crawl you?
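To make it a bit more concrete, the only thing I can think of so far is some kind of 
guard in front of /select that spots bot User-Agents and rejects the worst requests 
before they ever reach Solr. Rough sketch only -- the UA pattern and the limits below 
are made up, not something we actually run:

    import java.io.IOException;
    import java.util.regex.Pattern;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class BotQueryGuardFilter implements Filter {

        // Very rough bot detection; real UA lists are much longer.
        private static final Pattern BOT_UA =
            Pattern.compile("(?i)(googlebot|bingbot|slurp|baiduspider|crawler|spider)");

        private static final int MAX_START_FOR_BOTS = 100; // cap deep paging
        private static final int MAX_ROWS_FOR_BOTS  = 10;  // cap page size

        @Override
        public void init(FilterConfig config) { }

        @Override
        public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest httpReq = (HttpServletRequest) req;
            HttpServletResponse httpResp = (HttpServletResponse) resp;

            String ua = httpReq.getHeader("User-Agent");
            if (ua != null && BOT_UA.matcher(ua).find()) {
                // Reject obviously malformed queries early so they never hit Solr
                // and never pollute the queryResultCache / filterCache.
                String q = httpReq.getParameter("q");
                if (q == null || q.trim().isEmpty() || q.length() > 512) {
                    httpResp.sendError(HttpServletResponse.SC_BAD_REQUEST, "query rejected");
                    return;
                }
                // Refuse deep paging / huge pages from bots instead of serving them.
                if (intParam(httpReq, "start", 0) > MAX_START_FOR_BOTS
                        || intParam(httpReq, "rows", 10) > MAX_ROWS_FOR_BOTS) {
                    httpResp.sendError(HttpServletResponse.SC_FORBIDDEN, "paging too deep");
                    return;
                }
            }
            chain.doFilter(req, resp);
        }

        private static int intParam(HttpServletRequest req, String name, int def) {
            String v = req.getParameter(name);
            if (v == null) return def;
            try {
                return Integer.parseInt(v);
            } catch (NumberFormatException e) {
                return def;
            }
        }

        @Override
        public void destroy() { }
    }

But that feels ad hoc, which is why I'm asking whether there are more established approaches.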

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
