Hi, this question is a little off topic, but I thought that since so many
people on this list are probably experts in the field, someone may know.
I'm experimenting with my own semantic-based search engine, and I want to
test it with a large corpus of web pages. Ideally I would like to have a
list of the
On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc wrote:
>A slightly different route to take, but one that should help test/refine a
>semantic parser, is Wikipedia. They make their entire corpus available, or
>any subset you define. The whole thing is something like 14 terabytes, but
>you can get smaller sets.
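The dumps live at http://dumps.wikimedia.org/. Here is a minimal sketch of
streaming pages out of a bzipped pages-articles dump in Python without
loading the whole file; the filename and the export namespace version are
examples, so check the <mediawiki> tag of the dump you actually download:

    import bz2
    import xml.etree.ElementTree as ET

    # Namespace version differs between dump releases; 0.4 is a guess.
    NS = '{http://www.mediawiki.org/xml/export-0.4/}'

    def iter_pages(path):
        # Stream <page> elements one at a time to keep memory flat.
        with bz2.BZ2File(path) as f:
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + 'page':
                    title = elem.findtext(NS + 'title')
                    text = elem.findtext(NS + 'revision/' + NS + 'text')
                    yield title, text
                    elem.clear()  # free the subtree we just consumed

    # Example filename; substitute whichever dump/subset you fetched.
    for title, text in iter_pages('enwiki-latest-pages-articles.xml.bz2'):
        print(title)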
On Thu, 16 Sep 2010 15:31:02 -0700, you wrote:
>The public terabyte dataset project would be a good match for what you
>need.
>
>http://bixolabs.com/datasets/public-terabyte-dataset-project/
>
>Of course, that means we have to actually finish the crawl & finalize
>the Avro format we use for the data.
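Once that dataset ships, reading it should look roughly like the sketch
below, which uses the official avro package (pip install avro). The
filename and the 'url' field are guesses on my part, since the record
schema isn't finalized yet:

    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # Avro data files embed their schema, so no schema file is needed
    # to read them back.
    reader = DataFileReader(open('crawl-part-00000.avro', 'rb'), DatumReader())
    for record in reader:  # each record comes back as a plain dict
        print(record.get('url'))
    reader.close()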
Hi, I'm curious what approaches one would take to defend a Solr service
against hostile users, especially when it's exposed to the internet rather
than an intranet. I'm fairly new to Solr; is there anything built in?
Is there anything in place to prevent the search engine from getting
overwhelmed?
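Solr itself ships with essentially no authentication or rate limiting, so
the usual advice is to bind it to a private interface and expose only a
thin front end that whitelists parameters. A rough sketch using only the
Python standard library follows; the host, port, and caps are examples I
picked, though timeAllowed is a real Solr query parameter that limits
search time in milliseconds:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs, urlencode
    from urllib.request import urlopen

    # Solr stays bound to localhost; only this proxy is reachable.
    SOLR = 'http://127.0.0.1:8983/solr/select'

    class SearchProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            qs = parse_qs(urlparse(self.path).query)
            q = qs.get('q', ['*:*'])[0][:200]  # cap the query length
            params = urlencode({
                'q': q,
                'rows': 10,           # never let clients ask for huge pages
                'timeAllowed': 2000,  # Solr-side cap on search time (ms)
                'wt': 'json',
            })
            body = urlopen(SOLR + '?' + params).read()
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(('0.0.0.0', 8080), SearchProxy).serve_forever()

The point of the pattern is that clients never get to pass arbitrary
parameters (qt, fl, rows, etc.) straight through to Solr.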