Solr scraping: Nutch and other alternatives.

Luis Cappa Banda Mon, 17 Oct 2011 23:31:00 -0700

Hello everyone.

I've been thinking about how to retrieve information from a domain (for
example, http://www.ign.com), process it, and index it. My idea is to use
Solr as the search engine. I'm familiar with Apache Nutch, and I know that
the latest version includes a Solr integration for indexing the crawled
content. I tried it and it worked fine, but developing plugins that process
the content and index it into a new custom field is somewhat complex.
Perhaps one of you has tried another (and better) alternative for mining
data from the web. What would you recommend? Do you have any scraping
suggestions?
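
To make the goal concrete, here is a minimal sketch of the processing step I
have in mind: take the HTML of one fetched page, extract its title and
visible text, and shape the result as a document that Solr could index. It
is an illustration under assumptions, not a Nutch plugin. Python is my
choice for the example, the field names "title" and "content" are
hypothetical and would have to exist in the Solr schema, and "id" is assumed
to be the unique key.

```python
import json
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace-only data and script/style bodies
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def build_solr_doc(url, html):
    """Turn one fetched page into a dict shaped like a Solr document."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": url,                            # assumed unique key field
        "title": parser.title,                # hypothetical field name
        "content": " ".join(parser.chunks),   # hypothetical field name
    }


def to_update_payload(docs):
    """Serialize a list of documents as a JSON array of documents."""
    return json.dumps(docs)
```

The JSON produced by to_update_payload could then be sent in an HTTP POST to
Solr's JSON update handler. The exact update URL depends on the Solr version
and configuration, so I have left that part out.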


Thank you very much.

Luis Cappa.
