Hello everyone. I've been thinking about a way to retrieve information from a domain (for example, http://www.ign.com) so that it can be processed and indexed. My idea is to use Solr as the search engine. I'm familiar with Apache Nutch, and I know that the latest version integrates with Solr to index the information it retrieves. I tried it and it worked fine, but developing plugins to process the information and index it into a new custom field is somewhat complex. Perhaps one of you has tried another (and better) alternative for mining web data. What would you recommend? Can you give me any scraping suggestions?
Thank you very much.
Luis Cappa