Actually, can Nutch be used for SCRAPING, not crawling? I don't just want the URL; I want the data assigned to specific fields, no matter what site or format it is coming from.
I've done scraping, but it had to be custom-tailored for each target.

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

----- Original Message ----
From: Dennis Gearon <gear...@sbcglobal.net>
To: solr-user@lucene.apache.org
Sent: Fri, November 12, 2010 8:46:31 PM
Subject: filtering or getting accurate crawling results

How easy is it to get good results from the Lucene crawling software? Let's say, for example, I wanted only information about a general subject, but nothing else? (Sorry, not ready to say what exactly at this point.)

Is it like tuning Solr, or IS it tuning Solr to just not accept what does not fit the desired results?

The amount of information that I'd want is LARGE, but a drop in the bucket compared to Google itself.

Dennis Gearon