I'm pretty sparse on Nutch knowledge; you'd probably get more knowledgeable answers on the Nutch mailing list.

Best
Erick

On Fri, Nov 12, 2010 at 11:52 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:

> Actually, can Nutch be used for SCRAPING, not crawling?
>
> I don't just want the url, I want the data assigned to specific fields, no
> matter what site or format it is coming from.
>
> I've done scraping, but it had to be custom tailored for each target.
>
> Dennis Gearon
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
>
> ----- Original Message ----
> From: Dennis Gearon <gear...@sbcglobal.net>
> To: solr-user@lucene.apache.org
> Sent: Fri, November 12, 2010 8:46:31 PM
> Subject: filtering or getting accurate crawling results
>
> How easy is it to get good results from the Lucene crawling software?
>
> Let's say for example I wanted only information about a general subject, but
> nothing else? (Sorry, not ready to say what exactly at this point) Is it like
> tuning Solr, or IS it tuning Solr to just not accept what does not fit the
> desire results?
>
> The amount of information that I'd want is LARGE, but a drop in the bucket
> compared to google itself.
>
> Dennis Gearon