RE: crawling all links of same domain in nutch in solr

2014-07-29 Thread Markus Jelsma
Hi - use the domain URL filter plugin and list the domains, hosts or TLDs you want to restrict the crawl to.

-Original message-
> From: Vivekanand Ittigi
> Sent: Tuesday 29th July 2014 7:17
> To: solr-user@lucene.apache.org
> Subject: crawling all links of same d
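
For readers following the suggestion above, here is a rough sketch of what enabling the domain URL filter might look like in nutch-site.xml. It assumes a Nutch 1.x install where the plugin is named urlfilter-domain and reads its list from conf/domain-urlfilter.txt. The rest of the plugin.includes value is only a typical default and is an assumption here; copy the real value from your own nutch-default.xml and add urlfilter-domain to it.

```xml
<!-- nutch-site.xml: sketch only, adjust to your Nutch version -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumed typical 1.x default with urlfilter-domain added;
         take the actual list from your nutch-default.xml. -->
    <value>protocol-http|urlfilter-(regex|domain)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>urlfilter.domain.file</name>
    <!-- File in the conf directory, one domain, host or TLD per line.
         For the seed in this thread it would contain the single line:
           techcrunch.com -->
    <value>domain-urlfilter.txt</value>
  </property>
</configuration>
```

With a file like that in place, URLs outside the listed domains should be dropped by the filter, while links within techcrunch.com remain eligible for the crawl.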

crawling all links of same domain in nutch in solr

2014-07-28 Thread Vivekanand Ittigi
Hi,

Can anyone tell me how to crawl all the other pages of the same domain? For example, I'm feeding a website, http://www.techcrunch.com/, in seed.txt.

The following property is added in nutch-site.xml:

db.ignore.internal.links false If true, when adding new links to a page, links from the same host ar