Hi, use the domain URL filter plugin and list the domains, hosts or TLDs you
want to restrict the crawl to.
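Here is a rough sketch of that setup. It assumes Nutch 1.x, where the plugin is named urlfilter-domain. The plugin has to be enabled in nutch-site.xml alongside the plugins you already use. The plugin.includes value below is only an example and not a recommended default: keep your existing value and add urlfilter-domain to it.

```xml
<property>
  <name>plugin.includes</name>
  <!-- Example value only: keep the plugins you already use and add urlfilter-domain -->
  <value>protocol-http|urlfilter-(regex|domain)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

Then put the domains, hosts or TLDs one per line in conf/domain-urlfilter.txt. For example, a single line reading techcrunch.com should keep the crawl on that domain, including www.techcrunch.com. URLs that match no listed entry are filtered out.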
-Original message-
> From: Vivekanand Ittigi
> Sent: Tuesday 29th July 2014 7:17
> To: solr-user@lucene.apache.org
> Subject: crawling all links of same d
Hi,
Can anyone tell me how to crawl all the other pages of the same domain?
For example, I'm feeding the website http://www.techcrunch.com/ in seed.txt.
The following property has been added to nutch-site.xml:
  name:        db.ignore.internal.links
  value:       false
  description: If true, when adding new links to a page, links from
               the same host ar