I am using Nutch 1.7 with Solr 4.6.0 to index websites that link to binary
files such as Word and PDF documents. The crawler crawls the site, but the
URLs of the links to the binary files never show up, no matter how deep I
set the crawl depth. I see the labels for the links in the content, but not
the URLs. Any ideas on how I could get those URLs back in my crawl?
