I am using Nutch 1.7 with Solr 4.6.0 to index websites that link to binary files such as Word and PDF documents. The crawler crawls the site, but however deep I set the crawl depth, I do not get the URLs of the links to the binary files. I see the link labels in the content, but not the URLs. Any ideas on how I could get those URLs into my crawl?
- Indexing URLs for Binaries Teague James