Re: Indexing URLs for Binaries
Check suffix-urlfilter.txt in your conf directory for Nutch. You might be
prohibiting those filetypes from the crawl.
- Mark
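
A minimal sketch of the check suggested above, assuming suffix-urlfilter.txt holds one suffix per line, with `#` comments and an optional mode line beginning with `+` or `-`. The helper name `listed_suffixes` and the path `conf/suffix-urlfilter.txt` are illustrative, not part of Nutch. The script only reports which binary suffixes are listed in the file. Whether a listed suffix is blocked or allowed depends on the file's mode line, so confirm that against the Nutch documentation.

```python
from pathlib import Path


def listed_suffixes(filter_text, wanted):
    """Return the wanted suffixes that appear as entries in the filter text."""
    entries = set()
    for raw in filter_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and mode lines such as "+" or "-I"
        # (assumed format; verify against your Nutch version).
        if not line or line.startswith("#") or line[0] in "+-":
            continue
        entries.add(line.lower())
    # Compare case-insensitively so ".PDF" and ".pdf" are both reported.
    return sorted(s for s in wanted if s.lower() in entries)


if __name__ == "__main__":
    path = Path("conf/suffix-urlfilter.txt")  # hypothetical location
    if path.exists():
        print(listed_suffixes(path.read_text(), [".pdf", ".doc", ".docx"]))
    else:
        print(f"{path} not found; adjust the path to your Nutch install")
```

If suffixes such as `.pdf` or `.doc` show up in the result, that file is a likely place to look for why the binary links are missing from the crawl.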
On 1/3/14, 10:29 AM, "Teague James" wrote:
>I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to
>binary files, such as Word, PDF, etc. The craw
I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to
binary files, such as Word, PDF, etc. The crawler crawls the site but I am
not getting the URLs of the links for the binary files no matter how deep I
set the settings for the site. I see the labels for the links in the
conte