Hi Moumita,
I once used boilerpipe (https://code.google.com/p/boilerpipe/) to remove common
headers, footers, etc.
Ahmet
On Tuesday, November 11, 2014 10:41 AM, Moumita Dhar01 wrote:
Hi,
I am using Nutch 1.9 and Solr 4.6 to index a web application with approximately
100 distinct URLs and their contents.
Nutch is used to fetch the URLs and links, crawl the entire web application,
extract the content of all pages, and send that content to Solr.
The problem that I have