Re: Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Reyes, Mark
> -Original message-
> From: Reyes, Mark
> Sent: Friday 1st November 2013 17:24
> To: solr-user@lucene.apache.org
> Subject: Exclude urls without 'www' from Nutch 1.7 crawl
>
> I'm current

Re: Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Furkan KAMACI
> Also, please ask questions on the Nutch list, you're on Solr now :)

RE: Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Markus Jelsma
Hi - Use the domain-urlfilter for host, domain and TLD filtering. Also, please ask questions on the Nutch list, you're on Solr now :)
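For anyone landing here later: as I understand it, the domain-urlfilter (plugin `urlfilter-domain`, reading conf/domain-urlfilter.txt) keeps a URL only if its host, its domain, or its domain suffix appears in the configured list. So listing only the www host should drop the bare-domain URLs. The Python below is a hypothetical sketch of that matching rule, not Nutch code. `example.com`, `ALLOWED` and `accept` are placeholders, and treating the last two host labels as the "domain" is a simplification that breaks for suffixes like .co.uk.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list, in the spirit of domain-urlfilter.txt:
# each entry may be a host, a domain, or a suffix.
ALLOWED = frozenset({"www.example.com"})


def accept(url: str, allowed=ALLOWED) -> bool:
    """Return True if the URL's host, domain, or suffix is in the allow-list.

    Simplification: the domain is the last two labels of the host, which
    is wrong for multi-part suffixes such as .co.uk.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    domain = ".".join(labels[-2:])
    suffix = labels[-1]
    return host in allowed or domain in allowed or suffix in allowed


print(accept("http://www.example.com/page"))  # True
print(accept("http://example.com/page"))      # False
```

Note that listing `example.com` instead of `www.example.com` would let both variants through, because the domain check matches either host.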

Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Reyes, Mark
I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www. Specifically, after running the crawl and indexing into Solr 4.5, then validating the results on the front end with AJAX Solr, the search results page lists results/pages that are both 'www