On Sep 11, 2011, at 7:04pm, dpt9876 wrote:

> Hi thanks for the reply.
>
> How does nutch/solr handle the scenario where 1 website calls price, "price"
> and another website calls it "cost". Same thing different name, yet I would
> want the facet to handle that and not create a different facet.
>
> Is this combo of nutch and Solr that intelligent and or intuitive?

What you're describing here is web mining, not web crawling.

You want to extract price data from web pages, and put that into a specific field in Solr.

To do that using Nutch, you'd need to write custom plug-ins that know how to extract the price from a page, and add that as a custom field to the crawl results.

The above is a topic for the Nutch mailing list, since Solr is just a downstream consumer of whatever Nutch provides.

-- Ken

> On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <
> ml-node+s472066n3328340...@n3.nabble.com> wrote:
>>
>> Nope, there's nothing in Solr that crawls anything, you have to feed
>> documents in yourself from the websites.
>>
>> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>>
>> which is designed for this kind of problem.
>>
>> Best
>> Erick
>>
>> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 <daninthetrop...@gmail.com> wrote:
>>> Hi all,
>>> I am wondering if Solr will do the following for a project I am working on.
>>> I want to create a search engine with facets for potentially hundreds of
>>> websites.
>>> Similar to say crawling amazon + buy.com + ebay and someone can search these
>>> 3 sites from my 1 website.
>>> (I realise there are better ways of doing the above example, its for
>>> illustrative purposes).
>>> Eventually I would build that search crawl to index say 200 or 1000
>>> merchants.
>>> Someone would come to my site and search for "digital camera".
>>>
>>> They would get results from all 3 indexes and hopefully dynamic facets eg
>>> Price $100-200
>>> Price 200-300
>>> Resolution 1mp-2mp
>>>
>>> etc etc
>>>
>>> Can this be done on the fly?
>>>
>>> I ask this because I am currently developing webscrapers to crawl these
>>> websites, dump that data into a db, then was thinking of tacking on a solr
>>> server to crawl my db.
>>>
>>> Problem with that approach is that crawling the worlds ecommerce sites will
>>> take forever, when it seems solr might do that for me? (I have read about
>>> multiple indexes etc).
>>>
>>> Many thanks
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
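
For illustration only, here is a minimal sketch of the normalization step the thread talks about: folding site-specific labels such as "price" and "cost" into one canonical field before the document is indexed, so that a single facet covers both. This is plain Python, not the Nutch plug-in API; the alias table, the function names, and the canonical field name `price` are all assumptions made up for this example.

```python
import re

# Hypothetical alias table: site-specific labels -> one canonical field name.
# A real deployment would need an entry for each label its sources use.
FIELD_ALIASES = {
    "price": "price",
    "cost": "price",
    "our price": "price",
}

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def normalize_record(raw):
    """Return a new dict with aliased labels folded into canonical names.

    Labels are compared case-insensitively; unknown labels are kept,
    lower-cased, under their own name.
    """
    doc = {}
    for label, value in raw.items():
        key = label.strip().lower()
        doc[FIELD_ALIASES.get(key, key)] = value
    return doc


def parse_price(text):
    """Extract the first number from a price string, or None if there is none."""
    match = _NUMBER.search(text.replace(",", ""))
    return float(match.group()) if match else None


if __name__ == "__main__":
    site_a = normalize_record({"Price": "$129.99", "Resolution": "12mp"})
    site_b = normalize_record({"Cost": "USD 1,299.00"})
    # Both records now carry the same "price" key.
    print(parse_price(site_a["price"]), parse_price(site_b["price"]))
```

Storing the parsed value as a numeric field is what would let a downstream index bucket it into ranges like the "Price $100-200" facets mentioned in the original question; how that field is declared and faceted is a separate configuration matter not shown here.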