1. Nutch follows the links within HTML pages to crawl the full link graph of
a site (or many sites); Solr only indexes the documents you send it.
2. Think of a core as an SQL table: each table/core holds a different type of
data.
3. SolrCloud is all about scaling and availability: multiple shards for
larger collections, and multiple replicas both to scale query throughput
and to stay available if nodes go down.
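To make point 1 concrete: posting a single HTML file to Solr indexes just that file, while discovering the rest of a site means following its links. Below is a minimal sketch of the breadth-first link-following a crawler performs. It assumes a hypothetical in-memory link graph in place of real HTTP fetching and HTML parsing, both of which Nutch handles for you in batch crawl rounds.

```python
from collections import deque

# Hypothetical link graph: page URL -> URLs it links to.
# In a real crawl these links come from parsing fetched HTML.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c"],
    "http://example.com/b": ["http://example.com/a", "http://example.com/"],
    "http://example.com/c": [],
}

def crawl(seed, links, max_depth=3):
    """Breadth-first traversal of the link graph starting at seed.

    Returns pages in discovery order, following at most max_depth hops.
    """
    seen = {seed}
    order = []
    frontier = deque([(seed, 0)])
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)  # a real crawler would fetch, parse and index here
        if depth == max_depth:
            continue
        for out in links.get(url, []):
            if out not in seen:  # skip pages already queued (cycles, back links)
                seen.add(out)
                frontier.append((out, depth + 1))
    return order

print(crawl("http://example.com/", LINKS))
```

Starting from one seed URL, the sketch reaches every page in the graph (`/`, `/a`, `/b`, `/c`); that link discovery is the part Solr alone does not do.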
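A sketch of point 2, assuming two hypothetical cores, `products` and `support_tickets`, each with its own schema the way each SQL table has its own columns. Every document type is sent to its own core's update handler (`/solr/<core>/update`). The sketch only builds the request; to send it, you would POST the body with `Content-Type: application/json`.

```python
import json

# Hypothetical base URL and core names: one core per type of data,
# much like one SQL table per entity.
SOLR_BASE = "http://localhost:8983/solr"
CORE_FOR_TYPE = {
    "product": "products",
    "ticket": "support_tickets",
}

def update_request(doc_type, docs):
    """Return (url, body) for sending docs of one type to that type's core."""
    core = CORE_FOR_TYPE[doc_type]
    url = f"{SOLR_BASE}/{core}/update?commit=true"
    return url, json.dumps(docs)

url, body = update_request("product", [{"id": "p1", "name": "Laptop", "price": 999.0}])
print(url)  # http://localhost:8983/solr/products/update?commit=true
```

Keeping product data and support tickets in separate cores lets each core carry fields and analysis suited to its data. It also lets you query, reindex or drop either one without touching the other.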
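Point 3 in miniature, as a toy model rather than SolrCloud's actual code. Solr's default compositeId router hashes document IDs with MurmurHash3 into per-shard hash ranges. This sketch swaps in `crc32` modulo the shard count, and the node names are hypothetical. It shows documents split across shards for size, and replicas keeping every shard queryable when a node goes down.

```python
import zlib

NUM_SHARDS = 2
REPLICATION_FACTOR = 2
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names

def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Route a document to a shard by hashing its ID (crc32, for illustration only)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def place_replicas(num_shards=NUM_SHARDS, rf=REPLICATION_FACTOR, nodes=NODES):
    """Assign replicas to nodes round-robin; a shard's replicas land on
    distinct nodes as long as rf <= len(nodes)."""
    placement = {}
    i = 0
    for shard in range(num_shards):
        placement[shard] = []
        for _ in range(rf):
            placement[shard].append(nodes[i % len(nodes)])
            i += 1
    return placement

def available_shards(placement, down_nodes):
    """Shards that still have at least one replica on a live node."""
    return {s for s, hosts in placement.items()
            if any(h not in down_nodes for h in hosts)}

placement = place_replicas()
print(placement)                                # {0: ['node1', 'node2'], 1: ['node3', 'node4']}
print(available_shards(placement, {"node2"}))  # {0, 1}
```

Losing `node2` leaves both shards served. Losing both `node1` and `node2` takes shard 0 offline, which is why replicas of one shard should sit on different machines.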
-- Jack Krupansky
-----Original Message-----
From: rashmi maheshwari
Sent: Tuesday, January 28, 2014 11:36 AM
To: solr-user@lucene.apache.org
Subject: Solr & Nutch
Hi,
Question 1: Since Solr can parse HTML and documents such as Word, Excel and
PDF files, why do we need Nutch to parse HTML files? What is the difference?
Question 2: When do we use multiple cores in Solr? Is there a practical
business case where we need multiple cores?
Question 3: When should we use SolrCloud? What does implementing SolrCloud
involve?
--
Rashmi