Have a look at Nutch 2. It is decoupled from HDFS and can store documents in, e.g., HBase or another NoSQL store.
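
As a rough sketch only, assuming a Nutch 2.x install with the Gora HBase module enabled, the storage backend is normally selected in conf/nutch-site.xml. The property name and class below are as I recall them from the Nutch 2 tutorial, so please verify them against your version:

```xml
<!-- conf/nutch-site.xml (sketch; assumes the gora-hbase dependency is enabled in ivy/ivy.xml) -->
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <!-- Gora data store class backing the crawl data; HBase is used here as an example -->
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
</configuration>
```

The matching default in conf/gora.properties would then be gora.datastore.default=org.apache.gora.hbase.store.HBaseStore, again assuming HBase is the store you pick.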

-- 
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 11 Feb 2013, at 06:16, SivaKarthik <sivakarthik.kpa...@gmail.com> wrote:

> Dear Erick,
> Thanks for your reply.
> Yes, Nutch can meet my requirement,
> but the problem is that I want to store the crawled documents in HTML or XML
> format instead of the MapReduce format.
> I am not sure whether Nutch plugins are available to convert them into XML files.
> Please share any ideas you have.
>
> Thank you
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039619.html
> Sent from the Solr - User mailing list archive at Nabble.com.