Have a look at Nutch 2. It is decoupled from HDFS and can store documents in,
for example, HBase or another NoSQL store.
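
As far as I recall from the Nutch 2 tutorial, the storage backend is selected
through Apache Gora in conf/nutch-site.xml. A minimal sketch, assuming the
HBase backend; please verify the property name and class against the Nutch
version you actually install:

```xml
<!-- conf/nutch-site.xml (sketch; assumes the Gora HBase module is on the classpath) -->
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <!-- Assumption: HBase as the store; other Gora backends use a different class -->
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
</configuration>
```

The same class is normally also set as gora.datastore.default in
conf/gora.properties. Once the pages are in such a store, you can read the raw
content back and write it out as HTML or XML yourself.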

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 11 Feb 2013, at 06:16, SivaKarthik <sivakarthik.kpa...@gmail.com> wrote:

> Dear Erick,
>   Thanks for your reply.
>   Yes, Nutch can meet my requirement,
>  but the problem is that I want to store the crawled documents in HTML or XML
> format instead of the MapReduce format.
>  I am not sure whether Nutch plugins are available to convert them into XML files.
>  Please share any ideas you have.
> 
> Thank you
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039619.html
> Sent from the Solr - User mailing list archive at Nabble.com.
