Thank you very much for your replies. Yes Otis, one possibility is to copy my data to HDFS and then apply a map function to create the intermediate indexes across the cluster using the Solr Java library, with the output in HDFS.
I have some doubts concerning this solution:

1 - Do the intermediate indexes really need to be merged? I mean, is there any mechanism in SolrCloud to easily combine those intermediate indexes and serve them as if they were a "whole index", in a distributed fashion?

2 - Can I serve these different indexes with Solr or SolrCloud directly from HDFS? Google says no :), so maybe I need to copy the indexes to a local file system and point Solr at them.

Timothy, thank you for your tips. I am looking at Pig. CloudSolrServer seems an interesting piece of the architecture, especially for discovering Solr endpoints and then possibly replicating my index, but I was wondering if I need to implement that myself or if Solr will take care of it for me. Maybe I just didn't get your tip due to my newbie knowledge of Solr. I am sorry if I am confusing some concepts or not being very precise in my words.

Jack, thank you for sharing the DataStax solution. I will definitely take a look since it's free :). But anyway, the objective of this project is for me to learn Solr and Hadoop. :)

Thank you,
Rui Vaz
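P.S. In case it clarifies what I mean about endpoint discovery: my (possibly naive) understanding is that CloudSolrServer reads the cluster state from ZooKeeper, so I would not hard-code Solr URLs myself. A minimal sketch of what I have in mind is below; the ZooKeeper addresses and collection name are made up, and please correct me if I have the picture wrong:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexSketch {
    public static void main(String[] args) throws Exception {
        // CloudSolrServer discovers the live Solr endpoints from the
        // cluster state kept in ZooKeeper, so no endpoint list is
        // maintained by hand. (Addresses below are placeholders.)
        CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
        server.setDefaultCollection("mycollection"); // hypothetical collection

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text", "hello solr cloud");

        // My understanding: the client routes the document to the right
        // shard, and SolrCloud itself replicates it to the shard's
        // replicas -- i.e. replication would not be something I implement.
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}
```

Is that roughly the division of labour you meant, i.e. the client only handles routing/discovery and SolrCloud takes care of replication?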