But note that MapReduce and HDFS are not the only way to go.
For example, can you split your source data?  If you can, you could
split it, put the pieces on N machines, and run an indexer on each of
them, each with some number of threads.  Of course, your Solr(Cloud?)
cluster had better have enough servers/CPU cores and fast enough disk
and network to handle the input and get maxed out only at a fairly
high N.  What that N should be depends on how quickly you need to
index your 1 TB of data.
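
To make the split-and-index idea concrete, here is a rough Python sketch,
not a definitive implementation.  Everything in it is an assumption for
illustration: read_docs and send_batch are hypothetical callables you
would supply yourself (send_batch would wrap whatever Solr update call
you use), and machines_needed assumes throughput scales linearly with N,
which only holds until the Solr cluster itself becomes the bottleneck.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
import math


def shard_files(paths, n_machines):
    """Round-robin the source files into n_machines roughly equal shards."""
    return [paths[i::n_machines] for i in range(n_machines)]


def batches(docs, size):
    """Yield lists of at most `size` documents from an iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_shard(paths, read_docs, send_batch, threads=4, batch_size=500):
    """Index one machine's shard, one file per worker task.

    read_docs(path)  -> iterable of documents (hypothetical, you write it)
    send_batch(docs) -> sends one batch to Solr (hypothetical, you write it;
                        it must be safe to call from several threads)
    Returns the number of documents sent.
    """
    def index_file(path):
        sent = 0
        for batch in batches(read_docs(path), batch_size):
            send_batch(batch)
            sent += len(batch)
        return sent

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(index_file, paths))


def machines_needed(total_bytes, deadline_seconds, bytes_per_second_per_machine):
    """Smallest N that meets the deadline, assuming linear scaling with N."""
    return math.ceil(total_bytes / (deadline_seconds * bytes_per_second_per_machine))
```

With made-up numbers, machines_needed(10**12, 8 * 3600, 5 * 10**6) asks
how many machines it takes to push 1 TB in 8 hours if each one sustains
5 MB/s, and comes out at 7.  Measure the per-machine rate on a small
sample before trusting any such estimate.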

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Jun 25, 2013 at 9:13 AM, James Thomas <jtho...@camstar.com> wrote:
>>> The problem I am facing is how to read those data from hard disks which are 
>>> not HDFS
>
> If you are planning to use a Map-Reduce job to do the indexing then the 
> source data will definitely have to be on HDFS.
> The Map function can transform the source data to Solr documents and send 
> them to Solr  (e.g. via CloudSolrServer Java API) for indexing.
>
> -- James
>
> -----Original Message-----
> From: engy.morsy [mailto:engy.mo...@bibalex.org]
> Sent: Tuesday, June 25, 2013 3:14 AM
> To: solr-user@lucene.apache.org
> Subject: Solr indexer and Hadoop
>
> Hi All,
>
> I have TBs of data that need to be indexed. I am trying to use Hadoop to 
> index them. I am still a newbie.
> I thought that the Map function will read data from hard disks and the reduce 
> function will index them. The problem I am facing is how to read those data 
> from hard disks which are not HDFS.
>
> I understand that the data to be indexed must be on HDFS, mustn't it? Or am 
> I missing something here?
>
> I can't convert the nodes on which the data resides to HDFS. Can anyone 
> please help?
>
> I would also appreciate it if you could point me to a good tutorial for Solr 
> indexing using Hadoop. I googled a lot but did not find a sufficient one.
>
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-indexer-and-Hadoop-tp4072951.html
> Sent from the Solr - User mailing list archive at Nabble.com.
