No. This is just a Hadoop file input class. Distributed Hadoop has to get its files from a distributed file service. It sounds like you want some kind of distributed file service that maps a TaskNode (??) on a given server to the files available on that server. There might already be something that does this; HDFS works very hard at exactly this problem. Are you sure it is not good enough? I am endlessly amazed at the speed of these distributed apps.

Have you done a proof of concept?
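To make the idea concrete, here is a rough, hypothetical sketch of just the locality piece, in the style of how HBase's TableSplit reports its region server: a custom InputSplit whose getLocations() names the node that holds a shard's index, so the scheduler can place the map task on that node. The class and field names (ShardSplit, indexDir) are made up for illustration; this is not an existing Hadoop, Mahout, or Solr class.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;

    // Hypothetical: one split per index shard, carrying the host that
    // holds that shard so the scheduler can run the map task locally.
    public class ShardSplit extends InputSplit implements Writable {
      private String host;      // node that holds the shard's index
      private String indexDir;  // local path to the Lucene index on that node

      public ShardSplit() {}    // no-arg constructor required for deserialization

      public ShardSplit(String host, String indexDir) {
        this.host = host;
        this.indexDir = indexDir;
      }

      @Override
      public long getLength() {
        return 0; // size unknown; locality is the point of this sketch
      }

      @Override
      public String[] getLocations() {
        return new String[] { host }; // the scheduler uses this as a placement hint
      }

      public String getIndexDir() { return indexDir; }

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeUTF(host);
        out.writeUTF(indexDir);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        host = in.readUTF();
        indexDir = in.readUTF();
      }
    }

An InputFormat would then enumerate one of these splits per shard, and a RecordReader would open the local index with Lucene's IndexReader. None of that exists here; this only shows where the locality hint lives.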
On Thu, Jul 26, 2012 at 7:40 PM, Trung Pham <tr...@phamcom.com> wrote:
> Can it read distributed Lucene indexes in SolrCloud?
>
> On Jul 26, 2012 7:11 PM, "Lance Norskog" <goks...@gmail.com> wrote:
>> Mahout includes a file reader for Lucene indexes. It will read from
>> HDFS or local disks.
>>
>> On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>>> You raise an interesting possibility: a map/reduce Solr handler over
>>> SolrCloud...
>>>
>>> On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
>>>> I think the performance should be close to Hadoop running on HDFS,
>>>> if somehow the Hadoop job can directly read the Solr index file
>>>> while executing the job on the local Solr node. Kinda like how
>>>> HBase and Cassandra integrate with Hadoop.
>>>>
>>>> Plus, we can run the map/reduce job on a standby Solr4 cluster.
>>>>
>>>> This way, the documents in Solr will be our primary source of
>>>> truth, and we have the ability to run near-real-time search queries
>>>> and analytics on it. No need to export data around.
>>>>
>>>> Solr4 is becoming a very interesting solution to many web-scale
>>>> problems. Just missing the map/reduce component. :)
>>>>
>>>> On Thu, Jul 26, 2012 at 3:01 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>>>>> Of course you can do it, but the question is whether this will
>>>>> produce the performance results you expect. I've seen talk about
>>>>> this in other forums, so you might find some prior work here.
>>>>>
>>>>> Solr and HDFS serve somewhat different purposes. The key issue
>>>>> would be whether your map and reduce code overloads the Solr
>>>>> endpoint. Even using SolrCloud, I believe all requests will have
>>>>> to go through a single URL (to be routed), so if you have
>>>>> thousands of map/reduce jobs all running simultaneously, the
>>>>> question is whether your Solr is architected to handle that amount
>>>>> of throughput.
>>>>>
>>>>> On Thu, 2012-07-26 at 14:55 -0700, Trung Pham wrote:
>>>>>> Is it possible to run map/reduce jobs directly on Solr4?
>>>>>>
>>>>>> I'm asking this because I want to use Solr4 as the primary
>>>>>> storage engine, and I want to be able to run near-real-time
>>>>>> analytics against it as well, rather than export Solr4 data out
>>>>>> to a Hadoop cluster.

--
Lance Norskog
goks...@gmail.com
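P.S. The other route the thread debates is querying Solr from each map task instead of reading index files directly. A rough SolrJ sketch of what one task's fetch might look like against Solr 4's CloudSolrServer; the ZooKeeper address and collection name below are placeholders, not anything from this thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SolrScanSketch {
      public static void main(String[] args) throws Exception {
        // CloudSolrServer routes requests using the cluster state in ZooKeeper.
        CloudSolrServer server = new CloudSolrServer("zkhost1:2181"); // placeholder address
        server.setDefaultCollection("collection1");                   // placeholder collection

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(500); // page size; a real job would page through with the start parameter

        QueryResponse rsp = server.query(q);
        for (SolrDocument doc : rsp.getResults()) {
          // a map() body would emit key/value pairs per document here
          System.out.println(doc.getFieldValue("id"));
        }
        server.shutdown();
      }
    }

Every task doing this adds query load to the cluster, which is exactly the throughput concern Darren raises above.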