No. That is just a Hadoop file input class; a distributed Hadoop job
has to get its files from a distributed file service. It sounds like
you want some kind of distributed file service that maps a task on a
given server to the index files available locally on that server.
There might be something that does this already. HDFS works very hard
at exactly this kind of locality; are you sure it is not good enough?
I am endlessly amazed at the speed of these distributed apps.
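
For reference, the locality you describe is what Hadoop's InputSplit
API already provides: each split reports which hosts hold its data,
and the scheduler tries to run the map task on one of them. A minimal
sketch of what a Solr-shard-aware split might look like (the
ShardSplit name and its fields are made up for illustration, not an
existing class):

    // Hypothetical sketch: a custom InputSplit that tells Hadoop which
    // host holds a given index shard on local disk, so the scheduler
    // can place the map task on that node (the same hint HDFS's
    // FileSplit gives via block locations).
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class ShardSplit extends InputSplit implements Writable {
        private String host;      // node holding this shard locally
        private String indexPath; // index directory on that node

        public ShardSplit() {}    // required for Writable deserialization

        public ShardSplit(String host, String indexPath) {
            this.host = host;
            this.indexPath = indexPath;
        }

        @Override
        public long getLength() { return 0; } // unknown; used for ordering only

        // The locality hint: the scheduler prefers these hosts.
        @Override
        public String[] getLocations() { return new String[] { host }; }

        public String getIndexPath() { return indexPath; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(host);
            out.writeUTF(indexPath);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            host = in.readUTF();
            indexPath = in.readUTF();
        }
    }

An InputFormat built around this would enumerate the cluster's shards
in getSplits() and hand each mapper the path to a local index.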

Have you done a proof of concept?
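
If not, a bare-bones first cut could skip Hadoop entirely and just
open one core's index directory with plain Lucene and count what is
there. Untested sketch; exact calls vary by Lucene version (this
follows the 4.x-style API), and the path argument is whatever your
core's data/index directory happens to be:

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.FSDirectory;

    public class IndexCountPoc {
        public static void main(String[] args) throws IOException {
            // args[0]: path to a Solr core's index directory
            DirectoryReader reader =
                DirectoryReader.open(FSDirectory.open(new File(args[0])));
            try {
                System.out.println("docs=" + reader.numDocs()
                    + " deleted=" + reader.numDeletedDocs());
            } finally {
                reader.close();
            }
        }
    }

If that reads your live index cleanly and fast, wiring it into a
mapper is the next step.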

On Thu, Jul 26, 2012 at 7:40 PM, Trung Pham <tr...@phamcom.com> wrote:
> Can it read distributed Lucene indexes in SolrCloud?
> On Jul 26, 2012 7:11 PM, "Lance Norskog" <goks...@gmail.com> wrote:
>
>> Mahout includes a file reader for Lucene indexes. It will read from
>> HDFS or local disks.
>>
>> On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>> > You raise an interesting possibility: a map/reduce Solr handler over
>> > SolrCloud...
>> >
>> > On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
>> >
>> >> I think the performance should be close to Hadoop running on HDFS, if
>> >> somehow the Hadoop job can directly read the Solr index files while
>> >> executing the job on the local Solr node.
>> >>
>> >> Kinda like how HBase and Cassandra integrate with Hadoop.
>> >>
>> >> Plus, we can run the map/reduce job on a standby Solr4 cluster.
>> >>
>> >> This way, the documents in Solr will be our primary source of truth,
>> >> and we have the ability to run near real time search queries and
>> >> analytics on it. No need to export data around.
>> >>
>> >> Solr4 is becoming a very interesting solution to many web scale
>> >> problems. Just missing the map/reduce component. :)
>> >>
>> >> On Thu, Jul 26, 2012 at 3:01 PM, Darren Govoni <dar...@ontrenet.com> wrote:
>> >>
>> >> > Of course you can do it, but the question is whether this will produce
>> >> > the performance results you expect. I've seen talk about this in other
>> >> > forums, so you might find some prior work here.
>> >> >
>> >> > Solr and HDFS serve somewhat different purposes. The key issue would be
>> >> > whether your map and reduce code overloads the Solr endpoint. Even with
>> >> > SolrCloud, I believe all requests will have to go through a single URL
>> >> > (to be routed), so if you have thousands of map/reduce tasks all running
>> >> > simultaneously, the question is whether your Solr is architected to
>> >> > handle that amount of throughput.
>> >> >
>> >> >
>> >> > On Thu, 2012-07-26 at 14:55 -0700, Trung Pham wrote:
>> >> >
>> >> > > Is it possible to run map/reduce jobs directly on Solr4?
>> >> > >
>> >> > > I'm asking this because I want to use Solr4 as the primary storage
>> >> > > engine, and I want to be able to run near real time analytics against
>> >> > > it as well, rather than export Solr4 data out to a Hadoop cluster.
>> >> >
>> >> >
>> >> >
>> >
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>

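On the single-URL routing concern raised above: as far as I know, you
can also query each shard's core directly with distrib=false, so map
tasks would not all funnel through one routing node. A rough, untested
sketch (host, port, core name, and query are placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class LocalShardQuery {
        public static void main(String[] args) throws Exception {
            // distrib=false asks the node to answer from its local core
            // instead of fanning the query out across the cluster.
            URL url = new URL("http://shard-host:8983/solr/collection1/select"
                + "?q=*:*&rows=10&wt=json&distrib=false");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }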


-- 
Lance Norskog
goks...@gmail.com
