That is exactly what I want.
I want the distributed Hadoop TaskNode to be running on the same server
that is holding the local distributed solr index. This way there is no need
to move any data around... I think other people call this feature 'data
locality' of map/reduce.
I believe HBase and Had
No. This is just a Hadoop file input class. Distributed Hadoop has to
get files from a distributed file service. It sounds like you want
some kind of distributed file service that maps a TaskNode (??) on a
given server to the files available on that server. There might be
something that does this.
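The "maps a TaskNode to the files available on that server" idea is the data
locality that Hadoop gets from `InputSplit.getLocations()`: the scheduler
prefers to run a task on a host that already holds its split. A minimal sketch
of that assignment logic, with made-up shard, host, and worker names:

```python
# Sketch of the data-locality idea from the thread: assign each map task
# to the host that already stores its shard, so no index data moves over
# the network. Shard/host/worker names here are invented for illustration;
# real Hadoop does this via InputSplit.getLocations() in the scheduler.

def assign_tasks(shard_locations, workers):
    """Map each shard to the worker running on the host that holds it.

    shard_locations: dict shard_name -> hostname where its index lives
    workers:         dict hostname -> worker id on that host
    Returns dict shard_name -> worker id (None if no local worker).
    """
    assignment = {}
    for shard, host in shard_locations.items():
        # Prefer the worker co-located with the shard's index files.
        assignment[shard] = workers.get(host)
    return assignment

# Example cluster: three shards spread over two Solr hosts.
shards = {"shard1": "solr-a", "shard2": "solr-b", "shard3": "solr-a"}
workers = {"solr-a": "tasktracker-a", "solr-b": "tasktracker-b"}
print(assign_tasks(shards, workers))
```

A real implementation would derive `shard_locations` from the SolrCloud
cluster state rather than a hard-coded dict.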
Can it read distributed lucene indexes in SolrCloud?
On Jul 26, 2012 7:11 PM, "Lance Norskog" wrote:
> Mahout includes a file reader for Lucene indexes. It will read from
> HDFS or local disks.
>
> On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni
> wrote:
> > You raise an interesting possibility.
Mahout includes a file reader for Lucene indexes. It will read from
HDFS or local disks.
On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni wrote:
> You raise an interesting possibility. A map/reduce solr handler over
> solrcloud...
>
> On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
>
>> I think the performance should be close to Hadoop running on HDFS, if
>> somehow Hadoop job can directly read the Solr Index file while executing
>> the job on the local solr node.
You raise an interesting possibility. A map/reduce solr handler over
solrcloud...
On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
> I think the performance should be close to Hadoop running on HDFS, if
> somehow Hadoop job can directly read the Solr Index file while executing
> the job on the local solr node.
I think the performance should be close to Hadoop running on HDFS, if
somehow Hadoop job can directly read the Solr Index file while executing
the job on the local solr node.
Kinda like how HBase and Cassandra integrate with Hadoop.
Plus, we can run the map reduce job on a standby Solr4 cluster.
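One way a map task could pull documents out of its local Solr node without an
export step is plain start/rows paging over Solr's HTTP select handler. This is
only a sketch of that paging loop; the core name and URL are assumptions, and
the fetch function is injected so the logic can be shown without a running
Solr server:

```python
import json

# Sketch: stream documents from a (hypothetical) local Solr core over HTTP
# using start/rows paging, instead of exporting the index to HDFS first.
# The core name, port, and field-free *:* query are illustrative assumptions.

def iter_docs(fetch, core="collection1", rows=100):
    """Yield documents from a Solr core, page by page.

    fetch: callable(url) -> JSON string; injected so the paging logic
           can be exercised without a live Solr server.
    """
    start = 0
    while True:
        url = ("http://localhost:8983/solr/%s/select"
               "?q=*:*&wt=json&start=%d&rows=%d" % (core, start, rows))
        resp = json.loads(fetch(url))
        docs = resp["response"]["docs"]
        if not docs:
            return
        for doc in docs:
            yield doc
        start += len(docs)

# Fake fetcher serving two one-doc pages, then an empty page.
pages = {0: [{"id": "1"}], 1: [{"id": "2"}], 2: []}

def fake_fetch(url):
    start = int(url.split("start=")[1].split("&")[0])
    return json.dumps({"response": {"docs": pages.get(start, [])}})

print(list(iter_docs(fake_fetch, rows=1)))  # [{'id': '1'}, {'id': '2'}]
```

Deep paging like this gets slow at large offsets, which is part of why the
performance question raised elsewhere in the thread matters.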
It's not free (for production use anyway), but you might consider DataStax
Enterprise: http://www.datastax.com/products/enterprise
It is a very nice consolidation of Cassandra, Solr and Hadoop. No ETL required.
Cheers,
Jeff
On Jul 26, 2012, at 3:55 PM, Trung Pham wrote:
> Is it possible to run map reduce jobs directly on Solr4?
Of course you can do it, but the question is whether this will produce
the performance results you expect.
I've seen talk about this in other forums, so you might find some prior
work here.
Solr and HDFS serve somewhat different purposes. The key issue would be
whether your map and reduce code overload
Is it possible to run map reduce jobs directly on Solr4?
I'm asking this because I want to use Solr4 as the primary storage engine,
and I want to be able to run near real time analytics against it as well,
rather than exporting Solr4 data out to a Hadoop cluster.
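For many near-real-time analytics questions, Solr's own faceting can stand in
for a map/reduce pass: a request like
`/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=category`
returns term counts directly. Below is a small sketch that reshapes Solr's
flat facet list into pairs; the field name and sample response are invented
for illustration:

```python
import json

# Sketch of the "near real time analytics" idea: let Solr aggregate with a
# facet query instead of exporting data to Hadoop. Solr returns facet counts
# as a flat list [term, count, term, count, ...]; this helper pairs them up.

def top_terms(facet_response, field):
    """Turn Solr's flat facet list into a list of (term, count) pairs."""
    flat = facet_response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[::2], flat[1::2]))

# Invented sample of a Solr 4 facet response for a hypothetical
# "category" field.
sample = json.loads('{"facet_counts": {"facet_fields":'
                    ' {"category": ["books", 12, "music", 7]}}}')
print(top_terms(sample, "category"))  # [('books', 12), ('music', 7)]
```

This covers counting-style analytics; anything needing arbitrary per-document
computation is where the map/reduce integration discussed above comes in.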