Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
That is exactly what I want. I want the distributed Hadoop TaskNode to run on the same server that holds the local distributed Solr index. This way there is no need to move any data around... I think other people call this feature 'data locality' in map/reduce. I believe HBase and Had

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Lance Norskog
No. This is just a Hadoop file input class. Distributed Hadoop has to get files from a distributed file service. It sounds like you want some kind of distributed file service that maps a TaskNode (??) on a given server to the files available on that server. There might be something that does this.
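For reference, the mechanism Hadoop itself uses for this is the InputSplit: the scheduler asks each split which hosts hold its data via getLocations() and tries to run the map task on one of those hosts. Below is a minimal sketch of a split describing a single Solr shard; SolrShardSplit and its fields are hypothetical, not an existing Solr or Hadoop class.

// Hypothetical sketch: the point is only that Hadoop gets data locality by
// asking each InputSplit which hosts hold its data (getLocations()) and
// scheduling map tasks there when it can. SolrShardSplit is made up.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

public class SolrShardSplit extends InputSplit implements Writable {

    private String shardUrl;  // e.g. http://solr-node-3:8983/solr/collection1
    private String dataHost;  // machine that holds this shard's index files
    private long sizeBytes;   // rough index size, used only for scheduling

    public SolrShardSplit() {}  // required for Hadoop deserialization

    public SolrShardSplit(String shardUrl, String dataHost, long sizeBytes) {
        this.shardUrl = shardUrl;
        this.dataHost = dataHost;
        this.sizeBytes = sizeBytes;
    }

    @Override
    public long getLength() {
        return sizeBytes;
    }

    // The "data locality" hook: the scheduler tries to run the map task on
    // one of the hosts returned here, so a task reading a Solr shard would
    // land on the node that already stores that shard.
    @Override
    public String[] getLocations() {
        return new String[] { dataHost };
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(shardUrl);
        out.writeUTF(dataHost);
        out.writeLong(sizeBytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        shardUrl = in.readUTF();
        dataHost = in.readUTF();
        sizeBytes = in.readLong();
    }

    public String getShardUrl() {
        return shardUrl;
    }
}

A matching InputFormat would enumerate the shards of a collection, build one such split per shard, and hand each map task a reader over that shard's index files.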

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
Can it read distributed Lucene indexes in SolrCloud?

On Jul 26, 2012 7:11 PM, "Lance Norskog" wrote:
> Mahout includes a file reader for Lucene indexes. It will read from
> HDFS or local disks.
>
> On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni wrote:
> > You raise an interesting possibility.

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Lance Norskog
Mahout includes a file reader for Lucene indexes. It will read from HDFS or local disks.

On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni wrote:
> You raise an interesting possibility. A map/reduce solr handler over
> solrcloud...
>
> On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
> > I
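As a rough illustration of what a map task colocated with a core could do (this is not the Mahout reader mentioned above, and the index path and stored "id" field are assumptions), plain Lucene 4.x can scan the index files straight off local disk:

// Illustrative only: scanning a Lucene 4.x index from local disk, roughly
// what a map task colocated with a Solr core could do. The index path and
// the stored "id" field are assumptions.
import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;

public class LocalIndexScan {
    public static void main(String[] args) throws Exception {
        // Solr 4 keeps each core's Lucene index under <coreDir>/data/index
        File indexDir = new File("/var/solr/collection1/data/index");

        DirectoryReader reader = DirectoryReader.open(FSDirectory.open(indexDir));
        try {
            Bits liveDocs = MultiFields.getLiveDocs(reader); // null means no deletions
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (liveDocs != null && !liveDocs.get(i)) {
                    continue; // skip deleted documents
                }
                Document doc = reader.document(i); // returns stored fields only
                // A real mapper would emit (key, value) pairs here.
                System.out.println(doc.get("id"));
            }
        } finally {
            reader.close();
        }
    }
}

Only stored fields come back this way; reader.document() never returns indexed-but-unstored values, which matters when deciding what analytics can run directly against the index.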

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Darren Govoni
You raise an interesting possibility. A map/reduce Solr handler over SolrCloud...

On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
> I think the performance should be close to Hadoop running on HDFS, if
> somehow Hadoop job can directly read the Solr Index file while executing
> the job o

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
I think the performance should be close to Hadoop running on HDFS, if somehow the Hadoop job can directly read the Solr index files while executing the job on the local Solr node. Kinda like how HBase and Cassandra integrate with Hadoop. Plus, we can run the map/reduce job on a standby Solr4 cluster.
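For comparison, the HBase integration mentioned here boils down to an HBase-aware InputFormat wired into an ordinary MapReduce job, with region locations giving the scheduler its data locality. A sketch of that pattern follows; the table name, column setup, and output path are made up, and the job is map-only just to keep it short.

// What "HBase integrates with Hadoop" looks like in practice: an HBase-aware
// InputFormat is wired into an ordinary MapReduce job and region locations
// provide data locality. Table name and output path are placeholders.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HBaseScanJob {

    // The mapper receives rows straight from the region servers; no export step.
    static class RowKeyMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
                throws IOException, InterruptedException {
            String key = Bytes.toString(rowKey.get(), rowKey.getOffset(), rowKey.getLength());
            ctx.write(new Text(key), ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-scan-example");
        job.setJarByClass(HBaseScanJob.class);

        Scan scan = new Scan();        // full-table scan
        scan.setCaching(500);          // batch rows per RPC
        scan.setCacheBlocks(false);    // don't pollute the block cache

        TableMapReduceUtil.initTableMapperJob(
                "events",              // hypothetical table name
                scan,
                RowKeyMapper.class,
                Text.class,            // mapper output key
                IntWritable.class,     // mapper output value
                job);

        job.setNumReduceTasks(0);      // map-only for this sketch
        FileOutputFormat.setOutputPath(job, new Path(args[0])); // HDFS output dir

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A Solr equivalent would swap in an InputFormat that splits on shards or cores instead of regions.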

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Schmidt Jeff
It's not free (for production use anyway), but you might consider DataStax Enterprise: http://www.datastax.com/products/enterprise
It is a very nice consolidation of Cassandra, Solr and Hadoop. No ETL required.

Cheers,
Jeff

On Jul 26, 2012, at 3:55 PM, Trung Pham wrote:
> Is it possible to r

Re: Map/Reduce directly against solr4 index.

2012-07-26 Thread Darren Govoni
Of course you can do it, but the question is whether this will produce the performance results you expect. I've seen talk about this in other forums, so you might find some prior work here. Solr and HDFS serve somewhat different purposes. The key issue would be if your map and reduce code overload

Map/Reduce directly against solr4 index.

2012-07-26 Thread Trung Pham
Is it possible to run map/reduce jobs directly on Solr4? I'm asking because I want to use Solr4 as the primary storage engine, and I want to be able to run near-real-time analytics against it as well, rather than export Solr4 data out to a Hadoop cluster.
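The export path being avoided here typically means paging documents out of Solr over HTTP with SolrJ and writing them somewhere Hadoop can read before any job runs. A minimal sketch, assuming SolrJ 4.x and placeholder URL, query, and field names:

// The ETL path the question wants to avoid: page documents out of Solr over
// HTTP with SolrJ and dump them somewhere Hadoop can read. Solr URL, query,
// and field names are placeholders.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrExport {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        int rows = 1000;
        int start = 0;
        while (true) {
            SolrQuery q = new SolrQuery("*:*");
            q.setStart(start);
            q.setRows(rows);

            QueryResponse rsp = solr.query(q);
            if (rsp.getResults().isEmpty()) {
                break; // no more documents
            }
            for (SolrDocument doc : rsp.getResults()) {
                // A real export would write to HDFS here; this just prints.
                System.out.println(doc.getFieldValue("id"));
            }
            start += rows;
        }
        solr.shutdown();
    }
}

Deep start/rows paging like this gets progressively slower on large result sets, which is part of why running the job against the index in place is attractive.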