Can it read distributed lucene indexes in SolrCloud?

On Jul 26, 2012 7:11 PM, "Lance Norskog" <goks...@gmail.com> wrote:
> Mahout includes a file reader for Lucene indexes. It will read from
> HDFS or local disks.
>
> On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> > You raise an interesting possibility. A map/reduce solr handler over
> > solrcloud.......
> >
> > On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
> >
> >> I think the performance should be close to Hadoop running on HDFS, if
> >> somehow Hadoop job can directly read the Solr Index file while executing
> >> the job on the local solr node.
> >>
> >> Kinda like how HBase and Cassandra integrate with Hadoop.
> >>
> >> Plus, we can run the map reduce job on a standby Solr4 cluster.
> >>
> >> This way, the documents in Solr will be our primary source of truth. And we
> >> have the ability to run near real time search queries and analytics on it.
> >> No need to export data around.
> >>
> >> Solr4 is becoming a very interesting solution to many web scale problems.
> >> Just missing the map/reduce component. :)
> >>
> >> On Thu, Jul 26, 2012 at 3:01 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> >>
> >> > Of course you can do it, but the question is whether this will produce
> >> > the performance results you expect.
> >> > I've seen talk about this in other forums, so you might find some prior
> >> > work here.
> >> >
> >> > Solr and HDFS serve somewhat different purposes. The key issue would be
> >> > if your map and reduce code
> >> > overloads the Solr endpoint. Even using SolrCloud, I believe all
> >> > requests will have to go through a single
> >> > URL (to be routed), so if you have thousands of map/reduce jobs all
> >> > running simultaneously, the question is whether
> >> > your Solr is architected to handle that amount of throughput.
> >> >
> >> > On Thu, 2012-07-26 at 14:55 -0700, Trung Pham wrote:
> >> >
> >> > > Is it possible to run map reduce jobs directly on Solr4?
> >> > >
> >> > > I'm asking this because I want to use Solr4 as the primary storage engine.
> >> > > And I want to be able to run near real time analytics against it as well.
> >> > > Rather than export solr4 data out to a hadoop cluster.
>
> --
> Lance Norskog
> goks...@gmail.com